Stratified sampling

A term in mathematical statistics
Stratified sampling, also called type sampling, is a method in which a population is divided into distinct subpopulations (called strata, or layers) and sample units are then drawn at random from each stratum according to a specified proportion. Its advantage is that the sample is quite representative and the sampling error is relatively small. Its disadvantage is that the sampling procedure is more complicated than that of simple random sampling. Stratified sampling is a sound probability sampling method and is widely used in quantitative surveys.
Chinese name
Stratified sampling
Foreign name
stratified random sampling
Alias
Type sampling method
Advantages
The sample is quite representative and the sampling error is smaller
Disadvantages
The sampling procedure is more complicated than simple random sampling

Basic Introduction

Stratified sampling, also called type sampling, divides the units of the population into several types or strata according to their attributes and then randomly selects sample units from each type or stratum. Its defining feature is that classification and stratification increase the homogeneity of the units within each type, which makes it easier to draw a representative survey sample. The method suits situations where the population is complex, its units differ widely, and the number of units is large.
The specific procedure is to divide the units of the population into two or more mutually exclusive groups (such as men and women) and to draw a simple random sample from each group, with the samples drawn independently of one another. In general, the units are grouped by a main characteristic that is related to the population characteristic under study. For example, in a survey of beer brand awareness, if it is initially judged that men's knowledge of beer differs from women's, then gender is an appropriate stratification criterion. If the strata are not chosen in this way, stratified sampling gains nothing, and whatever time, effort, and resources are spent on it are wasted.
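The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `stratified_sample`, the toy population of 300 men and 200 women, and the per-stratum sizes are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, sizes, seed=None):
    """Draw an independent simple random sample from each stratum.

    units      -- iterable of population units
    stratum_of -- function mapping a unit to its stratum label
    sizes      -- dict: stratum label -> number of units to draw
    """
    rng = random.Random(seed)

    # Step 1: split the population into mutually exclusive strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: simple random sampling (without replacement) inside each stratum.
    sample = {}
    for label, members in strata.items():
        k = sizes[label]
        if k > len(members):
            raise ValueError(f"stratum {label!r} has only {len(members)} units")
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical population: 300 men ("M") and 200 women ("F").
population = [("M", i) for i in range(300)] + [("F", i) for i in range(200)]
picked = stratified_sample(population, lambda u: u[0], {"M": 30, "F": 20}, seed=1)
```

Each stratum is sampled with its own call to `rng.sample`, so the draws in one stratum do not affect the draws in another.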
Stratified sampling is often chosen over simple random sampling because of its potential for greater statistical efficiency. That is, if two samples of the same size are drawn from the same population, one stratified and the other simple random, the stratified sample generally has the smaller error. Conversely, if the goal is to reach a given level of sampling error, a smaller stratified sample will achieve it.
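The comparison can be made concrete with the standard textbook variance formulas for the sample mean under sampling without replacement. The sketch below is illustrative only: the function names and the toy population of eight values in two well-separated strata are assumptions, not data from any survey.

```python
import statistics

def var_srs_mean(values, n):
    """Variance of the sample mean under simple random sampling
    without replacement: (1 - n/N) * S^2 / n."""
    N = len(values)
    return (1 - n / N) * statistics.variance(values) / n

def var_stratified_mean(strata, sizes):
    """Variance of the stratified mean:
    sum over strata of W_h^2 * (1 - n_h/N_h) * S_h^2 / n_h."""
    N = sum(len(values) for values in strata)
    total = 0.0
    for values, n_h in zip(strata, sizes):
        N_h = len(values)
        W_h = N_h / N  # stratum weight
        total += W_h ** 2 * (1 - n_h / N_h) * statistics.variance(values) / n_h
    return total

# Hypothetical population: two internally homogeneous strata.
strata = [[10, 11, 12, 13], [20, 21, 22, 23]]
everyone = strata[0] + strata[1]

v_srs = var_srs_mean(everyone, n=4)                    # one sample of 4 units
v_strat = var_stratified_mean(strata, sizes=[2, 2])    # 2 units per stratum
```

With the same total sample size of 4, `v_strat` is much smaller than `v_srs` here because almost all of the variation lies between the strata rather than within them.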
Stratified sampling, also called classified sampling or type sampling, divides the population into several homogeneous strata and then applies random sampling or mechanical (systematic) sampling within each stratum. It combines scientific grouping with sampling: grouping reduces the variability within each stratum, and sampling ensures that the selected sample is sufficiently representative. Depending on how units are drawn within the homogeneous strata, stratified sampling can be divided into general stratified sampling and stratified proportional sampling. General stratified sampling sets the sample size of each stratum according to its variability: more units are drawn from strata with large variability and fewer from strata with small variability. When the variability is not known in advance, stratified proportional sampling is usually used. [1]

Number of samples

There are three methods for determining the number of sample units in each stratum:
① Proportional allocation. The ratio of the number of sample units in each stratum to the total number of units in that stratum is the same for all strata. For example, if the sample size is n = 50 and the population size is N = 500, then n/N = 0.1 is the sampling fraction, and the sample size of each stratum is determined by this ratio.
② Neyman allocation. The number of sample units drawn from each stratum is directly proportional to the product of the number of units in the stratum and the stratum's standard deviation.
③ Disproportionate allocation. When a stratum contains too small a share of the population, its share of the total sample can be deliberately increased so that the characteristics of that stratum are adequately reflected in the sample. Doing so, however, makes inference more complicated.
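The first two allocation rules can be written as short functions. This is a sketch under assumptions: the stratum sizes (300 and 200) and standard deviations (2.0 and 6.0) are made-up values chosen to match the n = 50, N = 500 example, and the functions return unrounded values, leaving rounding to whole units to the user.

```python
def proportional_allocation(n, stratum_sizes):
    """n_h = n * N_h / N: every stratum gets the same sampling fraction."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}

def neyman_allocation(n, stratum_sizes, std_devs):
    """n_h proportional to N_h * S_h (stratum size times its std. deviation)."""
    products = {h: stratum_sizes[h] * std_devs[h] for h in stratum_sizes}
    total = sum(products.values())
    return {h: n * p / total for h, p in products.items()}

# Hypothetical strata: sizes sum to N = 500, total sample n = 50.
sizes = {"A": 300, "B": 200}
sds = {"A": 2.0, "B": 6.0}  # assumed stratum standard deviations

prop = proportional_allocation(50, sizes)
ney = neyman_allocation(50, sizes, sds)
```

Under proportional allocation each stratum is sampled at the fraction 0.1, giving 30 and 20 units. Under Neyman allocation the smaller but more variable stratum B receives the larger share of the sample.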


In survey practice, improving the accuracy of a stratified sample comes at a price. Carrying out stratified sampling correctly generally involves three steps:
First, identify the salient (important) demographic and classification characteristics that are related to the behavior under study. For example, in a study of the consumption rate of a product, common sense suggests that men and women have different average consumption rates. To use gender as a meaningful stratification criterion, the investigator must be able to provide data showing that the consumption levels of men and women differ significantly. Different salient characteristics can be identified in this way. Surveys indicate that, in general, once six important salient characteristics have been identified, identifying further ones does little to improve the representativeness of the sample.
Second, determine the proportion of the population in each stratum (for example, if gender has been identified as a salient characteristic, what proportions of the population are men and women?). From these proportions, the number of people to be surveyed in each group (stratum) of the sample can be calculated.
Finally, the investigator must draw an independent simple random sample from each stratum.


In general, the variables on which stratification is based are called stratification variables. The ideal stratification variable is the variable to be measured in the survey or one highly correlated with it. The principle of stratification is to increase homogeneity within strata and heterogeneity between strata. Common stratification variables include gender, age, education, and occupation. Stratified random sampling is widely used in practical sample surveys. For a given sample size, it is also easy to administer, costs less, and has high validity.
Stratified sampling is a statistical sampling technique that divides the population into several strata according to some characteristic, draws a certain number of sample units from each stratum, and then combines the results to compute the required population estimate. In variables sampling for tax inspection, the sound use of stratified sampling can improve accuracy and reduce the number of items that must be spot-checked. Because stratified sampling requires the population to be reorganized and the calculations are complex, it is worthwhile only when most items (amounts) in the inspected population are evenly distributed and a few items are abnormal, with unusually high or low amounts.
When stratified sampling is applied to tax inspection, the method of drawing the sample within each stratum is relatively independent: a random number table or systematic sampling can be used. The method raises two main questions: first, how to calculate the total sample size and how to allocate the sample among the strata; second, how to combine the inspection results of the strata to compute the population estimate.
1. Determining the sample size and its allocation among strata
In stratified sampling, the sample size is still calculated for the population as a whole and then allocated to the strata. To determine the sample size, the size of each stratum and its standard deviation must first be known.
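One common textbook way to compute the total sample size, shown here only as a sketch, assumes proportional allocation, a normal approximation, and a target margin of error for the estimated mean. None of these choices comes from the text above; the function name, the stratum weights, the standard deviations, and the margin of 0.5 are all hypothetical.

```python
import math

def total_sample_size(weights, std_devs, N, margin, z=1.96):
    """Smallest n for which a proportionally allocated stratified sample
    estimates the population mean within +/- margin (normal approximation).

    weights  -- stratum weights W_h = N_h / N (should sum to 1)
    std_devs -- stratum standard deviations S_h
    N        -- population size
    """
    # Weighted within-stratum variance: sum of W_h * S_h^2.
    within = sum(w * s ** 2 for w, s in zip(weights, std_devs))
    # Target variance of the estimated mean implied by the margin of error.
    target_var = (margin / z) ** 2
    # Solve (1/n - 1/N) * within = target_var for n, then round up.
    n = within / (target_var + within / N)
    return math.ceil(n)

# Hypothetical inputs: two strata holding 60% and 40% of 500 items.
n = total_sample_size([0.6, 0.4], [2.0, 6.0], N=500, margin=0.5)
```

The resulting `n` would then be split among the strata, for instance in proportion to their weights.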
2. Combining the inspection results of the strata
After the sample size of each stratum has been determined, tax inspectors can carry out the sampling inspection according to the chosen sampling scheme and the tax inspection outline. Inspecting the selected sample units and performing the calculations yields the mean (or mean error) and the standard deviation of the sample in each stratum. On this basis, the inspectors combine the stratum results to form a point estimate and an interval estimate for the population.
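A minimal sketch of this combination step follows, using the standard stratified estimator of the mean and a normal-approximation interval. The 95% level (z = 1.96), the function name, and the sample values are assumptions for illustration, not figures from an actual inspection.

```python
import math
import statistics

def stratified_estimate(strata, z=1.96):
    """Combine per-stratum samples into a point and interval estimate
    of the population mean.

    strata -- list of (N_h, sample_values) pairs, where N_h is the number
              of units in the stratum and sample_values are the observed
              values (at least two per stratum).
    """
    N = sum(N_h for N_h, _ in strata)
    mean = 0.0
    var = 0.0
    for N_h, ys in strata:
        n_h = len(ys)
        W_h = N_h / N                      # stratum weight
        mean += W_h * statistics.mean(ys)  # weighted stratum mean
        s2 = statistics.variance(ys)       # sample variance within the stratum
        var += W_h ** 2 * (1 - n_h / N_h) * s2 / n_h
    se = math.sqrt(var)
    return mean, se, (mean - z * se, mean + z * se)

# Hypothetical inspection results for two strata of 300 and 200 items.
mean, se, (low, high) = stratified_estimate([(300, [10, 12, 14]),
                                             (200, [20, 22, 24])])
```

Here `mean` is the point estimate, and `(low, high)` is the interval estimate. Multiplying both by the population size would give the corresponding estimates of the population total.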

Relationship with multistage sampling



Multistage sampling differs from stratified sampling. Its advantage is that it suits sample surveys that cover a wide area, that lack a sampling frame listing all population units, or whose population is too large for the sample to be drawn directly, and it can save considerable survey cost. Its main disadvantages are that sampling is cumbersome and that estimating the population from the sample is more complex.
Divide the population into several first-stage units. If some second-stage units are drawn at random within every first-stage unit, so that the sample consists of basic units drawn from these second-stage units, the sampling scheme is equivalent to stratified sampling. If only some of the first-stage units are selected and all basic units within the selected first-stage units are surveyed, the scheme is cluster sampling.
Stratified sampling is therefore a special case of two-stage sampling in which the first-stage sampling fraction is 100%, and cluster sampling is a special case of two-stage sampling in which the second-stage sampling fraction is 100%, which is why it is also called single-stage cluster sampling.
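The two special cases can be seen by parameterizing a two-stage sampler. The following is a sketch only: the function `two_stage_sample`, the three groups of ten units, and the fractions used are hypothetical choices made for illustration.

```python
import random

def two_stage_sample(groups, n_first, second_fraction, seed=None):
    """Two-stage sampling.

    groups          -- dict: first-stage unit label -> list of basic units
    n_first         -- number of first-stage units to select
    second_fraction -- fraction of basic units drawn inside each selected unit
    """
    rng = random.Random(seed)
    # Stage 1: select first-stage units at random.
    chosen = rng.sample(sorted(groups), n_first)
    # Stage 2: subsample basic units inside each selected first-stage unit.
    result = {}
    for g in chosen:
        members = groups[g]
        k = max(1, round(second_fraction * len(members)))
        result[g] = rng.sample(members, k)
    return result

groups = {"g1": list(range(0, 10)),
          "g2": list(range(10, 20)),
          "g3": list(range(20, 30))}

# First-stage fraction 100%: every group is sampled, as in stratified sampling.
stratified_like = two_stage_sample(groups, n_first=3, second_fraction=0.2, seed=0)

# Second-stage fraction 100%: selected groups are fully surveyed, as in cluster sampling.
cluster_like = two_stage_sample(groups, n_first=1, second_fraction=1.0, seed=0)
```

Setting both fractions below 100% gives ordinary two-stage sampling, in which neither stage is a complete survey.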

Main differences

The main differences between multistage sampling and stratified sampling are:
1. In stratified sampling, every stratum in the population is sampled (the strata are covered in full), and the sample units drawn from each stratum are then inspected. In two-stage sampling, all groups in the population are treated as first-stage units; a sample of these first-stage units is drawn, and each selected unit is sampled again (neither stage is a complete survey), producing a two-stage sample. The population is then estimated by combining the indicators of the sampled first-stage units.
2. In cluster sampling, all basic units in each sampled group are surveyed in full. In two-stage sampling, all groups in the population are treated as first-stage units, and the second-stage units (that is, the basic units) within each selected first-stage unit are not surveyed in full but sampled again (also called drawing a subsample). Two-stage sampling thus produces a two-stage sample, and the population is estimated by combining the indicators of the sampled first-stage units. The method of combined estimation in two-stage sampling is very similar to that in cluster sampling, except that the former combines indicators estimated from the samples within the first-stage units, whereas the latter combines the indicators of all population units in the sampled groups. [2]

Stratified proportional sampling

Stratified proportional sampling allocates the sample to each stratum according to the stratum's share of the total number of units in the surveyed population. Within stratified sampling, proportional allocation improves the representativeness of the sample and the estimation of population totals, and it avoids the over-representation or omission of certain characteristics that can occur in simple random sampling.