
statistical distribution

Statistical research methods
A statistical distribution, also called a frequency distribution, is formed as follows. On the basis of statistical grouping, all units of the population are classified into groups and arranged by group, which gives the distribution of the population's units among the groups. The number of units in each group is called the frequency (or count). The ratio of a group's frequency to the total frequency (the total number of units) is called the relative frequency (or proportion). The series formed by listing the groups together with their frequencies is called a statistical distribution series, or a distribution series for short. It shows how all units of the population are distributed among the groups and what characterizes that distribution. Studying these characteristics is an important part of statistical analysis. A statistical distribution and its distribution series can be presented as tables or graphs [1].
Chinese name: statistical distribution
Foreign name: frequency distribution
Alias: frequency distribution
Discipline: mathematics (statistics)
Related concepts: statistical distribution series, bell-shaped distribution, etc.

Significance of statistical distribution

On the basis of statistical grouping, all units of the population are grouped and arranged by group, which gives the distribution of the population's units across the groups. This is called a statistical distribution. In essence, a statistical distribution is a series of numbers describing how all units of the population are distributed among the groups defined by some characteristic, and it is also called a distribution series. Assigning a unit to a group is often described as counting it once, so a distribution series is also called a frequency distribution. A distribution series has two elements: first, the groups into which the population is divided according to some characteristic; second, the number of units in each group, that is, the frequency.
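The two elements can be seen in a minimal Python sketch that builds a quality series from hypothetical survey data. The function name `quality_series` and the data are illustrative assumptions, not part of the source.

```python
from collections import Counter

def quality_series(units, categories):
    """Return (group, frequency) pairs: element one is the group,
    element two is the number of units that fall into it."""
    counts = Counter(units)  # a Counter returns 0 for categories never observed
    return [(c, counts[c]) for c in categories]

# Hypothetical data: sector of employment of 10 workers
workers = ["industry", "services", "services", "agriculture", "services",
           "industry", "services", "agriculture", "industry", "services"]
series = quality_series(workers, ["agriculture", "industry", "services"])
for group, f in series:
    print(f"{group:<12}{f}")
```

Passing `categories` explicitly fixes the order of the groups in the table rather than leaving it to the order in which values first appear.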
The form of a statistical distribution is very simple, but it is important in statistical research. A statistical distribution is both an important form in which the results of statistical analysis are presented and an important method of statistical analysis. It shows the distribution and structure of the population's units and helps in further studying the composition, average level, and patterns of change of the characteristic under study. Literally, "statistical distribution" is the more theoretical term and "distribution series" the more everyday one; the two are used interchangeably below [2].

Types and characteristics of statistical distribution


Types of distribution series

A distribution series has two components: the groups formed according to some characteristic, and the frequency (or relative frequency) of each group.
The first element of a distribution series is the grouping of the population according to some characteristic. Depending on the grouping characteristic, distribution series are divided into quality distribution series and variable distribution series. A series formed by grouping on a qualitative characteristic (an attribute) is called a quality distribution series, or quality series for short. A series formed by grouping on a quantitative characteristic is called a variable distribution series, or variable series for short. Variable series are further divided into single-value series and interval series, and interval series into equal-interval series and unequal-interval series. Each of these is produced by the corresponding type of statistical grouping.
In a quality series, the qualitative characteristic distinguishes the categories of things clearly, so the series is generally stable and reflects the distribution of the population's units well. In a variable series, however, differences in the nature of things are expressed through quantitative boundaries, and where those boundaries are drawn often depends on subjective judgment. Grouping the same quantitative characteristic can therefore produce several different distributions, which shows up in the frequencies and relative frequencies of the groups.
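A minimal Python sketch of the two kinds of variable series follows, using hypothetical data. It also shows how shifting the group limits of the same data changes the resulting distribution. The function names and the "lower limit included, upper limit excluded" convention are assumptions made for illustration.

```python
from collections import Counter

def single_value_series(values):
    """Single-value series: each distinct value forms its own group."""
    return sorted(Counter(values).items())

def interval_series(values, start, width, n_groups):
    """Equal-interval series with groups [start + k*width, start + (k+1)*width).

    Each group includes its lower limit and excludes its upper limit;
    values outside the covered range are ignored in this sketch.
    """
    counts = [0] * n_groups
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_groups:
            counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), counts[k])
            for k in range(n_groups)]

children = [0, 1, 1, 2, 2, 2, 3, 1, 0, 2]          # hypothetical: children per household
scores = [58, 62, 67, 71, 73, 75, 78, 84, 86, 93]  # hypothetical exam scores

print(single_value_series(children))                         # [(0, 2), (1, 3), (2, 4), (3, 1)]
print([f for _, f in interval_series(scores, 50, 10, 5)])   # [1, 2, 4, 2, 1]
print([f for _, f in interval_series(scores, 55, 10, 4)])   # [2, 3, 3, 2]
```

The same ten scores give a peaked distribution with limits at 50, 60, ... and a flatter one with limits at 55, 65, ..., which illustrates why the choice of boundaries matters for variable series.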

Frequency and relative frequency

The second element of a distribution series is the number of units in each group, called the frequency (or count) and denoted $f_i$ for group $i$. The proportion of each group's units in the total number of units is called the relative frequency and denoted $f_i / \sum_j f_j$. The relative frequency of each group is greater than 0 and less than 1, that is, $0 < \frac{f_i}{\sum_j f_j} < 1$, and the relative frequencies of all groups sum to 1, that is, $\sum_i \frac{f_i}{\sum_j f_j} = 1$.
A series formed by listing each group (or its group midpoint) together with its relative frequency is also a statistical distribution and plays the same role as the frequency distribution. Both the frequency distribution and the relative-frequency distribution are distribution series.
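The two properties of relative frequencies can be checked with a short Python sketch over hypothetical group frequencies. Exact fractions from the standard `fractions` module are used so that the sum comes out exactly 1 rather than approximately 1 after floating-point rounding. The function name is illustrative.

```python
from fractions import Fraction

def relative_frequencies(freqs):
    """Return f_i / sum_j f_j for each group, as exact fractions."""
    total = sum(freqs)
    return [Fraction(f, total) for f in freqs]

f = [1, 2, 4, 2, 1]             # hypothetical group frequencies (counts)
rel = relative_frequencies(f)
print([float(r) for r in rel])  # [0.1, 0.2, 0.4, 0.2, 0.1]

# The two properties: each lies strictly between 0 and 1, and together they sum to 1
assert all(0 < r < 1 for r in rel)
assert sum(rel) == 1
```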
In a variable distribution series, the frequency or relative frequency indicates how strongly the corresponding group's variable values act on the overall level. The larger a group's frequency or relative frequency, the greater the influence of that group's values on the overall level; conversely, the smaller it is, the smaller their influence.
No group in a distribution series may have a frequency (or relative frequency) of 0. If a group's frequency is 0, that group should be removed.
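A minimal Python sketch of both points, assuming the "overall level" is measured by the frequency-weighted arithmetic mean of the group midpoints (the text does not name a specific measure). The groups, frequencies, and function names are hypothetical.

```python
def drop_empty_groups(series):
    """Remove groups whose frequency is 0, as the rule above requires."""
    return [(group, f) for group, f in series if f > 0]

def weighted_mean(series):
    """Weighted arithmetic mean of group midpoints; weights are the frequencies."""
    total = sum(f for _, f in series)
    return sum(((lo + hi) / 2) * f for (lo, hi), f in series) / total

# Hypothetical interval series: ((lower, upper), frequency)
series = [((50, 60), 1), ((60, 70), 2), ((70, 80), 4),
          ((80, 90), 2), ((90, 100), 1), ((100, 110), 0)]
series = drop_empty_groups(series)
print(len(series), weighted_mean(series))  # 5 75.0

# Giving the 90-100 group a larger frequency pulls the overall level up
heavier = [((50, 60), 1), ((60, 70), 2), ((70, 80), 2),
           ((80, 90), 2), ((90, 100), 3)]
print(weighted_mean(heavier))  # 79.0
```

Dropping the empty group does not change the mean, since its weight is 0; the rule is about keeping the table itself meaningful.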
Sometimes, to summarize the distribution of the population's units more conveniently, a cumulative frequency series and a cumulative relative-frequency series are also prepared. There are two methods of accumulation: upward accumulation and downward accumulation.
Upward accumulation proceeds toward the upper limits of the variable: the frequencies (or relative frequencies) of the groups are accumulated from the group with the smallest values to the group with the largest values. Each cumulative figure is the total frequency (or relative frequency) below the upper limit of that group. Upward accumulation is useful when attention is on the groups with small values.
Downward accumulation proceeds toward the lower limits of the variable: the frequencies (or relative frequencies) of the groups are accumulated from the group with the largest values to the group with the smallest values. Each cumulative figure is the total frequency (or relative frequency) above the lower limit of that group. Downward accumulation is useful when attention is on the groups with large values.
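Both directions of accumulation can be sketched with `itertools.accumulate` from the Python standard library. The group limits and frequencies are hypothetical, and the comments assume that each group includes its lower limit and excludes its upper limit.

```python
from itertools import accumulate

def cumulative_up(freqs):
    """Upward: accumulate from the lowest-value group to the highest.
    Entry k = number of units below the upper limit of group k."""
    return list(accumulate(freqs))

def cumulative_down(freqs):
    """Downward: accumulate from the highest-value group to the lowest.
    Entry k = number of units at or above the lower limit of group k."""
    return list(accumulate(reversed(freqs)))[::-1]

limits = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]  # hypothetical groups
f = [1, 2, 4, 2, 1]
for (lo, hi), up, down in zip(limits, cumulative_up(f), cumulative_down(f)):
    print(f"{lo}-{hi}: below {hi}: {up:2d}   {lo} and above: {down:2d}")
```

The last upward figure and the first downward figure both equal the total number of units, which is a quick consistency check on a cumulative table.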
The distribution of a variable should generally be analyzed with an equal-interval series, in which the frequency or relative frequency of each group directly reflects the distribution. For an unequal-interval series, the frequency density or relative-frequency density of each group must be used instead, because a wider group tends to contain more units simply by covering a larger range of values. They are calculated as follows:
Frequency density = frequency of the group / group width; relative-frequency density = relative frequency of the group / group width
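A short Python sketch of the two density formulas for a hypothetical unequal-interval series follows; the function name and data are illustrative. The relative-frequency density is computed as f / (total × width), which is algebraically the same as (f / total) / width.

```python
def densities(series):
    """For each group ((lower, upper), f) return
    (frequency density, relative-frequency density)."""
    total = sum(f for _, f in series)
    out = []
    for (lo, hi), f in series:
        width = hi - lo
        out.append((f / width, f / (total * width)))
    return out

# Hypothetical unequal-interval series: ((lower, upper), frequency)
series = [((0, 20), 10), ((20, 30), 30), ((30, 40), 40), ((40, 80), 20)]
for ((lo, hi), f), (fd, rfd) in zip(series, densities(series)):
    print(f"{lo}-{hi}: f={f}, frequency density={fd}, relative-frequency density={rfd}")
```

The raw frequencies suggest the 40-80 group is twice as common as the 0-20 group, but both have a frequency density of 0.5, because the 40-80 group is twice as wide.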

Characteristics of frequency distributions

Socio-economic populations differ in nature, and so do the characteristics of their frequency distributions. The frequency distributions of socio-economic phenomena can be summarized into four types: the bell-shaped distribution, the U-shaped distribution, the J-shaped distribution, and the Lorenz distribution.
Bell-shaped distribution
The bell-shaped distribution, of which the normal distribution is the typical case, is commonly described as "high in the middle, low at both ends": values near the middle occur more frequently than values near either end, so the curve resembles an ancient bell (see Figure 1).
Figure 1 Bell-shaped distribution
In socio-economic phenomena, bell-shaped distributions are mostly symmetric. In a symmetric distribution, values in the middle occur most frequently and the central value of the variable is the axis of symmetry. The frequency of values on either side decreases as their distance from the central value increases, and values are distributed symmetrically on both sides. In statistics this distribution is called the normal distribution. In socio-economic phenomena, the distributions of many variables are close to normal, for example employees' annual income, crop yields, part dimensions, students' exam scores, and the distribution of social wealth. The normal distribution is of great importance in socio-economic statistics: on the one hand, most distributions of socio-economic phenomena are approximately normal; on the other hand, normal distribution theory is the basis of sampling inference.
U-shaped distribution
The U-shaped distribution is the opposite of the bell-shaped distribution: values near the middle occur less frequently and values near both ends more frequently, giving a U-shape that is "high at both ends, low in the middle". The distribution of population deaths by age is an example. Deaths are more numerous among young children and the elderly and fewer among the middle-aged, so the number of deaths by age group forms a U-shaped distribution, as shown in Figure 2.
Figure 2 U-shaped distribution
J-shaped distribution
In socio-economic phenomena, the distribution curves of some statistical populations are J-shaped, that is, the frequency increases as the variable value increases. Examples include crop output distributed by land area, population by retail sales, workers by gross output value, and inventory by inventory cost, as shown in Figure 3. There are also inverse J-shaped distributions, in which the frequency decreases as the variable value increases, for example the number of enterprises distributed by amount of investment and the population distributed by age, as shown in Figure 4.
Figure 3 J-shaped distribution
Figure 4 Inverse J-shaped distribution
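As a rough illustration of the bell-shaped, U-shaped, J-shaped, and inverse J-shaped patterns, the Python sketch below labels a series of group frequencies (ordered by variable value) by where its largest and smallest frequencies sit. This is a crude heuristic of our own, not a standard statistical test, and it does not cover the Lorenz distribution, which is a cumulative curve rather than a frequency series.

```python
def rough_shape(freqs):
    """Heuristic label for group frequencies ordered by variable value.

    Not a formal test: it checks for strict monotonicity, then looks at
    whether the largest or smallest frequency falls in an interior group.
    """
    n = len(freqs)
    if n < 3:
        return "unclassified"
    if all(a < b for a, b in zip(freqs, freqs[1:])):
        return "J-shaped"
    if all(a > b for a, b in zip(freqs, freqs[1:])):
        return "inverse J-shaped"
    peak = freqs.index(max(freqs))
    trough = freqs.index(min(freqs))
    if 0 < peak < n - 1:
        return "bell-shaped"
    if 0 < trough < n - 1:
        return "U-shaped"
    return "unclassified"

print(rough_shape([1, 2, 4, 2, 1]))  # bell-shaped
print(rough_shape([9, 3, 1, 2, 8]))  # U-shaped
print(rough_shape([1, 2, 3, 5, 8]))  # J-shaped
print(rough_shape([8, 5, 3, 2, 1]))  # inverse J-shaped
```

Real data are noisier than these examples, so a label from such a rule is only a first impression to be confirmed by plotting the series.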
Lorenz distribution
The Lorenz curve was proposed by the American statistician M. Lorenz to study how equally social income is distributed.
In Figure 5, the horizontal axis OH represents the cumulative percentage of the population, the vertical axis OM represents the cumulative percentage of income, and the arc OL is the Lorenz curve. The degree of curvature of the Lorenz curve is significant because it reflects the inequality of income distribution: the more the curve bows away from the diagonal, the more unequal the income distribution, and vice versa.
The area A between the Lorenz curve and the diagonal from O to L (the line of absolute equality) is called the "inequality area", and the area of the right triangle OHL, A + B, is called the "complete inequality area". The ratio of the inequality area to the complete inequality area is the Gini coefficient, also known as the concentration coefficient: $G = \frac{A}{A + B}$.
Figure 5 Lorenz distribution
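The Python sketch below computes the Lorenz curve points and the Gini coefficient G = A / (A + B) from a list of hypothetical individual incomes. Both axes are measured as shares from 0 to 1, so the triangle OHL has area A + B = 1/2. The area B under the curve is computed with the trapezoid rule, a standard numerical choice assumed here rather than taken from the source. Function names are illustrative, and the total income must be positive.

```python
def lorenz_points(incomes):
    """Points (cumulative population share, cumulative income share),
    with individuals sorted from lowest to highest income."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini(incomes):
    """G = A / (A + B) with A + B = 1/2 on unit axes;
    B (area under the Lorenz curve) comes from the trapezoid rule."""
    pts = lorenz_points(incomes)
    b = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    a = 0.5 - b
    return a / 0.5

print(gini([10, 10, 10, 10]))  # 0.0  (equal incomes)
print(gini([0, 0, 0, 40]))     # 0.75 (one person has all income)
```

With individual data on n people the largest possible value is (n - 1) / n, so the Gini coefficient approaches 1 for absolute inequality only as the population grows.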
A Gini coefficient of 1 indicates absolute inequality in income distribution; a Gini coefficient of 0 indicates absolute equality. The Gini coefficient is one of the standard measures of the gap between rich and poor in a country or region. According to the standards of the relevant United Nations organizations, a Gini coefficient below 0.2 indicates that income is highly equal; 0.2-0.3, relatively equal; 0.3-0.4, relatively reasonable; 0.4-0.5, a large income gap; and 0.5 or above, a very large income gap. A value of 0.4 is generally regarded as the "warning line" for the income gap. The Gini coefficients of developed countries lie between 0.26 and 0.38, and the Gini coefficient of income for China's residents as a whole was 0.473 in 2013.
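The verbal bands above can be expressed as a small lookup in Python. The source gives the ranges without saying which band a boundary value belongs to, so this sketch assumes each band includes its lower bound and excludes its upper bound; the function name is illustrative.

```python
def gini_band(g):
    """Map a Gini coefficient to the verbal bands listed above.

    Assumption: each band includes its lower bound and excludes its
    upper bound, so 0.3 falls in "relatively reasonable".
    """
    if not 0 <= g <= 1:
        raise ValueError("Gini coefficient must lie in [0, 1]")
    bands = [(0.2, "highly equal"), (0.3, "relatively equal"),
             (0.4, "relatively reasonable"), (0.5, "large income gap")]
    for upper, label in bands:
        if g < upper:
            return label
    return "very large income gap"

print(gini_band(0.473))  # large income gap (China, 2013, as cited above)
```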
The Lorenz curve can be extended to other socio-economic phenomena to study how evenly, or how concentratedly, a variable is distributed across the units of a population. For this reason it is also called the concentration curve. Examples include analyzing the concentration of product market share among enterprises and the concentration of fixed-asset investment across regions [2].

Preparing a distribution series

(I) Sort the raw data by value
Only after the raw data have been sorted by value can the central tendency and other features of the variable's distribution be seen. Sorting also prepares for determining the range, the group width, and the number of groups.
(II) Determine the range
Before determining the range (the difference between the largest and smallest values), check whether there are extreme values at either end of the data. If there are only a few extreme values, consider placing them in open-ended groups; they can then be excluded when calculating the range.
(III) Determine the group width and number of groups
Group width = range / number of groups. For a fixed range, the larger the group width, the fewer the groups; the smaller the group width, the more the groups. In practice, the group width should generally be an integer, preferably a multiple of 5 or 10.
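Steps (II) and (III) can be sketched in Python as follows. Rounding the raw width up to a multiple of `step` is our own choice, made so that the chosen number of groups still covers the whole range; the text only says the width should preferably be a multiple of 5 or 10. The function name and data are hypothetical.

```python
import math

def group_width(values, n_groups, step=5):
    """Range = max - min; width = range / number of groups,
    rounded up to a multiple of `step` (e.g. 5 or 10)."""
    full_range = max(values) - min(values)
    raw = full_range / n_groups
    return full_range, step * math.ceil(raw / step)

scores = sorted([62, 58, 71, 93, 67, 73, 86, 75, 84, 78])  # step (I): sort the hypothetical data
print(group_width(scores, 5))  # (35, 10)
print(group_width(scores, 7))  # (35, 5)
```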
(IV) Determine the group limits
The group limits are determined by the nature of the variable. If the values are fairly concentrated and there are no extremely large or extremely small values, closed groups are used. If there are extremely large or extremely small values, open-ended groups are used and the extreme values are placed in them.
(V) Compile the variable series
After the four steps above, each unit of the population can be assigned to a group according to its value, and the frequency and relative frequency of each group can be calculated [2].
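Putting the steps together, the Python sketch below compiles a variable series with an open lower group, equal-width closed groups, and an open upper group for the extreme values. The data, the function name, and the "lower limit included, upper limit excluded" convention are assumptions made for illustration.

```python
def compile_series(values, first_upper, width, n_closed):
    """Return (label, frequency, relative frequency) for an open lower group
    ("below first_upper"), n_closed closed groups of equal width, and an
    open upper group. Lower limits are included and upper limits excluded."""
    limits = [first_upper + k * width for k in range(n_closed + 1)]
    labels = ([f"below {limits[0]}"]
              + [f"{lo}-{hi}" for lo, hi in zip(limits, limits[1:])]
              + [f"{limits[-1]} and above"])
    counts = [0] * len(labels)
    for v in sorted(values):            # step (I): work through sorted data
        if v < limits[0]:
            counts[0] += 1              # extreme low value -> open lower group
        elif v >= limits[-1]:
            counts[-1] += 1             # extreme high value -> open upper group
        else:
            counts[1 + int((v - limits[0]) // width)] += 1
    total = len(values)
    return [(lab, f, f / total) for lab, f in zip(labels, counts)]

values = [5, 61, 64, 68, 72, 75, 79, 83, 88, 250]  # hypothetical, with two extremes
for label, f, rel in compile_series(values, 60, 10, 3):
    print(f"{label:<14}{f:>3}{rel:>6.2f}")
```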