Normal distribution

Mathematical terminology
The normal distribution, also known as the Gaussian distribution, was first obtained by Abraham de Moivre as the asymptotic formula for the binomial distribution. C. F. Gauss derived it from another angle while studying measurement error, and P. S. Laplace and Gauss studied its properties. It is a very important probability distribution in mathematics, physics, engineering and other fields, and it has a major influence on many aspects of statistics.
The normal curve is bell-shaped: low at both ends, high in the middle, and symmetric from left to right. Because of this shape it is often called the bell curve.
If a random variable X follows a normal distribution with mathematical expectation μ and variance σ², this is written X ~ N(μ, σ²). The expected value μ determines the location of its probability density function, and the standard deviation σ determines the spread of the distribution. When μ = 0 and σ = 1, the normal distribution is the standard normal distribution.
Chinese name: Normal distribution
Foreign name: Normal Distribution
Applicable fields: probability theory
Discipline: mathematics
Alias: Gaussian distribution
Discoverer: Abraham de Moivre

Historical development

The concept of the normal distribution was first proposed in 1733 by the French mathematician Abraham de Moivre. The German mathematician Gauss later took the lead in applying it to astronomical research, which is why the normal distribution is also called the Gaussian distribution. Gauss's work had a great influence on later generations: it is the reason the distribution carries his name, and it is also why later generations credit him with the invention of the method of least squares. [1] The German 10 mark banknote bearing Gauss's portrait is also printed with the density curve of the normal distribution, which conveys the idea that, of all Gauss's scientific contributions, this is the one that has influenced human civilization most. When Gauss first made his discovery, people could perhaps judge its merit only by the simplification it brought to the theory, and its full impact could not yet be seen. That became apparent only in the 20th century, after the small-sample theory of the normal distribution had been fully developed. Laplace soon learned of Gauss's work and immediately connected it with the central limit theorem he had found. For this reason he added a short supplement to an article that was about to appear (published in 1810), pointing out that if an error can be regarded as the superposition of many quantities, then by his central limit theorem the error should follow a Gaussian distribution. This is the first mention in history of the so-called "elementary error theory": an error is the superposition of a large number of elementary errors arising from various causes. Later, in 1837, G. Hagen formally proposed this theory in a paper.
In fact, the form Hagen proposed has considerable limitations: he assumed that the error is the sum of a large number of independent, identically distributed "elementary errors", each of which takes two values with probability 1/2 each. From this, by de Moivre's central limit theorem, it follows immediately that the error is (approximately) normally distributed. The significance of the point made by Laplace is that it gives the normal theory of errors a more natural, reasonable and convincing explanation. Gauss's argument has a flavor of circular reasoning: because the arithmetic mean is excellent, the errors are deduced to be normally distributed; conversely, from the latter conclusion the excellence of the arithmetic mean and of the least squares estimate is deduced, so one of the two (the excellence of the arithmetic mean or the normality of the errors) must be accepted as a starting point. However, there is after all no self-evident reason for the arithmetic mean to be excellent, and taking it as a preset starting point leaves the theory with a shortcoming. Laplace's theory is significant because it supplies the missing link and makes the whole a harmonious unit.

Theorem

Because the graph of a general normal population is not necessarily symmetric about the y-axis, for any normal population the probability that its value is less than x is $P(X < x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$, where $\Phi$ is the distribution function of the standard normal distribution. Being able to use this relation is enough to compute the probability that a normal population falls in any specific interval.
For convenience of description and application, a normal variable is often transformed so that the general normal distribution becomes the standard normal distribution. [2]
If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$ follows the standard normal distribution, and probabilities for the original normal distribution can be computed directly by looking up the standard normal distribution table. This transformation is therefore called the standardizing transformation. (Standard normal distribution table: the table lists the proportion of the area under the standard normal curve from −∞ to the current value.)
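The standardizing transformation can be illustrated with a short computation. The following is a minimal sketch rather than a reference implementation: the function names are illustrative, and the table lookup is replaced by the identity Φ(z) = ½[1 + erf(z/√2)], which is an assumption taken from standard probability theory.

```python
import math


def standardize(x, mu, sigma):
    """Return z = (x - mu) / sigma for a value x drawn from N(mu, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Phi(z): area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), computed by standardizing first."""
    return standard_normal_cdf(standardize(x, mu, sigma))


def normal_interval_probability(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
```

For instance, `normal_interval_probability(-1.96, 1.96, 0, 1)` is close to 0.95, which agrees with the familiar table entry for u = 1.96.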

Definition


One dimensional normal distribution

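If a random variable $X$ follows a probability distribution with location parameter $\mu$ and scale parameter $\sigma$, and its probability density function is $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, then this random variable is called a normal random variable, and the distribution of a normal random variable is called the normal distribution, written $X \sim N(\mu, \sigma^2)$ and read as "$X$ follows $N(\mu, \sigma^2)$" or "$X$ follows a normal distribution". [4]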
When a multidimensional random vector obeys a similar probability law, the random vector is said to follow a multidimensional normal distribution. The multivariate normal distribution has good properties: its marginal distributions are still normal, any linear transformation of the random vector is still multivariate normal, and in particular any linear combination of its components follows a univariate normal distribution.
The normal distribution discussed in this entry is the one-dimensional normal distribution. For the two-dimensional case, see "Two-dimensional normal distribution".

Standard normal distribution

When $\mu = 0$ and $\sigma = 1$, the normal distribution becomes the standard normal distribution, with density $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
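The density can be written out in code. This is a minimal sketch that assumes the standard textbook form f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)); the function names and the trapezoidal check of the total area are illustrative choices.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x; the defaults give the standard normal."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def total_area(mu=0.0, sigma=1.0, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the area under the density on mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps
    inner = sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return h * (0.5 * normal_pdf(a, mu, sigma) + inner + 0.5 * normal_pdf(b, mu, sigma))
```

The density is largest at x = μ, takes equal values at points placed symmetrically about μ, and the estimated total area is 1 up to numerical error for any choice of μ and σ.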

Properties

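Some properties of the normal distribution:
(1) If $X \sim N(\mu, \sigma^2)$ and a and b are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$ (see expected value and variance).
(2) If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are statistically independent normal random variables, then:
Their sum is also normally distributed: $U = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
Their difference is also normally distributed: $V = X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
U and V are independent of each other (this requires the variances of X and Y to be equal).
(3) If $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$ are independent normal random variables, then:
Their product XY follows the distribution with probability density function $p(z) = \frac{1}{\pi \sigma_X \sigma_Y} K_0\left(\frac{|z|}{\sigma_X \sigma_Y}\right)$, where $K_0$ is a modified Bessel function of the second kind.
Their ratio follows a Cauchy distribution: $X/Y \sim \mathrm{Cauchy}(0, \sigma_X/\sigma_Y)$.
(4) If $X_1, \ldots, X_n$ are independent standard normal random variables, then $X_1^2 + \cdots + X_n^2$ follows the chi-square distribution with n degrees of freedom.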
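Several of these properties can be checked by simulation. The sketch below is only an illustration under stated assumptions: it uses the standard results aX + b ~ N(aμ + b, a²σ²), X ± Y ~ N(μ_X ± μ_Y, σ_X² + σ_Y²) for independent X and Y, and the fact that a chi-square variable with n degrees of freedom has mean n and variance 2n. All parameter values are hypothetical, and the helper names are invented for this example.

```python
import random


def mean_and_variance(values):
    """Sample mean and variance (divisor n) of a list of numbers."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / n
    return m, v


def simulate(n=100_000, seed=0):
    """Draw independent normal samples and summarize derived variables."""
    rng = random.Random(seed)
    mu_x, sigma_x = 1.0, 2.0      # hypothetical parameters of X
    mu_y, sigma_y = -3.0, 1.5     # hypothetical parameters of Y
    a, b = 0.5, 4.0               # hypothetical constants for aX + b
    xs = [rng.gauss(mu_x, sigma_x) for _ in range(n)]
    ys = [rng.gauss(mu_y, sigma_y) for _ in range(n)]
    # Sum of squares of 3 independent standard normals: chi-square, 3 degrees of freedom.
    chi2 = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(n)]
    return {
        "aX+b": mean_and_variance([a * x + b for x in xs]),          # expect (4.5, 1.0)
        "X+Y": mean_and_variance([x + y for x, y in zip(xs, ys)]),   # expect (-2.0, 6.25)
        "X-Y": mean_and_variance([x - y for x, y in zip(xs, ys)]),   # expect (4.0, 6.25)
        "chi2_3": mean_and_variance(chi2),                           # expect (3.0, 6.0)
    }


if __name__ == "__main__":
    for name, (m, v) in simulate().items():
        print(f"{name}: mean={m:.3f}, variance={v:.3f}")
```

A simulation only shows agreement of means and variances up to sampling error; it does not by itself prove that the derived variables are normal.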

Distribution curve


Graphic feature

Concentration: The peak of the normal curve is in the middle, at the location of the mean.
Symmetry: The normal curve is centered on the mean and is symmetric from left to right. The two ends of the curve never intersect the horizontal axis.
Uniform variability: Starting from the mean, the normal curve falls away gradually and evenly to the left and to the right.
The area between the curve and the horizontal axis is always equal to 1. Equivalently, the integral of the probability density function from negative infinity to positive infinity is 1, that is, the frequencies sum to 100%.
The curve is symmetric about μ, attains its maximum at μ, tends to 0 at positive and negative infinity, and has inflection points at μ ± σ. It is high in the middle and low on both sides. Because the probability density curve of the normal distribution is bell-shaped, it is often called the bell curve.

Parameter meaning

The normal distribution has two parameters: the expectation (mean) μ and the standard deviation σ; σ² is the variance.
The normal distribution is the distribution of a continuous random variable with two parameters, μ and σ². The first parameter μ is the mean of the normally distributed random variable and the second parameter σ² is its variance, so the normal distribution is written N(μ, σ²).
μ is the location parameter of the normal distribution and describes the location of its central tendency. Values close to μ are more probable, and values farther from μ are less probable. The normal distribution has X = μ as its axis of symmetry and is completely symmetric from left to right. The expectation (mean), median and mode of the normal distribution coincide and all equal μ.
σ describes the dispersion of normally distributed data. The larger σ is, the more dispersed the data; the smaller σ is, the more concentrated the data. σ is also called the shape parameter of the normal distribution: the larger σ is, the flatter the curve, and the smaller σ is, the taller and narrower the curve.

Area distribution

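The indefinite integral of the normal density is not an elementary function; it is expressed through the error function $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$.
The derivative of the error function is: $\frac{d}{dx}\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}$.
The relationship between the error function and the integral of the normal density (the standard normal distribution function $\Phi$) is: $\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$.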
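The link between the error function and areas under the normal curve can be checked numerically. The following is a sketch under stated assumptions: it takes the standard identity Φ(z) = ½[1 + erf(z/√2)], so that the area within μ ± kσ equals erf(k/√2), and it uses Python's `math.erf` and `math.erfc`. The function names are illustrative, and the bisection routine is just one simple way to invert the area function.

```python
import math


def central_area(k):
    """Area under the normal curve within mu +/- k*sigma, i.e. Phi(k) - Phi(-k)."""
    return math.erf(k / math.sqrt(2.0))


def half_width_for(p, lo=0.0, hi=10.0, tol=1e-12):
    """Find k (in units of sigma) with central_area(k) = p by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if central_area(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def two_sided_tail_ppm(k):
    """Probability of falling outside mu +/- k*sigma, in parts per million."""
    return math.erfc(k / math.sqrt(2.0)) * 1e6


def one_sided_tail_ppm(k):
    """Probability of exceeding mu + k*sigma, in parts per million."""
    return 0.5 * math.erfc(k / math.sqrt(2.0)) * 1e6


if __name__ == "__main__":
    for k in (1, 2, 3, 4, 5, 6):
        print(f"k={k}: area={central_area(k):.8f}, outside={two_sided_tail_ppm(k):.4g} ppm")
    print(f"half-width for 50%: {half_width_for(0.5):.8f} sigma")
    print(f"one-sided tail beyond 4.5 sigma: {one_sided_tail_ppm(4.5):.2f} ppm")
```

Using `erfc` for the tails avoids the loss of precision that comes from subtracting a number very close to 1 from 1.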
1. In practice, the area under the normal curve above a given interval of the horizontal axis (the difference between the error-function values at the upper and lower limits) gives the percentage of cases that fall in that interval, that is, the probability that the variable takes a value in the interval.
2. The central interval that contains 50% of the probability under the normal curve has a half-width of 0.67448975σ. (This value cannot be obtained by elementary methods; it is an approximation found by iteration.)
The area over the interval (μ − σ, μ + σ) is 68.268949%.
The area over the interval (μ − 2σ, μ + 2σ) is 95.449974%.
The area over the interval (μ − 3σ, μ + 3σ) is 99.730020%.
"Small-probability events" and the basic idea of hypothesis testing: A "small-probability event" usually means an event with probability below 5%, which is regarded as practically impossible in a single trial. The probability that X falls outside (μ − 3σ, μ + 3σ) is less than three in a thousand, so in practical problems the event is usually assumed not to occur, and (μ − 3σ, μ + 3σ) is treated as the interval of values that X can actually take. This is the "3σ" principle of the normal distribution. For mass-produced assembly-line products with high output and many tests, a half-width of 4σ (99.9936%) is needed to reach a "foolproof" level (99.99%), and higher levels require a half-width of 5σ to 6σ, with an error rate of about 0.6 ppm to 0.002 ppm. This is the origin of the "Six Sigma (6σ)" principle in industrial production. (The Six Sigma requirement of 3.4 ppm quoted in management books corresponds to a half-width of about 4.5σ, which allows for a mean shift of 1.5σ caused by systematic error.)

Research process

Concept and characteristics:
1. Concept of normal distribution
In a histogram drawn from the frequency-table data of a typical distribution, as shown in Figure 1, the peak lies in the middle and the left and right sides are roughly symmetric. We can imagine that if the number of observations increases gradually and the class intervals are subdivided continually, the line along the top of the histogram will gradually approach a smooth curve (Figure 3) with its peak at the center (where the mean lies), falling away symmetrically on both sides and never intersecting the horizontal axis. This curve is called the frequency curve, and it resembles the normal distribution in mathematics. Since the frequencies sum to 100% or 1, the area between the curve and the horizontal axis is 100% or 1.
[Figure 1: Normal distribution study]
For convenience of application, the normally distributed variable X is often transformed as u = (X − μ)/σ.
This transformation converts the original normal distribution into the standard normal distribution, also called the u distribution. u is called the standard normal variable or standard normal deviate.
[Figure 2: Normal distribution study]
[Figure 3: Normal distribution study]
In practical work, one often needs to know what percentage of the total area under the normal curve lies above a given interval of the horizontal axis, in order to estimate the percentage of cases that fall in that interval (the frequency distribution) or the probability that an observed value falls in it. The area above a given interval under the normal curve can be obtained from Table 1. For data with a normal or near-normal distribution whose mean and standard deviation are known, the frequency distribution can be estimated approximately.
Table 1
Notes on Table 1: ① the area under the curve given in the table is the cumulative area to the left, from −∞ to u; ② when μ, σ and X are known, first compute $u = (X - \mu)/\sigma$ and then look up the table; when μ and σ are unknown and the sample size n is large enough, the sample mean $\bar{X}$ and standard deviation S can be substituted for μ and σ, giving $u = (X - \bar{X})/S$, and the table is then looked up with this u; ③ the areas over intervals symmetric about 0 are equal, for example the area over (−∞, −1.96) equals the area over (1.96, ∞); ④ the total area between the curve and the horizontal axis is 100% or 1.
Figure 2 Area distribution of normal curve and standard normal curve
Applications of the normal distribution: Some medical phenomena, such as the height, red blood cell count, hemoglobin level and cholesterol of a homogeneous population, as well as random errors in experiments, show a normal or near-normal distribution. Some data are skewed but become normal or approximately normal after a data transformation, and can then be handled according to the normal distribution.
[Figure: Normal distribution area diagram 1]
[Figure: Normal distribution area diagram 2]
Difference and relation between general normal distribution and standard normal distribution
The normal distribution is a probability distribution of continuous random variables. A large number of phenomena in nature, human society, psychology and education are approximately normally distributed, such as levels of ability and the quality of student performance. Its shape varies with the size and units of the mean and standard deviation of the random variable. The standard normal distribution is one particular normal distribution whose mean and standard deviation are both fixed: the mean is 0 and the standard deviation is 1.

Curve application


Overview

1. Estimating a frequency distribution: As long as the mean and standard deviation of a normally distributed variable are known, the proportion of values within any range can be estimated from the formula. [3]
2. Establishing reference value ranges
(1) The normal distribution method applies to indicators that follow a normal (or near-normal) distribution, and to indicators that follow a normal distribution after transformation.
(2) The percentile method is often used for indicators with a skewed distribution. The one-sided and two-sided limits of both methods in Table 3-1 should be mastered.
3. Quality control: To control measurement (or experimental) error, $\bar{X} \pm 2S$ is often used as the upper and lower warning limits and $\bar{X} \pm 3S$ as the upper and lower control limits. The basis for this is that under normal conditions the measurement (or experimental) error follows a normal distribution.
4. The normal distribution is the theoretical basis of many statistical methods. Statistical methods such as significance tests, analysis of variance, and correlation and regression analysis require that the indicators analyzed follow a normal distribution. Many other statistical methods do not require this, but their statistics are approximately normally distributed when the sample size is large, so large-sample statistical inference is also based on normal distribution theory.
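The quality-control use can be sketched in a few lines. This is an illustration only, assuming the common convention that warning limits sit at the mean ± 2S and control limits at the mean ± 3S; the function names and the sample readings are hypothetical.

```python
import statistics


def control_limits(reference_values):
    """Warning (mean +/- 2S) and control (mean +/- 3S) limits from reference measurements."""
    mean = statistics.fmean(reference_values)
    s = statistics.stdev(reference_values)   # sample standard deviation S
    return {
        "warning": (mean - 2 * s, mean + 2 * s),
        "control": (mean - 3 * s, mean + 3 * s),
    }


def classify(value, limits):
    """Label a new measurement relative to the warning and control limits."""
    low_control, high_control = limits["control"]
    low_warning, high_warning = limits["warning"]
    if value < low_control or value > high_control:
        return "out of control"
    if value < low_warning or value > high_warning:
        return "warning"
    return "in control"


if __name__ == "__main__":
    readings = [9.8, 10.1, 10.0, 9.9, 10.2]   # hypothetical reference readings
    limits = control_limits(readings)
    for value in (10.0, 10.4, 10.6):
        print(value, classify(value, limits))
```

If the errors are normally distributed, roughly 95% of readings fall inside the warning limits and about 99.7% inside the control limits, which is why a reading beyond the control limits is treated as a signal rather than as chance.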

Frequency distribution

Example 1.10: In 1993 the heights (cm) of 100 18-year-old male college students in a certain area were sampled, giving a mean of $\bar{X} = 172.70$ cm and a standard deviation of S = 4.01 cm. ① Estimate the percentage of 18-year-old male college students in the area whose height is below 168 cm. ② Find the actual percentage of the 100 students whose heights fall within $\bar{X} \pm 1S$, $\bar{X} \pm 1.96S$ and $\bar{X} \pm 2.58S$, and compare these with the theoretical percentages.
In this example μ and σ are unknown but the sample size n is fairly large, so the sample mean $\bar{X}$ and standard deviation S are substituted for μ and σ to obtain u = (168 − 172.70)/4.01 = −1.17. In the table of areas under the standard normal curve, find −1.1 in the left column and 0.07 in the top row; the entry at their intersection is 0.1210 = 12.10%. Thus about 12.10% of the 18-year-old male college students in this area are shorter than 168 cm. See Table 3 for the other results.
Table 3 Actual and theoretical distribution of height of 100 male college students aged 18
Range                  Height range (cm)   Actual number   Actual (%)   Theoretical (%)
$\bar{X} \pm 1S$       168.69~176.71       67              67           68.27
$\bar{X} \pm 1.96S$    164.84~180.56       95              95           95
$\bar{X} \pm 2.58S$    162.35~183.05       99              99           99
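The arithmetic of Example 1.10 can be reproduced in a few lines. This is a sketch that assumes the identity Φ(z) = ½[1 + erf(z/√2)] in place of the printed table; the mean 172.70 cm and standard deviation 4.01 cm come from the example, while the function and variable names are illustrative.

```python
import math


def phi(z):
    """Standard normal cumulative area from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def central_share(k):
    """Theoretical share of values within mean +/- k standard deviations."""
    return phi(k) - phi(-k)


MEAN, SD = 172.70, 4.01   # sample mean and standard deviation from the example

# Part 1: share of students shorter than 168 cm, with u rounded as for a table lookup.
u = round((168 - MEAN) / SD, 2)
share_below_168 = phi(u)

# Part 2: height ranges and theoretical percentages for the three multiples of S.
rows = []
for k in (1.0, 1.96, 2.58):
    low = round(MEAN - k * SD, 2)
    high = round(MEAN + k * SD, 2)
    rows.append((k, low, high, round(100 * central_share(k), 2)))

if __name__ == "__main__":
    print(f"u = {u}, share below 168 cm = {share_below_168:.4f}")
    for k, low, high, pct in rows:
        print(f"mean +/- {k}S: {low}~{high} cm, theoretical {pct}%")
```

Rounding u to two decimals mirrors the table lookup; using the unrounded u would change the estimated share slightly.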

Research on comprehensive quality

Educational statistics holds that students' intelligence, including learning ability and practical ability, is normally distributed, and therefore that exam scores should basically follow a normal distribution. Examination analysis requires drawing a histogram of the distribution of students' scores and judging how well it conforms to the normal distribution by the criterion "high in the middle, low at both ends". The evaluation criteria are: good if the histogram of scores is basically a normal curve; medium if it is slightly positively or negatively skewed; poor if it is severely skewed or irregular.
From the standpoint of probability and statistics, the statement "exam scores should basically follow a normal distribution" is correct. However, one must consider the difference between people and things, and the fact that education can interfere with "randomness"; evaluating exam results by the shape of a curve or histogram is biased. Many education experts (such as Gu Lingyuan in Shanghai and Bloom in the United States) have shown through practice that education can accomplish a great deal: most students can pass the exam, most students can earn high scores, and the exam score curve is then skewed rather than normal. Yet under the long-standing influence of the "high in the middle, low at both ends" standard, teachers' actions have been constrained and most students' confidence that they can learn well has been suppressed. This is a serious misunderstanding. Ordinarily a normal curve has an axis of symmetry. The score (or score band) with the largest number of candidates corresponds to the highest point of the curve, its vertex. The line segment joining the vertex to the corresponding point on the horizontal axis is the axis of symmetry of the normal curve, and the largest number of candidates is the peak value. In practice, however, score curves and histograms are rarely symmetric, so it is more appropriate to call this line the peak line.

Medical reference value

Some medical phenomena, such as height, red blood cell count, hemoglobin level, and random error, show a normal or near-normal distribution. Although some indicators (variables) follow a skewed distribution, the new variable obtained after a data transformation may follow a normal or approximately normal distribution and can then be handled according to the normal distribution. An indicator that follows a normal distribution after logarithmic transformation is said to follow a lognormal distribution.
The medical reference value range, also called the medical normal value range, is the range within which the anatomical, physiological, biochemical and other indicators of so-called "normal people" fluctuate. To establish a normal range, first select a group of "normal people" with a sufficient sample size. "Normal people" does not mean "healthy people" but a homogeneous population from which diseases and related factors affecting the indicator under study have been excluded. Second, choose an appropriate percentage range according to the purpose of the study and the intended use, such as 80%, 90%, 95% or 99%, most commonly 95%. Third, decide whether one-sided or two-sided limits are needed according to how the indicator is used. For the white blood cell count, both too high and too low a value are abnormal, so two-sided limits must be determined. For transaminase in liver function tests, only too high a value is abnormal, so a one-sided upper limit is determined; for an indicator where only too low a value is abnormal, a one-sided lower limit is determined. Finally, choose an appropriate calculation method according to the distribution of the data. Common methods include:
(1) Normal distribution method: applicable to data with a normal or near-normal distribution.
Two-sided limits: $\bar{X} \pm uS$; one-sided upper limit: $\bar{X} + uS$, or one-sided lower limit: $\bar{X} - uS$.
(2) Lognormal distribution method: applicable to lognormally distributed data.
Two-sided limits: $\lg^{-1}(\bar{X}_{\lg x} \pm u S_{\lg x})$; one-sided upper limit: $\lg^{-1}(\bar{X}_{\lg x} + u S_{\lg x})$, or one-sided lower limit: $\lg^{-1}(\bar{X}_{\lg x} - u S_{\lg x})$.
Commonly used u values can be found in Table 4 as required.
(3) Percentile method: commonly used for data with a skewed distribution and for data without exact values at one or both ends.
Two-sided limits: $P_{2.5}$ and $P_{97.5}$; one-sided upper limit: $P_{95}$, or one-sided lower limit: $P_{5}$.
Table 4 Common u values
Reference value range (%)   One-sided   Two-sided
80                          0.842       1.282
90                          1.282       1.645
95                          1.645       1.960
99                          2.326       2.576
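The u values in Table 4 are quantiles of the standard normal distribution, so they can be regenerated rather than looked up. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library (Python 3.8 or later); the function names are illustrative, and the worked range reuses the mean 172.70 cm and standard deviation 4.01 cm from Example 1.10 purely as sample inputs.

```python
from statistics import NormalDist


def u_value(coverage, two_sided=True):
    """Standard normal quantile u for a reference range covering `coverage` of values."""
    tail = (1.0 - coverage) / 2.0 if two_sided else (1.0 - coverage)
    return NormalDist().inv_cdf(1.0 - tail)


def reference_range(mean, sd, coverage=0.95, side="both"):
    """Reference limits by the normal distribution method.

    side="both" returns (lower, upper); "upper" returns (None, upper);
    "lower" returns (lower, None).
    """
    if side == "both":
        u = u_value(coverage, two_sided=True)
        return mean - u * sd, mean + u * sd
    u = u_value(coverage, two_sided=False)
    if side == "upper":
        return None, mean + u * sd
    if side == "lower":
        return mean - u * sd, None
    raise ValueError("side must be 'both', 'upper' or 'lower'")


if __name__ == "__main__":
    for coverage in (0.80, 0.90, 0.95, 0.99):
        one = u_value(coverage, two_sided=False)
        two = u_value(coverage, two_sided=True)
        print(f"{coverage:.0%}: one-sided {one:.3f}, two-sided {two:.3f}")
    print(reference_range(172.70, 4.01, 0.95, "both"))
```

The lognormal method follows the same pattern: apply `reference_range` to the mean and standard deviation of the log-transformed data, then raise 10 to the resulting limits.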
Theoretical basis of statistics:
For example, the t distribution, the F distribution and the χ² distribution are all derived from the normal distribution, and the u test is also based on it. In addition, the t distribution, the binomial distribution and the Poisson distribution all have the normal distribution as their limit, so under certain conditions they can be handled according to the normal distribution.
The most important distribution in probability theory
The normal distribution has an extremely broad practical background. The probability distributions of many random variables in production and scientific experiments can be described approximately by the normal distribution. Examples include the strength, compressive strength, caliber, length and other indicators of a product; the length, weight and other indicators of organisms of the same kind; the weight of seeds of the same kind; the error in measuring the same object; the deviation of an impact point along a given direction; the annual precipitation of a region; and the velocity components of ideal gas molecules. Generally speaking, if a quantity is the result of many small independent random factors, it can be regarded as normally distributed (see central limit theorem). In theory, the normal distribution has many good properties, and many probability distributions can be approximated by it. Some commonly used probability distributions are derived directly from it, such as the lognormal distribution, the t distribution and the F distribution.
Main connotation
Against the practical background of nature, society and thought, and taking the properties of the normal distribution as the basis and the normal distribution curve and its area distribution diagram as the emblem (this diagram should come to mind whenever the normal distribution and its theory are discussed below), we can abstract and elevate these ideas, grasp their philosophical core, and summarize the main connotations of normal distribution theory (normal philosophy) as follows:
Holism
The normal distribution teaches us to look at things from a holistic perspective. "The holistic view of a system is the essence of the systems concept." The normal distribution curve and its area diagram consist of three regions, the base area, the negative area and the positive area, each with a different share. Only by viewing things as a whole can we see their true appearance and grasp their fundamental characteristics. We must neither miss the forest for the trees nor generalize from a part to the whole. Moreover, the whole is greater than the sum of its parts: having analyzed each part and each level, we should still view things as a whole, because the whole has characteristics that the parts do not. To view the world as a whole is to stand on the base area while also taking in the negative and positive areas. We should see the main aspects as well as the secondary ones, the positive side of things as well as the negative side, and the advanced side as well as the backward side. A one-sided view inevitably sees something skewed or abnormal rather than the thing as it really is.
Key points
The normal distribution curve and its area diagram clearly show where the emphasis lies: the base area accounts for 68.27% and is the main body, which deserves the emphasis, while 95% and 99% show the comprehensiveness of the normal distribution. To understand and transform the world we must grasp the key points, because they are the principal contradictions of things and play the leading, decisive role in their development. Only by grasping the key points can we succeed in one stroke. Things and phenomena are numerous and complicated; if we fail to grasp the principal contradiction among the myriad threads, we sink into endless triviality. Because our time and energy are limited, and for the sake of efficiency, we should concentrate on the key points. In the normal distribution the base area is the main body and the focus; combining this with the 20/80 rule, we can boldly treat the main region as the key point.
Development theory
Connection and development are the basic laws by which things change. Everything has a history of emergence, development and extinction. If we regard the normal distribution as the development process of any system or thing, we can see clearly that this process runs from the negative area to the base area and then to the positive area. Nature, society and human thought all visibly follow this process. Accurately identifying the historical process and stage of a thing or event greatly helps us grasp its characteristics and nature, and is an important basis for analyzing problems, taking countermeasures and solving problems. The nature and characteristics of development differ from stage to stage, and the methods used to analyze and solve problems should be adapted accordingly. This is the meaning of concrete analysis of concrete problems, and it is also the essence of emancipating the mind, seeking truth from facts and keeping pace with the times. The characteristics of normal development also teach us that the development of things is mostly gradual and cumulative, and that gradual development is the normal state of things. For example, heredity is the normal state and variation is the abnormal state.
In short, normal distribution theory is a scientific world view and a scientific methodology, one of the most important and fundamental tools for understanding and transforming the world, and it has important guiding significance for theory and practice. Understanding the world through normal philosophy helps us grasp its nature and laws; transforming the world through normal philosophy helps us respect and use objective laws and change the world more effectively.
Francis Galton (16 February 1822 to 17 January 1911), British explorer, eugenicist and psychologist, is regarded as the father of differential psychology and as a founder of psychometrics.
Galton's contributions to psychology can be summarized as differential psychology, the quantification of psychometrics, and experimental psychology:
The quantification of psychological research began with Galton. He devised many sensory and motor tests and expressed the measured differences in psychological traits as numbers. He believed that all human characteristics, whether physical or mental, could ultimately be described quantitatively, and that this was a necessary condition for a science of human beings. He was therefore the first to apply statistical methods to psychological research data, paying attention to the mean of the data and to the deviation of the data from it. He collected a large amount of data to show that human psychological traits, like height and weight, are distributed in the population according to the normal distribution curve. While discussing the influence of heredity on individual differences, he introduced the concept of the correlation coefficient. For example, he studied the relationship between the "mid-parent" height and the height of adult children and found a positive correlation: taller parents tend to have taller children, and shorter parents tend to have shorter children. He also found that children's heights often differ somewhat from their parents' and show a tendency toward "regression to the mean", that is, they move away from the parents' height back toward the average height.
Intelligence and ability
Richard J. Herrnstein (20 May 1930 to 13 September 1994), an American comparative psychologist, and Charles Murray co-wrote the book The Bell Curve. The book is known for its claims that human intelligence is normally distributed, that intelligence is mainly inherited, and that it differs between racial groups: according to the authors, Jews and East Asians score highest on IQ tests, followed by whites, with blacks and Hispanics scoring lowest. The authors reviewed decades of research in psychometrics and policy science and argued that American society had overlooked the growing influence of IQ. They tried to show that current U.S. social policies favoring low-income groups, made up largely of African Americans and Hispanic Americans, such as vocational training and college education, waste resources. Using test results of military recruits, they argued that the measured intelligence of black youths was lower than that of white and Asian youths, that their intelligence was already fixed, and that training had little effect. They therefore argued that the government should give up such education for these groups and spend the money instead on early childhood education for all races, because young children's intelligence is not yet fixed and their potential for development is large. Because the book dealt with the intelligence of black people, it was attacked from all sides as soon as it was published.