The normal distribution, also known as the Gaussian distribution, was first obtained by Abraham de Moivre as the asymptotic formula of the binomial distribution. C. F. Gauss derived it from another angle while studying measurement errors, and P. S. Laplace and Gauss studied its properties. It is a very important probability distribution in mathematics, physics, engineering and other fields, and it has a major influence on many aspects of statistics.
The normal curve is bell shaped: low at both ends, high in the middle, and symmetric from left to right. Because of this shape it is often called the bell curve.
The concept of the normal distribution was first proposed by the French mathematician Abraham de Moivre in 1733. Later the German mathematician Gauss took the lead in applying it to astronomical research, which is why the normal distribution is also called the Gaussian distribution. Gauss's work had a great impact on later generations: it gave the normal distribution the name "Gaussian distribution", and it is also the reason why later generations credit him with the invention of the least squares method.[1] The German 10-mark banknote bearing Gauss's portrait is also printed with the density curve of the normal distribution. This conveys the idea that, of all Gauss's scientific contributions, this is the one that has most influenced human civilization. When Gauss first made the discovery, people could perhaps judge its merit only by the way it simplified the theory, and its full impact could not yet be seen; that had to wait until the small-sample theory of the normal distribution was fully developed in the 20th century.
Laplace soon learned of Gauss's work and immediately linked it with the central limit theorem he had found. He added a short supplement to a forthcoming article (published in 1810), pointing out that if the error can be regarded as the superposition of many quantities, then by his central limit theorem the error should have a Gaussian distribution. This is the first mention in history of the so-called "elementary error theory": an error is the superposition of a large number of elementary errors arising from various causes. Later, in 1837, G. Hagen formally proposed this theory in a paper.
In fact, the form Hagen proposed has considerable limitations: he assumed that the error is the sum of a large number of independent, identically distributed "elementary errors", each taking only two values, each with probability 1/2. From this, by de Moivre's central limit theorem, it follows immediately that the error is (approximately) normally distributed. The significance of Laplace's observation is that it gives the normal theory of errors a more natural, reasonable and convincing explanation. Gauss's argument has a flavor of circular reasoning: because the arithmetic mean is excellent, the error must be normally distributed; conversely, from the latter conclusion the excellence of the arithmetic mean and of least squares estimation is deduced. One of the two (the excellence of the arithmetic mean, or the normality of errors) must therefore be accepted as the starting point. But there is no independent reason for the excellence of the arithmetic mean, so taking it as a preset starting point of the theory leaves something to be desired. Laplace's theory joins this broken link and makes the whole harmonious, which is of great significance.
Theorem
Because the density curve of a general normal population is not necessarily symmetric about the y-axis, for any normal population N(μ, σ²) the probability that it takes a value less than x is $F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$, where Φ is the distribution function of the standard normal distribution. Knowing this is enough to calculate the probability that a normal population falls in any specific interval.
For convenience of description and application, normal variables are often transformed so that a general normal distribution becomes the standard normal distribution.[2]
If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X-\mu}{\sigma}$ follows the standard normal distribution $N(0, 1)$, and probabilities for the original normal distribution can be computed directly by looking up the standard normal distribution table. This transformation is therefore called the standardization transformation. (Standard normal distribution table: the table lists the proportion of the area under the standard normal curve from −∞ to the current value X.)
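The standardization transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and the table lookup is replaced by the closed form Φ(z) = ½[1 + erf(z/√2)] from the standard library.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Phi(z): area under the standard normal curve from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for X ~ N(mu, sigma^2), computed by standardizing x first."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardization transformation
    return standard_normal_cdf(z)


def interval_probability(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) as the difference of two cumulative areas."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Illustrative parameters only (not taken from the article).
print(interval_probability(85.0, 115.0, mu=100.0, sigma=15.0))
```

Computing an interval probability as a difference of two cumulative areas mirrors how a printed standard normal table is used.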
If a random variable $X$ has the probability density function $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, then it is called a normal random variable, and the distribution it follows is called the normal distribution, written $X \sim N(\mu, \sigma^2)$ and read as "X follows N(μ, σ²)" or "X follows the normal distribution".[4]
When an n-dimensional random vector obeys an analogous probability law, the vector is said to follow the multivariate normal distribution. The multivariate normal distribution has good properties: its marginal distributions are still normal, any linear transformation of the random vector is still multivariate normal, and in particular any linear combination of its components follows a univariate normal distribution.
(4) If $X_1, X_2, \ldots, X_n$ are independent standard normal random variables, then $X_1^2 + X_2^2 + \cdots + X_n^2$ follows the chi-square distribution with n degrees of freedom.
Distribution curve
Graphic feature
Concentration: the peak of the normal curve is in the middle, at the location of the mean.
Symmetry: the normal curve is centered on the mean and symmetric from left to right; the two ends of the curve never intersect the horizontal axis.
Uniform variability: starting from the mean, the normal curve descends gradually and evenly to the left and to the right.
The area between the curve and the horizontal axis is always equal to 1, which corresponds to the integral of the probability density function from negative infinity to positive infinity being 1; that is, the frequencies sum to 100%.
The curve is symmetric about μ, attains its maximum at μ, tends to 0 at positive and negative infinity, and has inflection points at μ ± σ. Its shape is high in the middle and low on both sides; because the probability density curve of the normal distribution is bell shaped, it is often called the bell curve.
Parameter meaning
The normal distribution has two parameters: the expectation (mean) μ and the standard deviation σ, where σ² is the variance.
The normal distribution is the distribution of a continuous random variable with two parameters, μ and σ². The first parameter, μ, is the mean of the normally distributed random variable, and the second, σ², is its variance, so the normal distribution is written N(μ, σ²).
μ is the location parameter of the normal distribution and describes the position of its central tendency. The probability law is that values close to μ are more probable and values farther from μ are less probable. The normal distribution has X = μ as its axis of symmetry and is completely symmetric from left to right. Its expectation, mean, median and mode coincide and are all equal to μ.
σ describes the dispersion of normally distributed data: the larger σ is, the more spread out the data are, and the smaller σ is, the more concentrated they are. σ is also called the shape parameter of the normal distribution: the larger σ is, the flatter the curve, and the smaller σ is, the taller and narrower the curve.
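The effect of σ on the shape of the curve can be checked numerically. The sketch below assumes the usual density of N(μ, σ²), whose peak height at x = μ is 1/(σ√(2π)); the function name `normal_pdf` and the sample values of σ are illustrative choices, not part of the article.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# A larger sigma gives a lower, flatter curve; a smaller sigma gives a
# taller, narrower one. The peak is always located at x = mu.
for sigma in (0.5, 1.0, 2.0):
    peak = normal_pdf(0.0, mu=0.0, sigma=sigma)
    print(f"sigma = {sigma}: peak height = {peak:.4f}")
```

Doubling σ halves the peak height, while the total area under the curve stays equal to 1.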
The relationship between the error function and the integral of the normal density is $\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$, with the following consequences:
1. In practice, the area under the normal curve over a given interval of the horizontal axis (the difference between the values of the error function at the upper and lower limits) reflects the percentage of cases in that interval out of the total number of cases, or the probability that the variable falls in that interval.
2. The half-length of the horizontal-axis interval, centered on μ, that contains 50% of the probability under the normal curve is 0.67448975σ. (This value has no elementary closed form; it is an approximation obtained by iteration.)
The area within the horizontal-axis interval (μ − σ, μ + σ) is 68.268949%.
The area within the horizontal-axis interval (μ − 2σ, μ + 2σ) is 95.449974%.
The area within the horizontal-axis interval (μ − 3σ, μ + 3σ) is 99.730020%.
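The interval areas quoted above can be recomputed with the Python standard library. This is a small sketch under the assumption that the area within (μ − kσ, μ + kσ) equals erf(k/√2); the name `area_within` is hypothetical, and `statistics.NormalDist.inv_cdf` stands in for the iterative solution mentioned in the text.

```python
import math
from statistics import NormalDist


def area_within(k: float) -> float:
    """Area under the normal curve within (mu - k*sigma, mu + k*sigma)."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"(mu - {k} sigma, mu + {k} sigma): {area_within(k) * 100:.6f}%")

# Half-width of the central interval holding 50% of the probability:
# the 75th percentile of the standard normal distribution.
half_width = NormalDist().inv_cdf(0.75)
print(f"50% half-interval: {half_width:.8f} sigma")
```

The result does not depend on μ or σ, because the standardization transformation maps every such interval to (−k, k).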
"Small probability event"andhypothesis testThe basic idea of "small probability event" is usually an event with a probability of less than 5%, which is almost impossible to happen in a test.It can be seen that the probability of X falling outside (μ - 3 σ, μ+3 σ) is less than three thousandths. In practical problems, it is often believed that the corresponding event will not occur. Basically, the interval (μ - 3 σ, μ+3 σ) can be regarded as the actual possible value interval of random variable X, which is called the "3 σ" principle of normal distribution.For large-scale assembly line products with larger output and more test times, 4 σ (99.9936%) is required to achieve "foolproof" (99.99%), while for higher level, half range of 5 σ~6 σ length is required, and the error is about 0.6ppm~0.002ppm, which is proposed in industrial production“Six Sigma(6 σ) "principle (the requirement of the Six Sigma principle mentioned in the management books is 3.4 ppm, and the distribution of this probability value is about 4.5 σ in the half interval length, taking into account the mean shift μ=1.5 σ caused by systematic error).
Research process
Concept and characteristics:
1. Concept of the normal distribution
A histogram drawn from frequency table data, as shown in Figure (1), typically has its peak in the middle and is roughly symmetric on the left and right. We can imagine that, as the number of observations gradually increases and the class intervals are subdivided further and further, the line along the top of the histogram gradually becomes a smooth curve, as in Figure (3): its peak is in the center (at the mean), it falls away gradually and symmetrically on both sides, and it never intersects the horizontal axis. This curve is called the frequency curve or relative frequency curve, and it approximates the normal distribution of mathematics. Since the frequencies sum to 100% or 1, the area between the curve and the horizontal axis is 100% or 1.
Figure 1 Normal distribution study
For convenience of application, a change of variable is often applied to the normally distributed variable X: u = (X − μ)/σ.
In practical work it is often necessary to know what percentage of the total area under the normal curve lies over a given interval of the horizontal axis, in order to estimate the percentage of cases that fall in that interval (the frequency distribution) or the probability that an observed value falls in it. The area over an interval under the normal curve can be found from Table 1. For data that are normally or approximately normally distributed, once the mean and standard deviation are known, the frequency distribution can be estimated approximately.
Table 1
Notes on Table 1: ① The area under the curve given in the table is the cumulative area to the left, from −∞ to u. ② When μ, σ and X are known, first compute u from u = (X − μ)/σ and then look it up in the table; when μ and σ are unknown and the sample size n is large enough, the sample mean X̄ and standard deviation S can be substituted for μ and σ respectively, giving u = (X − X̄)/S, which is then looked up in the table. ③ Intervals that are symmetric about 0 have equal areas under the curve; for example, the area over (−∞, −1.96) equals the area over (1.96, ∞). ④ The total area between the curve and the horizontal axis is 100% or 1.
Figure 2 Area distribution of normal curve and standard normal curve
Application of the normal distribution: some medical phenomena, such as the height, red blood cell count, hemoglobin level and cholesterol of a homogeneous population, as well as random errors in experiments, are normally or approximately normally distributed. Other data are skewed but become normal or approximately normal after a data transformation, so they too can be handled according to the laws of the normal distribution.
Normal distribution area diagram 1
Normal distribution area diagram 2
Difference and relation between general normal distribution and standard normal distribution
The normal distribution is a probability distribution of continuous random variables. A large number of phenomena in nature, human society, psychology and education are distributed in a normal form, such as levels of ability and student performance. Its shape differs with the size and units of the mean and standard deviation of the random variable. The standard normal distribution is one particular normal distribution whose mean and standard deviation are fixed: the mean is 0 and the standard deviation is 1.
Curve application
Overview
1. Estimating the frequency distribution: as long as the mean and standard deviation of a normally distributed variable are known, the proportion of cases within any range of values can be estimated from the formula.[3]
2. Establishing reference value ranges
(1) The normal distribution method applies to indicators that follow a normal (or approximately normal) distribution, and to indicators that do so after transformation.
(2) The percentile method is often used for indicators with skewed distributions. Both the unilateral and the bilateral boundary values of the two methods in Table 3-1 should be mastered.
3. Quality control: to control measurement (or experimental) error, X̄ ± 2S is often used as the upper and lower warning values and X̄ ± 3S as the upper and lower control values. The basis for this is that, under normal conditions, measurement (or experimental) error follows the normal distribution.
4. The normal distribution is the theoretical basis of many statistical methods. Hypothesis tests, analysis of variance, correlation and regression analysis, and other statistical methods require the analyzed indicators to follow the normal distribution. Many other statistical methods do not require this, but the corresponding statistics are approximately normal when the sample size is large, so for large samples these statistical inference methods are also based on normal distribution theory.
Frequency distribution
Example 1.10: In 1993 the heights (cm) of 100 18-year-old male college students in a certain area were sampled, giving a mean of X̄ = 172.70 cm and a standard deviation of s = 4.01 cm. ① Estimate the percentage of 18-year-old male college students in the area whose height is below 168 cm. ② Calculate the actual percentage of the 100 students whose heights fall within X̄ ± 1s, X̄ ± 1.96s and X̄ ± 2.58s respectively, and compare these with the theoretical percentages.
In this example μ and σ are unknown but the sample size n is fairly large, so the sample mean X̄ and standard deviation S are substituted for μ and σ to obtain u = (168 − 172.70)/4.01 = −1.17. In the attached table of areas under the standard normal curve, find −1.1 in the left margin and 0.07 in the top margin; the entry where they meet is 0.1210 = 12.10%. So 18-year-old male college students shorter than 168 cm make up about 12.10% of the total in this area. See Table 3 for the other results.
Table 3 Actual and theoretical distribution of height of 100 male college students aged 18
X̄ ± s range    Height range (cm)    Actual number of people    Actual distribution (%)    Theoretical distribution (%)
X̄ ± 1s         168.69~176.71        67                         67                         68.27
X̄ ± 1.96s      164.84~180.56        95                         95                         95.00
X̄ ± 2.58s      162.35~183.05        99                         99                         99.00
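The arithmetic of Example 1.10 can be retraced as follows. This is only a sketch: it uses the sample mean and standard deviation quoted in the example, replaces the printed table with `statistics.NormalDist`, and the helper names are hypothetical. The actual head counts in Table 3 cannot be recomputed because the raw data are not given.

```python
from statistics import NormalDist

MEAN_CM = 172.70  # sample mean from Example 1.10
SD_CM = 4.01      # sample standard deviation from Example 1.10

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Part 1: share of students shorter than 168 cm.
u = round((168 - MEAN_CM) / SD_CM, 2)  # rounded to two decimals, as for a table lookup
share_below_168 = standard_normal.cdf(u)
print(f"u = {u}, share below 168 cm = {share_below_168:.4f}")


def height_range(k: float) -> tuple[float, float]:
    """Height interval mean +/- k standard deviations, in cm."""
    return (MEAN_CM - k * SD_CM, MEAN_CM + k * SD_CM)


def theoretical_share(k: float) -> float:
    """Theoretical proportion of a normal population within +/- k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Part 2: ranges and theoretical percentages for Table 3.
for k in (1.0, 1.96, 2.58):
    low, high = height_range(k)
    print(f"+/-{k}s: {low:.2f}~{high:.2f} cm, theoretical {theoretical_share(k) * 100:.2f}%")
```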
Research on comprehensive quality
Educational statistics shows that students' level of intelligence, including learning ability and practical ability, is normally distributed, so test scores should also basically follow the normal distribution. Examination analysis requires drawing a histogram of the distribution of students' scores and judging how well it conforms to the normal distribution by the criterion "high in the middle, low at both ends". The evaluation criteria are: if the histogram of scores basically shows a normal curve, the result is good; if it is slightly positively or negatively skewed, it is medium; if it is seriously skewed or irregular, it is poor.
From the standpoint of probability and statistics, the statement "exam scores should basically follow the normal distribution" is correct. However, one must consider the difference between people and things, and the fact that education can interfere with "randomness"; evaluating exam results by the shape of a curve or histogram is therefore one-sided. Many education experts (such as Gu Lingyuan in Shanghai and Bloom in the United States) have shown through practice that education can accomplish a great deal: most students can pass the exam and most can get high scores, in which case the score curve is skewed rather than normal. Yet under the long-standing influence of the "high in the middle, low at both ends" standard, teachers' efforts have been constrained and most students' confidence in learning well has been suppressed. This is a serious misconception. A normal curve ordinarily has an axis of symmetry. The score (or score band) with the largest number of candidates corresponds to the highest point of the curve, its vertex. The line segment joining the vertex to the point for that score on the horizontal axis is the axis of symmetry of the normal curve, and the largest number of candidates is the peak value. In practice, score curves and histograms are rarely symmetric, so it is more appropriate to call this segment the peak line.
Medical reference value
Some medical phenomena, such as height, red blood cell count and hemoglobin level, as well as random errors in experiments, are normally or approximately normally distributed. Some indicators (variables) follow a skewed distribution, but the new variables obtained after a data transformation may follow a normal or approximately normal distribution and can then be handled according to the laws of the normal distribution. An indicator that follows the normal distribution after a logarithmic transformation is said to follow the lognormal distribution.
The medical reference value range is also called the medical normal value range. It refers to the range within which anatomical, physiological, biochemical and other indicators of so-called "normal people" fluctuate. To establish a range of normal values, first select a group of "normal people" with a sufficient sample size; "normal people" here does not mean "healthy people" but a homogeneous population from which diseases and related factors affecting the indicator under study have been excluded. Second, choose an appropriate percentage according to the purpose of the study and the intended use, such as 80%, 90%, 95% or 99%, most commonly 95%. Then decide between unilateral and bilateral boundary values according to how the indicator is used: a white blood cell count is abnormal whether it is too high or too low, so bilateral boundary values must be determined; transaminase in liver function tests is abnormal when it is too high, so a unilateral upper boundary is determined; and an indicator that is abnormal only when it is too low needs a unilateral lower boundary. In addition, an appropriate calculation method should be chosen according to the distribution of the data. Common methods include:
(1) Normal distribution method: applicable to data with normal or near normal distribution.
Bilateral boundary values: X̄ ± u·S; unilateral upper boundary: X̄ + u·S, or unilateral lower boundary: X̄ − u·S, where u is the tabulated value for the chosen reference range.
(2) Lognormal distribution method: applicable to lognormally distributed data.
Bilateral boundary values: lg⁻¹[X̄(lg x) ± u·S(lg x)]; unilateral upper boundary: lg⁻¹[X̄(lg x) + u·S(lg x)], or unilateral lower boundary: lg⁻¹[X̄(lg x) − u·S(lg x)], where X̄(lg x) and S(lg x) are the mean and standard deviation of the log-transformed data.
Common u values can be found in Table 4 as required.
(3) Percentile method: commonly used for data with a skewed distribution and for data without exact values at one or both ends.
Bilateral boundary values: P2.5 and P97.5; unilateral upper boundary: P95, or unilateral lower boundary: P5.
Table 4 Common u values
Reference value range (%)    Unilateral    Bilateral
80                           0.842         1.282
90                           1.282         1.645
95                           1.645         1.960
99                           2.326         2.576
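The u values in Table 4 are quantiles of the standard normal distribution, so they can be regenerated rather than looked up. The following sketch assumes that a unilateral range of p% uses the p-th percentile and a bilateral range of p% leaves (100 − p)/2 % in each tail; the function names and the sample mean and standard deviation in the usage line are hypothetical.

```python
from statistics import NormalDist


def u_value(coverage_percent: float, bilateral: bool) -> float:
    """Standard normal quantile for a reference range covering the given percentage."""
    p = coverage_percent / 100.0
    if bilateral:
        p = 1.0 - (1.0 - p) / 2.0  # split the excluded share between both tails
    return NormalDist().inv_cdf(p)


def bilateral_reference_range(mean: float, sd: float, coverage_percent: float = 95.0):
    """Normal distribution method: mean +/- u * sd."""
    u = u_value(coverage_percent, bilateral=True)
    return (mean - u * sd, mean + u * sd)


for coverage in (80, 90, 95, 99):
    one = u_value(coverage, bilateral=False)
    two = u_value(coverage, bilateral=True)
    print(f"{coverage}%: unilateral {one:.3f}, bilateral {two:.3f}")

# Illustrative (made-up) sample statistics for an indicator.
print(bilateral_reference_range(mean=100.0, sd=10.0))
```

For lognormally distributed data the same helper could be applied to the log-transformed values and the bounds transformed back, as in method (2).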
Theoretical basis of statistics:
For example, the t distribution, the F distribution and the χ² distribution are all derived from the normal distribution, and the u test is also based on it. In addition, the normal distribution is the limit of the t distribution, the binomial distribution and the Poisson distribution, so under certain conditions these can be handled according to the principles of the normal distribution.
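As an illustration of the limiting behavior mentioned above, the sketch below compares an exact binomial probability with its normal approximation. The parameters n = 100 and p = 0.5 are arbitrary, the function names are hypothetical, and the continuity correction of 0.5 is a common refinement that the article itself does not mention.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k), with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


n, p, k = 100, 0.5, 55
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The agreement improves as n grows and degrades when p is close to 0 or 1, which is one reading of the "certain conditions" in the text.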
The most important distribution in probability theory
The normal distribution has an extremely broad practical background. The probability distributions of many random variables in production and scientific experiments can be described approximately by it: for example, product indicators such as strength, compressive strength, caliber and length; the length, weight and other measurements of organisms of the same species; the weight of seeds of the same kind; the error in measuring the same object; the deviation of an impact point along a given direction; the annual precipitation of a region; the velocity components of ideal gas molecules; and so on. Generally speaking, if a quantity is the result of many small, independent random factors, it can be regarded as normally distributed (see the central limit theorem). In theory, the normal distribution has many good properties: many probability distributions can be approximated by it, and some commonly used distributions are derived directly from it, such as the lognormal distribution, the t distribution and the F distribution.
Main connotation
Against the practical background of nature, society and thought, and taking the properties of the normal distribution as the basis and the normal distribution curve and area distribution map as a token (this picture comes to mind whenever the normal distribution and its theory are discussed later), we abstract and generalize to capture its main philosophical implications. The main connotations of normal distribution theory (normal philosophy) are summarized as follows:
Holism
The normal distribution teaches us to look at things from a holistic perspective. "The overall concept of the system, or the holistic concept, is the essence of the systems concept." The normal distribution curve and area distribution map consist of three areas, the base area, the negative area and the positive area, and each occupies a different proportion. Only by looking at things as a whole can we see their true appearance and grasp their fundamental characteristics. We must neither fail to see the forest for the trees nor generalize from a part to the whole. In addition, the whole is greater than the sum of its parts: having analyzed each part and each level, we should still view things as a whole, because the whole has characteristics that no part has. To view the world as a whole is to stand on the base area while keeping the negative and positive areas in view. We should see the main aspects as well as the secondary ones, the positive side of things as well as the negative side, and the advanced side as well as the backward side. A one-sided view of things is bound to see something skewed or abnormal rather than the thing itself.
Key points
The normal distribution curve and area distribution map clearly show the key point: the base area accounts for 68.27%, forms the main body and deserves emphasis, while the 95% and 99% areas show the comprehensiveness of the normal distribution. To understand and transform the world we must grasp the key points, because they are the principal contradictions of things and play the leading, decisive role in their development. Only by grasping the key points can we achieve our goal in one stroke. Things and phenomena are numerous and complicated; if we do not grasp the principal contradiction among the myriad threads, we will sink into endless trivia. Because our time and energy are limited, pursuing efficiency means focusing on the key points. In the normal distribution, the base area is the main body and the focus. Combined with the 20/80 rule, we can boldly regard the main area as the key point.
Development theory
Connection and development are the basic laws governing how things develop and change. Everything has a history of emergence, development and extinction. If we regard the normal distribution as the course of development of any system or thing, we can see clearly that this course runs from the negative area to the base area and then to the positive area. Nature, society and human thought all plainly follow this course. Accurately grasping the historical course and the stage that a thing or event has reached greatly helps us to grasp its characteristics and nature, and is an important basis for analyzing problems, devising countermeasures and solving problems. The nature and characteristics of development differ from stage to stage, and the methods of analyzing and solving problems should be adapted accordingly; this is the concrete analysis of concrete problems, and it is also the essence of emancipating the mind, seeking truth from facts and keeping pace with the times. The characteristics of normal development also teach us that the development of things is mostly gradual and cumulative, and that gradual development is the normal state of things. For example, heredity is the normal state and variation is the abnormal state.
In short, normal distribution theory is a scientific world view and a scientific methodology. It is one of the most important and fundamental tools for understanding and transforming the world, and it has important guiding significance for our theory and practice. Understanding the world through normal philosophy allows us to grasp the nature and laws of the world better; transforming the world through normal philosophy allows us to respect and use objective laws better and to transform the world more effectively.
Francis Galton (16 February 1822 - 17 January 1911), a British explorer, eugenicist and psychologist, was the father of differential psychology and the founder of the physiological measurement method in psychometrics.
Galton's contributions to psychology can be summarized under three headings: differential psychology, psychometric quantification and experimental psychology.
The quantification of psychological research originated with Galton. He invented many sensory and motor tests and expressed the measured differences in psychological characteristics as numbers. He believed that all human characteristics, whether physical or mental, could ultimately be described quantitatively, and that this was a necessary condition for a science of human beings. He was therefore the first to apply statistical methods to psychological research data, paying attention to the average and the spread of the data. He collected a large amount of data to show that psychological traits, like height and weight, are distributed in the population according to the normal distribution curve. In discussing the influence of heredity on individual differences, he introduced the concept of the correlation coefficient. For example, he studied the relationship between the "mid-parent" height and the height of the adult children and found a positive correlation: taller parents tend to have taller children, and shorter parents tend to have shorter children. He also found that children's heights often differ somewhat from their parents' and show a tendency to "regress toward the middle", that is, to move away from their parents' height back toward the population average.
Intelligence and ability
Richard J. Herrnstein (20 May 1930 - 13 September 1994), an American comparative psychologist, and Charles Murray co-wrote the book The Bell Curve, which is known for arguing that human intelligence is normally distributed. The book claims that intelligence is mainly inherited and varies by race, with Jews and East Asians scoring highest on IQ, followed by whites, and with blacks and Hispanics scoring lowest. The authors reviewed decades of research in psychometrics and policy science and argued that American society had ignored the growing influence of IQ. They tried to show that current United States social policies, such as vocational training and college education favoring low-income groups made up mainly of African Americans and South Americans, were a waste of resources. They used the test results of military recruits to argue that the intelligence of black youth was lower than that of white and Asian youth, and that the intelligence of these groups was already fixed, so that training them had little effect. They therefore argued that the government should stop funding education for these groups and instead spend the money on early childhood education for all races, because young children's intelligence is not yet fixed and their potential for development is great. Because the book dealt with the intelligence of black people, it came under attack from all sides as soon as it was published.