Variance

[fāng chā]
Mathematical terminology
Variance is a measure of dispersion used in probability theory and statistics for a random variable or a set of data. In probability theory, variance measures the degree of deviation between a random variable and its mathematical expectation (i.e., its mean). In statistics, the variance (sample variance) is the average of the squared differences between each sample value and the mean of all sample values. In many practical problems, studying the variance, that is, the degree of deviation, is of great significance.
Variance is a measure of the difference between the source data and the expected value.
Chinese name
variance
Foreign name
variance/deviation Var
Type
D (X) Mathematics (Statistics)
Researcher
Ronald Fisher
Definition
The average of the squared differences between the data and their mean
Category
Discrete variance

History

The term "variance" was first introduced by Ronald Fisher in his paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance. [1]

Definition

Variance is defined and computed differently in descriptive statistics and for probability distributions.
In descriptive statistics, variance measures how far each value of a variable (each observation) lies from the overall mean. The deviations from the mean always sum to zero, and the sum of squared deviations grows with the number of observations, so statistics uses the average squared deviation from the mean to describe the variability of a variable. The population variance is calculated as
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where $\sigma^2$ is the population variance, $X$ is the variable, $\mu$ is the population mean, and $N$ is the number of cases in the population.
In practice, when the population mean is difficult to obtain, sample statistics are used in place of the population parameters. After correction, the sample variance is calculated as [2]
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where $S^2$ is the sample variance, $X$ is the variable, $\bar{X}$ is the sample mean, and $n$ is the number of observations in the sample.
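The two formulas differ only in the divisor. As a minimal sketch, assuming a small made-up data set (the values in `data` and the function names are illustrative, not from the source), both can be computed directly and compared with Python's standard `statistics` module:

```python
import statistics

def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Corrected sample variance (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical observations, mean 5

pop_var = population_variance(data)    # 32 / 8 = 4.0
samp_var = sample_variance(data)       # 32 / 7

# The standard library offers the same two quantities.
lib_pop_var = statistics.pvariance(data)
lib_samp_var = statistics.variance(data)
```

The sample variance is always the larger of the two for the same data, because the same sum of squares is divided by a smaller number.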
For probability distributions, let $X$ be a discrete random variable. If $E\{[X - E(X)]^2\}$ exists, it is called the variance of $X$, denoted $D(X)$ or $\mathrm{Var}(X)$, where $E(X)$ is the expected value of $X$ and $X$ is the value of the variable [1]. In the formula, $E$ is short for expected value, so the expression means the expected value of "the square of the difference between the value of a random variable and its expected value". [2] The variance of a discrete random variable is calculated as
$$D(X) = \sum_i \left(x_i - E(X)\right)^2 p_i$$
where $p_i = P\{X = x_i\}$ is the probability that $X$ takes the value $x_i$.
$D(X) = E\{[X - E(X)]^2\}$ is called the variance of the variable $X$, and $\sigma(X) = \sqrt{D(X)}$ is called the standard deviation (or mean square deviation). The standard deviation has the same dimension as $X$. It is a statistic used to measure the dispersion of a set of data. [3]
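As an illustration of the discrete formula, the sketch below computes the variance of a fair six-sided die. The die, the helper name `discrete_variance`, and the choice of a dictionary mapping values to probabilities are assumptions made for the example; exact fractions are used so that no rounding enters the result.

```python
from fractions import Fraction
from math import sqrt

def discrete_variance(pmf):
    """D(X) = sum_i (x_i - E(X))^2 * p_i for a pmf given as {value: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Hypothetical example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

die_var = discrete_variance(die)   # Fraction(35, 12)
die_std = sqrt(die_var)            # standard deviation, same unit as the faces
```

The exact check on the probabilities relies on `Fraction` inputs; with floating-point probabilities a tolerance would be needed instead.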
For a continuous random variable $X$, if its domain is $(a, b)$ and its probability density function is $f(x)$, the variance of $X$ is calculated as [2]
$$D(X) = \int_a^b \left(x - E(X)\right)^2 f(x)\,dx$$
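The integral can be approximated numerically when no closed form is at hand. The following is a rough sketch only: it assumes a density supported on a finite interval $(a, b)$, uses the simple midpoint rule, and takes a uniform density on $(2, 5)$ as a made-up test case whose variance, $(b-a)^2/12$, is known.

```python
def continuous_variance(pdf, a, b, steps=10_000):
    """Approximate D(X) for a density `pdf` on (a, b) with the midpoint rule."""
    h = (b - a) / steps
    mids = [a + (k + 0.5) * h for k in range(steps)]
    mean = sum(x * pdf(x) for x in mids) * h           # E(X)
    return sum((x - mean) ** 2 * pdf(x) for x in mids) * h

a, b = 2.0, 5.0                     # hypothetical interval

def uniform_pdf(x):
    """Density of the uniform distribution on (a, b)."""
    return 1.0 / (b - a)

approx = continuous_variance(uniform_pdf, a, b)
exact = (b - a) ** 2 / 12           # 0.75
```

For densities with unbounded support the interval would have to be truncated, which introduces an additional approximation not handled here.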
Variance describes how widely the values of a random variable are dispersed around its mathematical expectation. (The greater the standard deviation and variance, the greater the dispersion.)
If the values of $X$ are concentrated, the variance $D(X)$ is small; if the values of $X$ are scattered, the variance $D(X)$ is large.
Therefore, $D(X)$ is a quantity that characterizes how dispersed the values of $X$ are, and serves as a measure of that dispersion.

Properties

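1. Let $C$ be a constant. Then $D(C) = 0$.
2. Let $X$ be a random variable and $C$ a constant. Then $D(CX) = C^2 D(X)$ and $D(X + C) = D(X)$.
3. Let $X$ and $Y$ be two random variables. Then
$$D(X \pm Y) = D(X) + D(Y) \pm 2\,\mathrm{Cov}(X, Y)$$
where the covariance is $\mathrm{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$.
In particular, when $X$ and $Y$ are two uncorrelated random variables, $D(X \pm Y) = D(X) + D(Y)$.
This property can be extended to any finite number of pairwise uncorrelated random variables.
4. A necessary and sufficient condition for $D(X) = 0$ is that $X$ takes a constant value $c$ with probability 1, i.e. $P\{X = c\} = 1$, where $c = E(X)$. (That is, $D(X) = 0$ if and only if $X$ takes the constant value $c$ with probability 1.)
Note: this does not imply that $X$ is identically equal to a constant. When $X$ is continuous, $X$ may take values different from $c$ at any finite number of points.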
5. For constants $a$ and $b$, $D(aX + bY) = a^2 D(X) + b^2 D(Y) + 2ab\,\mathrm{Cov}(X, Y)$.
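Proof
1. $D(C) = E\{[C - E(C)]^2\} = E\{[C - C]^2\} = 0$.
2. $D(CX) = E\{[CX - E(CX)]^2\} = C^2 E\{[X - E(X)]^2\} = C^2 D(X)$, and $D(X + C) = E\{[X + C - E(X + C)]^2\} = E\{[X - E(X)]^2\} = D(X)$.
3. $$D(X \pm Y) = E\{[(X \pm Y) - E(X \pm Y)]^2\} = E\{[X - E(X)]^2\} + E\{[Y - E(Y)]^2\} \pm 2E\{[X - E(X)][Y - E(Y)]\}$$
The third term on the right-hand side of the above formula is $\pm 2E\{[X - E(X)][Y - E(Y)]\} = \pm 2[E(XY) - E(X)E(Y)] = \pm 2\,\mathrm{Cov}(X, Y)$.
If $X$ and $Y$ are independent of each other, the properties of mathematical expectation give $E(XY) = E(X)E(Y)$, so this term is 0.
4. Sufficiency: suppose $P\{X = E(X)\} = 1$. Then $P\{X^2 = [E(X)]^2\} = 1$, so $E(X^2) = [E(X)]^2$ and $D(X) = E(X^2) - [E(X)]^2 = 0$.
Necessity: argue by contradiction. A probability cannot exceed 1, so it suffices to rule out the case in which it is less than 1.
Suppose $P\{X = E(X)\} < 1$. Then for some number $\varepsilon > 0$, $P\{|X - E(X)| \ge \varepsilon\} > 0$.
But by the Chebyshev inequality, when $D(X) = 0$, every $\varepsilon > 0$ satisfies $P\{|X - E(X)| \ge \varepsilon\} \le D(X)/\varepsilon^2 = 0$,
which contradicts the inequality above.
Therefore $P\{X = E(X)\} = 1$. [4]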
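Properties 1 to 3 can be checked numerically on any finite data set, because they hold exactly for the empirical distribution of the data. The sketch below is an illustration under stated assumptions: the paired observations `xs`, `ys` and the constant `c` are made up, the helpers `mean`, `var` and `cov` are hypothetical names, and population-style (divide by $n$) moments with exact fractions are used so that equalities can be tested without rounding.

```python
from fractions import Fraction

def mean(values):
    return sum(values) / Fraction(len(values))

def var(values):
    """Population-style variance: mean squared deviation from the mean."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

def cov(xs, ys):
    """Population-style covariance of paired observations."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

xs = [1, 3, 4, 8]      # hypothetical paired observations
ys = [2, 2, 7, 5]
c = 3                  # hypothetical constant

prop1 = var([c] * len(xs)) == 0                                  # D(C) = 0
prop2_scale = var([c * x for x in xs]) == c ** 2 * var(xs)       # D(CX) = C^2 D(X)
prop2_shift = var([x + c for x in xs]) == var(xs)                # D(X + C) = D(X)
prop3_sum = (var([x + y for x, y in zip(xs, ys)])
             == var(xs) + var(ys) + 2 * cov(xs, ys))             # D(X + Y)
prop3_diff = (var([x - y for x, y in zip(xs, ys)])
              == var(xs) + var(ys) - 2 * cov(xs, ys))            # D(X - Y)
```

A numerical check of this kind illustrates the identities but does not replace the proofs.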

Category and calculation


Discrete variance

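The discrete variance is calculated as
$$D(X) = \sum_i (x_i - \mu)^2 p_i$$
where $\mu = E(X) = \sum_i x_i p_i$ and $p_i = P\{X = x_i\}$.
Expanding the above formula gives
$$D(X) = \sum_i x_i^2 p_i - \mu^2$$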

Continuous variance

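The continuous variance is calculated as
$$D(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
where $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ and $f(x)$ is the probability density function of $X$.
Expanding the above formula gives
$$D(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$$
The defining expression and its expanded form describe the same quantity written in two ways: $D(X) = E\{[X - E(X)]^2\} = E(X^2) - [E(X)]^2$.
Proof: by the properties of mathematical expectation,
$$D(X) = E\{[X - E(X)]^2\} = E\{X^2 - 2XE(X) + [E(X)]^2\} = E(X^2) - 2E(X)E(X) + [E(X)]^2 = E(X^2) - [E(X)]^2$$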
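The equality of the defining form $E\{[X - E(X)]^2\}$ and the expanded form $E(X^2) - [E(X)]^2$ can be seen on a small case. This is a sketch with an invented three-point distribution; the helper `expectation` and the dictionary `pmf` are illustrative names only.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical distribution: X takes 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = expectation(pmf)                                        # E(X) = 1
by_definition = expectation(pmf, lambda x: (x - mu) ** 2)    # E[(X - E(X))^2]
by_expansion = expectation(pmf, lambda x: x ** 2) - mu ** 2  # E(X^2) - E(X)^2
```

Both expressions evaluate to $1/2$ here. In floating-point arithmetic the expanded form can lose precision when the mean is large relative to the spread, which is one reason exact fractions are used in this sketch.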

Expectations and Variances


Discrete

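If $X$ follows a two-point distribution with $P\{X = 1\} = p$ and $P\{X = 0\} = 1 - p$, then $E(X) = p$ and $D(X) = p(1 - p)$.
If $X$ follows a hypergeometric distribution, i.e. $X \sim H(n, M, N)$ ($n$ draws without replacement from $N$ items, of which $M$ are marked), then $E(X) = \dfrac{nM}{N}$ and $D(X) = \dfrac{nM(N - M)(N - n)}{N^2 (N - 1)}$.
If $X$ follows a binomial distribution, i.e. $X \sim B(n, p)$, then $E(X) = np$ and $D(X) = np(1 - p)$.
If $X$ follows a Poisson distribution, i.e. $X \sim P(\lambda)$, then $E(X) = \lambda$ and $D(X) = \lambda$.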
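Closed forms such as $np$ and $np(1-p)$ for the binomial distribution, or $nM/N$ and $nM(N-M)(N-n)/[N^2(N-1)]$ for the hypergeometric distribution, can be cross-checked by summing over the probability mass function. The sketch below does this with exact fractions; the parameter values and the function names `moments`, `binomial_pmf` and `hypergeometric_pmf` are assumptions chosen for the example.

```python
from fractions import Fraction
from math import comb

def moments(pmf):
    """Return (E(X), D(X)) for a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance

def binomial_pmf(n, p):
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

def hypergeometric_pmf(n, M, N):
    """n draws without replacement from N items, M of which are marked."""
    lo, hi = max(0, n - (N - M)), min(n, M)
    return {k: Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))
            for k in range(lo, hi + 1)}

# Hypothetical parameters.
n, p = 10, Fraction(3, 10)
b_mean, b_var = moments(binomial_pmf(n, p))          # 3 and 21/10

n_draws, M, N = 5, 4, 12
h_mean, h_var = moments(hypergeometric_pmf(n_draws, M, N))   # 5/3 and 70/99
```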

Continuous

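If $X$ follows a uniform distribution, i.e. $X \sim U(a, b)$, then $E(X) = \dfrac{a + b}{2}$ and $D(X) = \dfrac{(b - a)^2}{12}$.
If $X$ follows an exponential distribution with rate $\lambda$, i.e. with density $f(x) = \lambda e^{-\lambda x}$ for $x > 0$, then $E(X) = \dfrac{1}{\lambda}$ and $D(X) = \dfrac{1}{\lambda^2}$.
If $X$ follows a normal distribution, i.e. $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $D(X) = \sigma^2$.
If $X$ follows the standard normal distribution, i.e. $X \sim N(0, 1)$, then $E(X) = 0$ and $D(X) = 1$.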
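Finding the mathematical expectation and variance of the normal distribution
Let $X \sim N(\mu, \sigma^2)$; find $E(X)$ and $D(X)$.
Let $Z = \dfrac{X - \mu}{\sigma}$. Since $Z \sim N(0, 1)$, we have $E(Z) = 0$ and $D(Z) = 1$. Because $X = \mu + \sigma Z$, it follows that
$$E(X) = E(\mu + \sigma Z) = \mu + \sigma E(Z) = \mu, \qquad D(X) = D(\mu + \sigma Z) = \sigma^2 D(Z) = \sigma^2$$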
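The standardization argument, in which $X = \mu + \sigma Z$ with $Z \sim N(0, 1)$, can be illustrated by simulation. This is only a sketch: the parameters `mu` and `sigma`, the seed and the sample size are arbitrary choices, and simulated moments are close to the theoretical values rather than equal to them.

```python
import random
import statistics

rng = random.Random(0)              # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0               # hypothetical parameters

z = [rng.gauss(0.0, 1.0) for _ in range(20_000)]   # draws of Z ~ N(0, 1)
x = [mu + sigma * zi for zi in z]                  # X = mu + sigma * Z

z_mean, z_var = statistics.fmean(z), statistics.pvariance(z)
x_mean, x_var = statistics.fmean(x), statistics.pvariance(x)

# Up to floating-point rounding, the shift-and-scale relations hold for the
# simulated data themselves:
#   x_mean = mu + sigma * z_mean   and   x_var = sigma**2 * z_var
```

With this many draws, `x_mean` and `x_var` should land near $\mu = 10$ and $\sigma^2 = 4$, while the two relations in the comment hold regardless of the sample size.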

Example

Suppose the true length of a part is known to be $a$, and two instruments, A and B, are each used to measure it 10 times. Each measurement result $X$ is plotted as a point on a coordinate axis, as shown in Figure 1:
Measurement results of instrument A:
Measurement results of instrument B: all a
The mean of the measurement results is $a$ for both instruments. But if we use these results to compare the two instruments, we will clearly judge instrument B to perform better, because its measurement results are concentrated around the mean.
It is therefore necessary to study how far a random variable deviates from its mean. How can this deviation be measured? It is easy to see that $E[|X - E(X)|]$ measures the deviation of a random variable from its mean $E(X)$. However, because this expression contains an absolute value, it is inconvenient to work with. Usually the quantity $E\{[X - E(X)]^2\}$ is used instead; this numerical characteristic is the variance.
Figure 1 Measurement Results
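The comparison can be made concrete with numbers. The measurements below are invented for illustration (the original figure is not reproduced here): both lists have mean 100, standing in for the true length $a$, but instrument A is scattered while instrument B is tightly clustered. The sketch computes both the mean absolute deviation $E[|X - E(X)|]$ and the variance.

```python
def mean(values):
    return sum(values) / len(values)

def mean_abs_deviation(values):
    """Average of |x - mean|, the first dispersion measure mentioned above."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

def variance(values):
    """Average of (x - mean)^2."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

# Hypothetical readings, in arbitrary units, ten per instrument.
instrument_a = [90, 95, 95, 100, 100, 100, 100, 105, 105, 110]
instrument_b = [99, 100, 100, 100, 100, 100, 100, 100, 100, 101]

mean_a, mean_b = mean(instrument_a), mean(instrument_b)   # both 100.0
mad_a, mad_b = mean_abs_deviation(instrument_a), mean_abs_deviation(instrument_b)
var_a, var_b = variance(instrument_a), variance(instrument_b)   # 30.0 vs 0.2
```

Both measures rank instrument B as the more consistent one; the variance simply avoids the absolute value.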

Formula

Variance is the average of the squared differences between the actual values and the expected value, and the standard deviation is the arithmetic square root of the variance. [5] In actual calculation, the following formula is used to calculate the variance.
Variance is the average of the squared differences between each data value and the mean, that is
$$s^2 = \frac{1}{n}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$$
where $\bar{x}$ is the sample mean, $n$ is the number of samples, $x_i$ is an individual value, and $s^2$ is the variance.
If $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is used as an estimate of the variance of $X$, its mathematical expectation turns out to be not the variance of $X$ but $\frac{n-1}{n}$ times the variance of $X$. The mathematical expectation of $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is exactly the variance of $X$, so as an estimate of the variance of $X$ it is "unbiased". We therefore always use $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ to estimate the variance of $X$, and call it the "sample variance".
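The factor $\frac{n-1}{n}$ can be verified exactly on a toy case by listing every possible sample. The sketch below assumes a made-up population in which $X$ takes the values 0 and 1 with equal probability (so its variance is $1/4$) and independent samples of size $n = 3$; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

population = [0, 1]         # hypothetical: X is 0 or 1 with probability 1/2 each
n = 3                       # sample size
true_var = Fraction(1, 4)   # variance of X for this population

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

# All 2**n ordered samples are equally likely under independent sampling.
samples = list(product(population, repeat=n))

mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
```

Here `mean_biased` equals $\frac{n-1}{n}\cdot\frac14 = \frac16$, while `mean_unbiased` equals the true variance $\frac14$.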
Variance measures the degree of deviation from the center and is used to gauge how much a batch of data fluctuates (that is, how far the data deviate from their average). It is called the variance of the data set and is denoted $s^2$. For the same sample size, the larger the variance, the more the data fluctuate and the less stable they are.
The formula $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ can be further derived as
$$s^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \bar{x}^2$$
where $x_1, \ldots, x_n$ are the data in the set, $\bar{x}$ is their mean, and $n$ is an integer greater than 0.

Statistical significance

When the data are widely scattered (that is, they fluctuate strongly around the average), the sum of the squared differences between each data value and the average is large, and so is the variance. When the data are concentrated, the sum of the squared differences between each data value and the average is small. Therefore, the larger the variance, the greater the fluctuation of the data; the smaller the variance, the smaller the fluctuation. [6]
The average of the squared differences between the data in a sample and the sample mean is called the sample variance, and the arithmetic square root of the sample variance is called the sample standard deviation. Both measure how much a sample fluctuates: the larger the sample variance or sample standard deviation, the greater the fluctuation of the sample data.
Variance and standard deviation are the most important and most commonly used measures of dispersion. Variance is the average of the squared deviations of each value of the variable from the mean, and it is the most important way of measuring the dispersion of numerical data. The standard deviation is the arithmetic square root of the variance, denoted $s$. The corresponding formula for the variance is:
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The difference between the standard deviation and the variance is that the standard deviation is expressed in the same unit as the variable, which makes it easier to interpret than the variance. For this reason, the standard deviation is used more often in analysis.
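The point about units can be seen by changing the unit of measurement. In the sketch below the heights are invented and recorded first in centimetres and then in metres; the variance is then in squared units and scales by the square of the conversion factor, while the standard deviation scales like the data themselves.

```python
import statistics

heights_cm = [150, 160, 170, 180, 190]        # hypothetical heights in cm
heights_m = [h / 100 for h in heights_cm]     # the same heights in m

var_cm2 = statistics.pvariance(heights_cm)    # 200, in cm^2
std_cm = statistics.pstdev(heights_cm)        # about 14.14, in cm

var_m2 = statistics.pvariance(heights_m)      # about 0.02, in m^2
std_m = statistics.pstdev(heights_m)          # about 0.1414, in m
```

Dividing the data by 100 divides the standard deviation by 100 but the variance by 10,000, which is why the standard deviation can be read directly against the original measurements.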

Recent developments

Variance expresses not only how far the sample deviates from its mean, but also how much the values within the sample fluctuate relative to one another. In other words, variance can be understood as the expected mutual fluctuation between sample values. This conclusion holds for second-order statistical moments. [7]
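One possible reading of this statement, offered here as an interpretation rather than something the text spells out, is the identity $\frac{1}{n}\sum_i (x_i - \bar{x})^2 = \frac{1}{2n^2}\sum_i \sum_j (x_i - x_j)^2$: the variance about the mean equals half the average squared difference between all pairs of values. A minimal check on made-up data:

```python
from fractions import Fraction
from itertools import product

data = [1, 4, 6, 9]                 # hypothetical sample
n = len(data)
mean = Fraction(sum(data), n)       # 5

# Variance measured against the mean.
var_about_mean = sum((x - mean) ** 2 for x in data) / n

# Half the average squared difference over all ordered pairs (i, j).
half_mean_sq_diff = Fraction(
    sum((x - y) ** 2 for x, y in product(data, repeat=2)), 2 * n * n
)
```

Both quantities equal $17/2$ for this data set, and the second expression never refers to the mean at all.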