Simple regression

1. Correlation analysis: the description and measurement of the linear relationship between two variables. The questions it addresses include:

§ Is there a relationship between variables?

§ If there is a relationship, what kind of relationship is it?

§ How strong is the relationship between the variables?

§ Can the relationship between the variables observed in the sample represent the relationship between the variables in the population?
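The four questions above can be illustrated with a minimal sketch. It assumes the Pearson correlation coefficient r as the measure (its sign gives the direction, its absolute value the strength) and the usual t statistic with n - 2 degrees of freedom for judging whether the sample relationship carries over to the population. Neither formula is stated in these notes, and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: the population correlation is 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical sample: hours studied (x) and exam score (y)
x = [1, 2, 3, 4, 5, 6]
y = [52, 55, 61, 64, 70, 71]

r = pearson_r(x, y)
t = t_statistic(r, len(x))

direction = "positive" if r > 0 else "negative"
print(f"direction: {direction}, strength |r| = {abs(r):.3f}, t = {t:.2f}")
```

The t value would then be compared with a critical value from the t distribution with n - 2 degrees of freedom (about 2.776 for 4 degrees of freedom at the two-sided 5% level).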

2. Regression analysis: starting from a set of sample data, determine the mathematical relationship between variables; run statistical tests on the reliability of that relationship and identify which of the many variables affecting a particular variable have a significant effect and which do not; then use the fitted relationship to predict or control the value of a particular variable from the values of one or more other variables, and state the accuracy of that prediction or control

3. Difference between regression analysis and correlation analysis

In correlation analysis, the variables x and y have equal status. In regression analysis, the variable y is called the dependent variable and is the one being explained, while the variable x is called the independent variable and is used to predict changes in the dependent variable

In correlation analysis, the variables x and y are both random variables. In regression analysis, the dependent variable y is a random variable, while the independent variable x can be either a random variable or a non-random, deterministic variable

Correlation analysis mainly describes the closeness of the linear relationship between two variables. Regression analysis not only reveals the effect of variable x on variable y but also allows prediction and control through the regression equation
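The "equal status" point can be made concrete with a small sketch: the correlation coefficient is unchanged when x and y are swapped, whereas regressing y on x and regressing x on y give two different lines. The sketch assumes least-squares fitting (not stated in these notes) and uses hypothetical data.

```python
import math

def centered_sums(x, y):
    """Return (sxx, syy, sxy): centered sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [3, 2, 8, 6, 11]

sxx, syy, sxy = centered_sums(x, y)

# Correlation: symmetric in x and y
r_xy = sxy / math.sqrt(sxx * syy)
syy2, sxx2, syx = centered_sums(y, x)
r_yx = syx / math.sqrt(syy2 * sxx2)

# Regression: the roles of the variables matter
slope_y_on_x = sxy / sxx   # y treated as dependent, x as independent
slope_x_on_y = sxy / syy   # x treated as dependent, y as independent

print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
print(f"slope of y on x = {slope_y_on_x:.4f}")
print(f"slope of x on y = {slope_x_on_y:.4f} (as a line in the x-y plane: {1 / slope_x_on_y:.4f})")
```

The product of the two slopes equals r squared, so the two fitted lines coincide only when the correlation is perfect.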

4. Simple linear regression model (one independent variable)

The equation describing how the dependent variable y depends on the independent variable x and the error term e is called the regression model

The simple linear regression model can be expressed as

y = b0 + b1 x + e

y is a linear function of x (the linear part, b0 + b1 x) plus the error term e

The linear part reflects the change of y caused by the change of x

The error term e is a random variable

§ It reflects the influence on y of random factors other than the linear relationship between x and y

§ It is the variability that cannot be explained by the linear relationship between x and y

b0 and b1 are called the parameters of the model
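A minimal sketch of the model y = b0 + b1 x + e: it generates data as a linear part plus a random error and then recovers the parameters by least squares. Two things here are assumptions not stated in these notes: the error is drawn from a normal distribution with mean 0, and the parameters are estimated with the usual least-squares formulas. The names simulate and fit_line and all numeric values are hypothetical.

```python
import random

def simulate(b0, b1, sigma, xs, seed=0):
    """Generate y = b0 + b1*x + e, with e drawn from N(0, sigma^2) (assumed)."""
    rng = random.Random(seed)
    return [b0 + b1 * x + rng.gauss(0, sigma) for x in xs]

def fit_line(x, y):
    """Least-squares estimates (b0_hat, b1_hat) for y = b0 + b1*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1_hat = sxy / sxx
    b0_hat = mean_y - b1_hat * mean_x
    return b0_hat, b1_hat

xs = [i * 0.5 for i in range(200)]

# With no error term, the fit recovers the parameters (up to rounding)
exact = simulate(b0=2.0, b1=0.5, sigma=0.0, xs=xs)
b0_exact, b1_exact = fit_line(xs, exact)

# With a random error term, the estimates are close but not exact
noisy = simulate(b0=2.0, b1=0.5, sigma=1.0, xs=xs, seed=42)
b0_hat, b1_hat = fit_line(xs, noisy)

print(f"no error:   b0 = {b0_exact:.4f}, b1 = {b1_exact:.4f}")
print(f"with error: b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}")
```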

5. Points to note when using the regression equation for prediction

1. When using the regression equation for estimation or prediction, do not use an x value outside the range of the sample data to predict the corresponding y value

2. Linear regression analysis always assumes that the relationship between the dependent variable y and the independent variable x is correctly described by a linear model, but in practice the relationship may be some kind of curve

3. In that case, we assume that only a small segment of the curve falls within the range of the measured x values. If x ranges between xL and xU, we can estimate E(y) and predict y reasonably well within that range. Estimates and predictions obtained from x values outside xL and xU will be very poor
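The three points above can be sketched numerically. The example assumes, purely for illustration, that the true relationship is the curve y = x^2 and that x was only sampled between xL = 1 and xU = 2. A least-squares line (an assumed fitting method) tracks the curve well inside that range and badly outside it. The helper predict_checked is hypothetical and simply refuses to extrapolate.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for a straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def predict_checked(x_new, b0, b1, x_low, x_high):
    """Predict y only when x_new lies inside the sampled range [x_low, x_high]."""
    if not (x_low <= x_new <= x_high):
        raise ValueError(f"x = {x_new} is outside the sampled range [{x_low}, {x_high}]")
    return b0 + b1 * x_new

def true_curve(x):
    """Hypothetical true relationship, unknown to the analyst."""
    return x ** 2

xs = [1 + i / 10 for i in range(11)]      # sampled range: xL = 1.0, xU = 2.0
ys = [true_curve(x) for x in xs]
b0, b1 = fit_line(xs, ys)
x_low, x_high = min(xs), max(xs)

# Largest error of the fitted line inside the sampled range
inside_error = max(abs(b0 + b1 * x - true_curve(x)) for x in xs)
# Error of the same line far outside the sampled range
outside_error = abs(b0 + b1 * 5.0 - true_curve(5.0))

print(f"max error inside [{x_low}, {x_high}]: {inside_error:.3f}")
print(f"error at x = 5.0: {outside_error:.3f}")
```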

6. Sum of squares of deviations

Total sum of squares (SST)

Reflects the total deviation of the n observations of the dependent variable from their mean

Regression sum of squares (SSR)

Reflects the effect of changes in the independent variable x on the value of the dependent variable y, that is, the variation in y caused by the linear relationship between x and y; also known as the explained sum of squares

Residual sum of squares (SSE)

Reflects the influence of factors other than x on the value of y; also known as the unexplained sum of squares or error sum of squares
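The three sums of squares can be computed directly from a fitted line. The sketch below assumes a least-squares fit (for which SST = SSR + SSE holds exactly) and uses hypothetical data; the function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for a straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line of y on x."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # observations vs. mean
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # fitted values vs. mean
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # observations vs. fitted values
    return sst, ssr, sse

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

sst, ssr, sse = sums_of_squares(x, y)
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, SSR + SSE = {ssr + sse:.2f}")
```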

7. Standard error of the estimate

The square root of the mean of the squared deviations between the actual observations and the regression estimates, where the mean is taken over n-2 degrees of freedom

Reflects the dispersion of the actual observations around the regression line

It is an estimate of the standard deviation s of the error term e, that is, a measure of the size of the random fluctuation in y after the linear effect of x on y has been removed

Reflects the size of the prediction error when the estimated regression equation is used to predict y
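As a minimal sketch of the definition above, the standard error of the estimate is the square root of SSE divided by n - 2. The fit is assumed to be least squares, and the data and function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for a straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def standard_error_of_estimate(x, y):
    """sqrt(SSE / (n - 2)): typical size of a residual around the fitted line."""
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))

# Hypothetical data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

s_e = standard_error_of_estimate(x, y)
print(f"standard error of the estimate = {s_e:.4f}")
```

The divisor is n - 2 rather than n because two parameters, b0 and b1, are estimated from the same data.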

