• Excel data analysis regression. Nonlinear Regression in Excel

    IN Excel there is an even faster and convenient way build a graph linear regression(and even the main types of nonlinear regressions, as discussed below). This can be done as follows:

    1) select columns with data X And Y(they should be in that order!);

    2) call Chart Wizard and select in the group TypeSpot and immediately press Ready;

    3) without deselecting the diagram, select the main menu item that appears Diagram, in which you should select the item Add a trend line;

    4) in the dialog box that appears Trend line in the tab Type choose Linear;

    5) in the tab Options you can activate the switch Show equation in diagram, which will allow you to see the linear regression equation (4.4), in which the coefficients (4.5) will be calculated.

    6) In the same tab you can activate the switch Place the approximation reliability value (R^2) on the diagram. This value is the square of the correlation coefficient (4.3) and it shows how well the calculated equation describes the experimental dependence. If R 2 is close to unity, then the theoretical regression equation describes the experimental dependence well (the theory agrees well with experiment), and if R 2 is close to zero, then given equation not suitable for describing the experimental dependence (theory does not agree with experiment).

    As a result of performing the described actions, you will get a diagram with a regression graph and its equation.

    §4.3. Main types of nonlinear regression

    Parabolic and polynomial regression.

    Parabolic dependence of the value Y from the size X is called a dependence expressed by a quadratic function (2nd order parabola):

    This equation is called parabolic regression equation Y on X. Options A, b, With are called parabolic regression coefficients. Calculating parabolic regression coefficients is always cumbersome, so it is recommended to use a computer for calculations.

    Equation (4.8) of parabolic regression is a special case of a more general regression called polynomial. Polynomial dependence of the value Y from the size X is called a dependence expressed by a polynomial n-th order:

    where are the numbers and i (i=0,1,…, n) are called polynomial regression coefficients.

    Power regression.

    Power dependence of the value Y from the size X is called a dependency of the form:

    This equation is called power regression equation Y on X. Options A And b are called power regression coefficients.

    ln =ln a+ ln x. (4.11)

    This equation describes a straight line on a plane with logarithmic coordinate axes ln x and ln. Therefore, the criterion for the applicability of power regression is the requirement that the points of logarithms of empirical data ln x i and ln y i were closest to the straight line (4.11).

    Exponential regression.

    Indicative(or exponential) dependence of the value Y from the size X is called a dependency of the form:

    (or ). (4.12)

    This equation is called exponential equation(or exponential) regression Y on X. Options A(or k) And b are called exponential coefficients(or exponential) regression.

    If we take the logarithm of both sides of the power regression equation, we get the equation

    ln = x ln a+ln b(or ln = k x+ln b). (4.13)

    This equation describes linear dependence logarithm of one quantity ln from another quantity x. Therefore, the criterion for the applicability of power regression is the requirement that empirical data points of the same value x i and logarithms of another quantity ln y i were closest to the straight line (4.13).

    Logarithmic regression.

    Logarithmic dependence of the value Y from the size X is called a dependency of the form:

    =a+ ln x. (4.14)

    This equation is called logarithmic regression equation Y on X. Options A And b are called logarithmic regression coefficients.

    Hyperbolic regression.

    Hyperbolic dependence of the value Y from the size X is called a dependency of the form:

    This equation is called hyperbolic regression equation Y on X. Options A And b are called hyperbolic regression coefficients and are determined by the least squares method. Application of this method leads to the formulas:

    In formulas (4.16-4.17) the summation is carried out over the index i from one to the number of observations n.

    Unfortunately, in Excel there are no functions that calculate hyperbolic regression coefficients. In cases where it is not known that the measured quantities are related by inverse proportionality, it is recommended to look for a power regression equation instead of the hyperbolic regression equation, so in Excel there is a procedure for finding it. If a hyperbolic dependence is assumed between the measured quantities, then its regression coefficients will have to be calculated using auxiliary calculation tables and summation operations using formulas (4.16-4.17).

    The MS Excel package allows you to do most of the work very quickly when constructing a linear regression equation. It is important to understand how to interpret the results obtained.

    Requires an add-on to work Analysis package, which must be enabled in the menu item Service\Add-ons

    In Excel 2007, to enable the analysis package, you need to click go to block Excel Options by pressing the button on the left top corner, and then the button Excel Options"at the bottom of the window:



    To build a regression model, you must select the item Service\Data Analysis\Regression. (In Excel 2007, this mode is in the block Data/Data Analysis/Regression). A dialog box will appear that you need to fill out:

    1) Input interval Y¾ contains a link to cells that contain the values ​​of the resulting characteristic y. The values ​​must be arranged in a column;

    2) Input interval X¾ contains a link to cells that contain factor values. The values ​​must be arranged in columns;

    3) Sign Tags set if the first cells contain explanatory text (data labels);

    4) Reliability level¾ is the confidence level, which is considered to be 95% by default. If this value does not suit you, then you need to enable this flag and enter the required value;

    5) Sign Constant-zero is included if it is necessary to construct an equation in which the free variable is ;

    6) Output Options determine where the results should be placed. By default builds mode New worksheet;

    7) Block Leftovers allows you to include the output of residuals and the construction of their graphs.

    As a result, information is displayed containing all the necessary information and grouped into three blocks: Regression statistics , Analysis of variance, Withdrawal of balance. Let's take a closer look at them.

    1. Regression statistics:

    multiple R is determined by the formula ( Pearson correlation coefficient);

    R (coefficient of determination);

    Normalized R-square is calculated by the formula (used for multiple regression);

    Standard error S calculated by the formula ;

    Observations ¾ is the amount of data n.

    2. Analysis of variance, line Regression:

    Parameter df equals m(number of factor sets x);

    Parameter SS is determined by the formula ;

    Parameter MS is determined by the formula ;

    Statistics F is determined by the formula ;

    Significance F. If the resulting number exceeds , then the hypothesis is accepted (there is no linear relationship), otherwise the hypothesis is accepted (there is a linear relationship).


    3. Analysis of variance, line Remainder:

    Parameter df equal to ;

    Parameter SS is determined by the formula ;

    Parameter MS is determined by the formula.

    4. Analysis of variance, line Total contains the sum of the first two columns.

    5. Analysis of variance, line Y-intersection contains the coefficient, standard error and t-statistics.

    P-value ¾ is the value of the significance levels corresponding to the calculated t-statisticians. Determined by the function STUDIDIST( t-statistics; ). If P-value exceeds , then the corresponding variable is statistically insignificant and can be excluded from the model.

    Bottom 95% And Top 95%¾ are the lower and upper limits of the 95 percent confidence intervals for the coefficients of the theoretical linear regression equation. If the confidence value in the data input block was left at its default value, then the last two columns will duplicate the previous ones. If the user has entered their own confidence value, the last two columns contain the lower and upper bound values ​​for the specified confidence level.

    6. Analysis of variance, the lines contain the coefficient values, standard errors, t-statistician, P-values ​​and confidence intervals for the corresponding .

    7. Block Withdrawal of balance contains the predicted values y(in our notation this is ) and residues .

    The MS Excel package allows you to do most of the work very quickly when constructing a linear regression equation. It is important to understand how to interpret the results obtained. To build a regression model, you must select Tools\Data Analysis\Regression (in Excel 2007 this mode is in the Data/Data Analysis/Regression block). Then copy the results into a block for analysis.

    Initial data:

    Analysis results

    Include in report
    Calculation of regression equation parameters
    Theoretical material
    Regression equation on standard scale
    Multiple Correlation Coefficient (Multiple Correlation Index)
    Partial elasticity coefficients
    Comparative assessment of the influence of the analyzed factors on the resulting characteristic (d - coefficients of separate determination)

    Checking the quality of the constructed regression equation
    Significance of regression coefficients b i (t-statistics. Student's test)
    Significance of the equation as a whole (F-statistics. Fisher's test). Determination coefficient
    Partial F-tests

    Significance level 0.005 0.01 0.025 0.05 0.1 0.25 0.4

    Shows the influence of some values ​​(independent, independent) on the dependent variable. For example, how does the number of economically active population depend on the number of enterprises, wages and other parameters. Or: how do foreign investments, energy prices, etc. affect the level of GDP.

    The result of the analysis allows you to highlight priorities. And based on the main factors, predict, plan the development of priority areas, and make management decisions.

    Regression happens:

    linear (y = a + bx);

    · parabolic (y = a + bx + cx 2);

    · exponential (y = a * exp(bx));

    · power (y = a*x^b);

    · hyperbolic (y = b/x + a);

    logarithmic (y = b * 1n(x) + a);

    · exponential (y = a * b^x).

    Let's look at the construction as an example regression model in Excel and interpretation of the results. Let's take the linear type of regression.

    Task. At 6 enterprises, the average monthly salary and the number of quitting employees were analyzed. It is necessary to determine the dependence of the number of quitting employees on the average salary.

    The linear regression model has the following form:

    Y = a 0 + a 1 x 1 +…+a k x k.

    Where a are regression coefficients, x are influencing variables, k is the number of factors.

    In our example, Y is the indicator of quitting employees. The influencing factor is wages (x).

    Excel has built-in functions that can help you calculate the parameters of a linear regression model. But the “Analysis Package” add-on will do this faster.

    We activate a powerful analytical tool:

    1. Click the “Office” button and go to the “Excel Options” tab. "Add-ons".

    2. At the bottom, under the drop-down list, in the “Manage” field there will be an inscription “Excel Add-ins” (if it is not there, click on the checkbox on the right and select). And the “Go” button. Click.

    3. A list of available add-ons opens. Select “Analysis package” and click OK.

    Once activated, the add-on will be available in the Data tab.

    Now let's do the regression analysis itself.

    1. Open the menu of the “Data Analysis” tool. Select "Regression".



    2. A menu will open to select input values ​​and output options (where to display the result). In the fields for the initial data, we indicate the range of the described parameter (Y) and the factor influencing it (X). The rest may not be filled in.

    3. After clicking OK, the program will display the calculations on a new sheet (you can select an interval to display on the current sheet or assign output to a new workbook).

    First of all, we pay attention to R-squared and coefficients.

    R-squared is the coefficient of determination. In our example – 0.755, or 75.5%. This means that the calculated parameters of the model explain 75.5% of the relationship between the studied parameters. The higher the coefficient of determination, the better the model. Good - above 0.8. Bad – less than 0.5 (such an analysis can hardly be considered reasonable). In our example – “not bad”.

    The coefficient 64.1428 shows what Y will be if all variables in the model under consideration are equal to 0. That is, the value of the analyzed parameter is also influenced by other factors not described in the model.

    The coefficient -0.16285 shows the weight of variable X on Y. That is, the average monthly salary within this model affects the number of quitters with a weight of -0.16285 (this is a small degree of influence). The “-” sign indicates a negative impact: the higher the salary, the fewer people quit. Which is fair.

    It is known for being useful in various fields of activity, including such a discipline as econometrics, where this software utility is used in work. Basically, all actions of practical and laboratory classes are performed in Excel, which greatly facilitates the work, giving detailed explanations certain actions. Thus, one of the analysis tools “Regression” is used to select a graph for a set of observations using the least squares method. Let's look at what this program tool is and what its benefits are for users. Below is also a short but clear instructions building a regression model.

    Main tasks and types of regression

    Regression represents the relationship between given variables, through which a forecast of the future behavior of these variables can be determined. Variables are various periodic phenomena, including human behavior. This analysis Excel programs used to analyze the impact on a specific dependent value variable one or a number of variables. For example, sales in a store are influenced by several factors, including assortment, prices and location of the store. Thanks to regression in Excel, you can determine the degree of influence of each of these factors based on the results of existing sales, and then apply the data obtained to forecast sales for another month or for another store located nearby.

    Typically, regression is presented as a simple equation that reveals the relationships and strengths of relationships between two groups of variables, where one group is dependent or endogenous and the other is independent or exogenous. If there is a group of interrelated indicators, the dependent variable Y is determined based on the logic of reasoning, and the rest act as independent X variables.

    The main tasks of building a regression model are as follows:

    1. Selection of significant independent variables (X1, X2, ..., Xk).
    2. Selecting the type of function.
    3. Constructing estimates for coefficients.
    4. Construction of confidence intervals and regression functions.
    5. Checking the significance of the calculated estimates and the constructed regression equation.

    There are several types of regression analysis:

    • paired (1 dependent and 1 independent variables);
    • multiple (several independent variables).

    There are two types of regression equations:

    1. Linear, illustrating strict linear connection between variables.
    2. Nonlinear - Equations that can include powers, fractions, and trigonometric functions.

    Instructions for building a model

    To perform a given construction in Excel, you must follow the instructions:


    For further calculation, use the “Linear()” function, specifying Y Values, X Values, Const and Statistics. After this, determine the set of points on the regression line using the "Trend" function - Y Values, X Values, New Values, Const. With the help given parameters calculate the unknown value of the coefficients based on the given conditions of the problem.