How to build a multiple regression in Excel. Regression in Excel

    Constructing a linear regression and evaluating its parameters and their significance can be done much faster using the Excel Analysis ToolPak (Regression tool). Let us consider the interpretation of the results in the general case (k explanatory variables), using Example 3.6.

    The Regression Statistics table gives the following values:

    Multiple R – multiple correlation coefficient;

    R Square – coefficient of determination R²;

    Adjusted R Square – R² adjusted for the number of degrees of freedom;

    Standard Error – standard error of the regression, S;

    Observations – number of observations n.

    The Analysis of Variance (ANOVA) table gives:

    1. Column df – the number of degrees of freedom, equal to

    for the Regression row, df = k;

    for the Residual row, df = n – k – 1;

    for the Total row, df = n – 1.

    2. Column SS – the sums of squared deviations, equal to

    for the Regression row, SSR = Σ(ŷᵢ – ȳ)²;

    for the Residual row, SSE = Σ(yᵢ – ŷᵢ)²;

    for the Total row, SST = Σ(yᵢ – ȳ)².

    3. Column MS – variances, determined by the formula MS = SS/df:

    for the Regression row – the factor (explained) variance;

    for the Residual row – the residual variance.

    4. Column F – the calculated value of the F-criterion, computed by the formula

    F = MS(Regression) / MS(Residual).

    5. Column Significance F – the significance level corresponding to the calculated F-statistic:

    Significance F = FDIST(F-statistic; df(Regression); df(Residual)).

    If Significance F is less than the standard significance level, then R² is statistically significant.
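To make the arithmetic behind df, SS, MS and F concrete, here is a small Python sketch (not Excel); the data are invented, and the regression line is fitted by ordinary least squares so that the SS decomposition holds exactly:

```python
# Illustrative sketch of the ANOVA table quantities for a simple regression
# with k = 1 explanatory variable; the data are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
n, k = len(y), 1

# Ordinary least squares fit
x_bar = sum(x) / n
y_bar = sum(y) / n
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
b = y_bar - m * x_bar
y_hat = [m * xi + b for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # Regression SS, df = k
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # Residual SS, df = n - k - 1
sst = sum((yi - y_bar) ** 2 for yi in y)               # Total SS, df = n - 1

ms_reg = ssr / k                # factor (explained) variance
ms_res = sse / (n - k - 1)      # residual variance
F = ms_reg / ms_res             # the value Excel reports in the F column
```

For an OLS fit with an intercept, SSR + SSE = SST, which is exactly the decomposition the ANOVA table relies on.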

                  Coefficients  Standard error  t-statistic  P-value  Lower 95%  Upper 95%
    Y (intercept) 65.92         11.74           5.61         0.00080  38.16      93.68
    X             0.107         0.014           7.32         0.00016  0.0728     0.142

    This table shows:

    1. Coefficients – the values of the coefficients a, b.

    2. Standard error – the standard errors of the regression coefficients, Sa and Sb.



    3. t-statistic – the calculated values of the t-criterion, computed by the formula:

    t-statistic = Coefficient / Standard error.

    4. P-value (significance of t) – the significance level corresponding to the calculated t-statistic.

    P-value = TDIST(t-statistic; df(Residual)).

    If the P-value is less than the standard significance level, the corresponding coefficient is statistically significant.

    5. Lower 95% and Upper 95% – the lower and upper limits of the 95% confidence intervals for the coefficients of the theoretical linear regression equation.
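As a quick check of item 3, the intercept's t-statistic can be recomputed from the rounded table values above (Python used here only as a calculator):

```python
# t-statistic = coefficient / its standard error; values are the rounded
# ones reported for the intercept in the coefficient table.
coef_a, se_a = 65.92, 11.74
t_a = coef_a / se_a
print(round(t_a, 2))   # 5.61, matching the t-statistic column
```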

    RESIDUAL OUTPUT
    Observation  Predicted y  Residual e
    1             72.70       -29.70
    2             82.91       -20.91
    3             94.53        -4.53
    4            105.72         5.27
    5            117.56        12.44
    6            129.70        19.29
    7            144.22        20.77
    8            166.49        24.50
    9            268.13       -27.13

    The RESIDUAL OUTPUT table shows:

    in the Observation column – the observation number;

    in the Predicted y column – the calculated values of the dependent variable;

    in the Residuals e column – the differences between the observed and calculated values of the dependent variable.

    Example 3.6. There are data (in conventional units) on food costs y and per capita income x for nine groups of families:

    x
    y

    Using the output of the Excel Analysis ToolPak (Regression tool), we will analyze the dependence of food costs on per capita income.

    The results of regression analysis are usually written in the form:

    ŷ = 65.92 + 0.107x
        (11.74)  (0.014)

    where the standard errors of the regression coefficients are indicated in parentheses.

    The regression coefficients are a = 65.92 and b = 0.107. The direction of the relationship between y and x is determined by the sign of the coefficient b = 0.107; that is, the relationship is direct (positive). The coefficient b = 0.107 shows that when per capita income increases by 1 conventional unit, food costs increase by 0.107 conventional units.
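A minimal sketch of using the fitted equation for prediction (the income values below are arbitrary, chosen only for illustration):

```python
# Fitted equation from the example: y = 65.92 + 0.107 * x
a, b = 65.92, 0.107

def predict(income):
    """Predicted food costs (conventional units) for a per capita income."""
    return a + b * income

# raising income by 1 unit raises predicted costs by exactly b
delta = predict(501.0) - predict(500.0)
```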

    Let us evaluate the significance of the coefficients of the resulting model. The significance of the coefficients (a, b) is checked by the t-test:

    P-value(a) = 0.00080 < 0.01 < 0.05

    P-value(b) = 0.00016 < 0.01 < 0.05,

    therefore the coefficients (a, b) are significant at the 1% level, and all the more so at the 5% significance level. Thus the regression coefficients are significant and the model fits the original data.

    The regression estimates are consistent not only with the obtained values of the regression coefficients but with a whole range of values around them (a confidence interval). With 95% probability, the confidence intervals for the coefficients are (38.16; 93.68) for a and (0.0728; 0.142) for b.

    The quality of the model is assessed by the coefficient of determination R 2 .

    The value R² = 0.884 means that the per capita income factor explains 88.4% of the variation (scatter) in food expenses.

    The significance of R² is checked by the F-test: Significance F = 0.00016 < 0.01 < 0.05; consequently, R² is significant at the 1% level, and all the more so at the 5% significance level.

    In the case of paired linear regression, the correlation coefficient can be found as r = √R² = √0.884 ≈ 0.94. The obtained value of the correlation coefficient indicates that the relationship between food expenses and per capita income is very close.
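The same arithmetic in Python; for paired regression, r carries the sign of the slope b:

```python
import math

# For paired linear regression, r = sqrt(R^2), taking the sign of the slope.
r_squared = 0.884
b = 0.107                                   # positive slope
r = math.copysign(math.sqrt(r_squared), b)
print(round(r, 2))   # 0.94
```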

    28 Oct

    Good afternoon, dear blog readers! Today we will talk about nonlinear regressions. The solution to linear regressions can be viewed at LINK.

    This method is used mainly in economic modeling and forecasting. Its goal is to observe and identify dependencies between two indicators.

    The main types of nonlinear regressions are:

    • polynomial (quadratic, cubic);
    • hyperbolic;
    • power;
    • exponential;
    • logarithmic.

    Various combinations can also be used. For example, for time-series analytics in banking, insurance, and demographic studies, the Gompertz curve is used, which is a type of logarithmic regression.

    In forecasting with nonlinear regressions, the main thing is to find the correlation coefficient, which shows whether there is a close relationship between the two parameters. As a rule, if the correlation coefficient is close to 1, there is a relationship, and the forecast will be fairly accurate. Another important element of nonlinear regressions is the average relative error (A): if it is within 8–10%, the model is considered sufficiently accurate.

    This is where we will probably finish the theoretical block and move on to practical calculations.

    We have a table of car sales over a period of 15 years (let's denote it X), the number of measurement steps will be the argument n, we also have revenue for these periods (let's denote it Y), we need to predict what the revenue will be in the future. Let's build the following table:

    For the study, we need to solve the equation (the dependence of Y on X): y = ax² + bx + c + e. This is a paired quadratic regression. We apply the least squares method to find the unknown coefficients a, b, c, which leads to a system of algebraic equations of the form:

    a·Σx⁴ + b·Σx³ + c·Σx² = Σx²y
    a·Σx³ + b·Σx² + c·Σx = Σxy
    a·Σx² + b·Σx + c·n = Σy

    To solve this system we will use, for example, Cramer’s method. The sums appearing in the system are the coefficients of the unknowns. To calculate them, we will add several columns to the table (D, E, F, G, H) and label them according to the calculation: in column D we square x, in E we cube it, in F we raise x to the fourth power, in G we multiply x by y, and in H we multiply x² by y.
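The whole procedure described above (the sums for columns D–H, matrix A, the determinants, and Cramer's formulas) can be sketched in Python; the data below are synthetic, chosen to lie exactly on a known parabola so the recovered coefficients are easy to check:

```python
# Quadratic regression y = a*x^2 + b*x + c via the normal equations,
# solved by Cramer's rule, mirroring the spreadsheet steps.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2 * xi ** 2 + 3 * xi + 1 for xi in x]   # exact parabola for checking

n    = len(x)
sx   = sum(x)
sx2  = sum(xi ** 2 for xi in x)                      # column D
sx3  = sum(xi ** 3 for xi in x)                      # column E
sx4  = sum(xi ** 4 for xi in x)                      # column F
sxy  = sum(xi * yi for xi, yi in zip(x, y))          # column G
sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))     # column H

def det3(m):
    """Determinant of a 3x3 matrix (what MDETERM does in the sheet)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

A   = [[sx4, sx3, sx2], [sx3, sx2, sx], [sx2, sx, n]]
rhs = [sx2y, sxy, sum(y)]

def replace_col(m, col, j):
    """Copy of matrix m with column j replaced by vector col (Cramer)."""
    return [[col[i] if c == j else m[i][c] for c in range(3)] for i in range(3)]

d = det3(A)
a = det3(replace_col(A, rhs, 0)) / d
b = det3(replace_col(A, rhs, 1)) / d
c = det3(replace_col(A, rhs, 2)) / d
# a, b, c recover 2, 3, 1 for this synthetic data
```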

    You will get a table filled in with everything needed to solve the system.

    Let's form the matrix A of the system, consisting of the coefficients of the unknowns from the left-hand sides of the equations. Place it at cell A22 and label it "A=". We follow the system of equations that we chose for the regression.

    That is, in cell B21 we must place the sum of the column where we raised the X indicator to the fourth power - F17. Let's just refer to the cell - “=F17”. Next, we need the sum of the column where X was cubed - E17, then we go strictly according to the system. Thus, we will need to fill out the entire matrix.

    In accordance with Cramer's algorithm, we build a matrix A1, similar to A, in which the elements of the first column are replaced by the elements of the right-hand sides of the system's equations: the sum of the x²y column, the sum of the xy column, and the sum of the y column.

    We will also need two more matrices - let's call them A2 and A3 in which the second and third columns will consist of the coefficients of the right-hand sides of the equations. The picture will be like this.

    Following the chosen algorithm, we need to calculate the determinants (D) of the resulting matrices. For this we use the MDETERM function and place the results in cells J21:K24.

    We calculate the coefficients of the equation according to Cramer in the cells opposite the corresponding determinants, using the formulas: a (in cell M22) – "=K22/K21"; b (in cell M23) – "=K23/K21"; c (in cell M24) – "=K24/K21".

    We get our desired equation of paired quadratic regression:

    y = −0.074x² + 2.151x + 6.523

    Let us evaluate the closeness of the relationship using the correlation index.

    To calculate it, add an additional column J to the table (call it y*). The calculation (according to the regression equation we obtained) is "=$M$22*B2*B2+$M$23*B2+$M$24". Place it in cell J2, then drag the autofill handle down to cell J16.

    To calculate the sums Σ(y − y*)² and Σ(y − ȳ)², add columns K and L to the table with the corresponding formulas. We calculate the mean of the Y column using the AVERAGE function.

    In cell K25 we place the formula for the correlation index – "=SQRT(1-(K17/L17))".

    We see that the value of 0.959 is very close to 1, which means there is a close nonlinear relationship between sales and years.

    It remains to evaluate the goodness of fit of the resulting quadratic regression equation (the index of determination), which is calculated as the square of the correlation index. So the formula in cell K26 is very simple – "=K25*K25".

    The coefficient of 0.920 is close to 1, which indicates a high quality of fit.

    The last step is to calculate the relative error. Add a column and enter the formula "=ABS((C2-J2)/C2)" (ABS returns the absolute value). Drag the fill handle down and, in cell M18, display the mean value (AVERAGE); assign the percentage format to the cells. The result, 7.79%, is within the acceptable error range of 8–10%, so the calculations are sufficiently accurate.
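The relative-error column can be mimicked in Python; the numbers below are invented for illustration and do not come from the article's sheet:

```python
# Average relative error: mean of |actual - predicted| / actual, as a
# percentage, like the sheet's ABS((C-J)/C) column averaged in M18.
actual    = [10.0, 12.0, 9.0, 15.0]
predicted = [ 9.0, 12.6, 9.9, 14.1]

rel_errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
avg_rel_error_pct = 100 * sum(rel_errors) / len(rel_errors)
# under ~8-10% the model is considered accurate enough
```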

    If the need arises, we can build a graph using the obtained values.

    An example file is attached - LINK!

    10/28/2017

    Regression in Excel

    Statistical data processing can also be carried out using the Analysis ToolPak add-in, found under the Tools menu. In Excel 2003, if you open TOOLS and cannot find the DATA ANALYSIS item, open the ADD-INS tab and left-click to put a check mark next to ANALYSIS TOOLPAK (Fig. 17).

    Fig. 17. The ADD-INS window

    After that, the DATA ANALYSIS item appears in the TOOLS menu.

    In Excel 2007, to install the ANALYSIS TOOLPAK, click the OFFICE button in the upper left corner of the sheet (Fig. 18a). Then click the EXCEL OPTIONS button. In the EXCEL OPTIONS window that appears, left-click the ADD-INS item and, in the list on the right, select ANALYSIS TOOLPAK. Then click OK.


    Fig. 18. Installing the ANALYSIS TOOLPAK in Excel 2007 (the Office button and the Excel Options window)

    To install the Analysis ToolPak, click the GO button at the bottom of the open window. A window like the one in Fig. 12 will appear. Check the box next to ANALYSIS TOOLPAK. A DATA ANALYSIS button will then appear on the DATA tab (Fig. 19).

    From the suggested items, select "REGRESSION" and click it with the left mouse button. Then click OK.

    A window will appear as shown in Fig. 21.

    The "REGRESSION" analysis tool fits a line to a set of observations using the least squares method. Regression is used to analyze the effect of the values of one or more independent variables on a single dependent variable. For example, several factors influence an athlete's performance, including age, height, and weight. It is possible to calculate the degree to which each of these three factors influences an athlete's performance, and then use that data to predict the performance of another athlete.

    The Regression tool uses the function LINEST.

    REGRESSION Dialog Box

    Labels Select the check box if the first row or first column of the input range contains headings. Clear this check box if there are no headers. In this case, suitable headers for the output table data will be created automatically.

    Confidence Level Select the check box to include an additional confidence level in the output summary table. In the appropriate field, enter the confidence level that you want to apply in addition to the default 95% level.

    Constant - zero Select the checkbox to force the regression line to pass through the origin.

    Output Range Enter the reference to the top left cell of the output range. Provide at least seven columns for the output summary table, which will include: ANOVA results, coefficients, standard error of the Y calculation, standard deviations, number of observations, standard errors for coefficients.

    New Worksheet Select this option to open a new worksheet in the workbook and paste the analysis results, starting in cell A1. If necessary, enter a name for the new sheet in the field located opposite the corresponding radio button.

    New Workbook Select this option to create a new workbook with the results added to a new worksheet.

    Residuals Select the check box to include residuals in the output table.

    Standardized Residuals Select the check box to include standardized residuals in the output table.

    Residual Plot Select the check box to plot the residuals for each independent variable.

    Fit Plot Select the check box to plot the predicted versus observed values.

    Normal probability plot Select the checkbox to plot a normal probability graph.

    Function LINEST

    To carry out calculations, select with the cursor the cell in which we want to display the average value and press the = key on the keyboard. Next, in the Name field, indicate the desired function, for example AVERAGE(Fig. 22).


    Fig. 22. Finding functions in Excel 2003

    If the name of the function does not appear in the NAME field, left-click the triangle next to the field; a window with a list of functions will appear. If the function is not in the list, left-click the list item OTHER FUNCTIONS; the FUNCTION WIZARD dialog box will appear, in which, using the vertical scroll bar, you select the desired function, highlight it with the cursor, and click OK (Fig. 23).

    Fig. 23. The Function Wizard

    To search for a function in Excel 2007, any tab can be opened in the menu; then to carry out calculations, select with the cursor the cell in which we want to display the average value and press the = key on the keyboard. Next, in the Name field, specify the function AVERAGE. The window for calculating the function is similar to that shown in Excel 2003.

    You can also select the Formulas tab and left-click the "INSERT FUNCTION" button in the menu (Fig. 24); the FUNCTION WIZARD window will appear, similar in appearance to the one in Excel 2003. In the menu you can also immediately select a category of functions (recently used, financial, logical, text, date and time, mathematical, other functions) in which to search for the desired function.


    Fig. 24. Selecting a function in Excel 2007

    The LINEST function calculates statistics for a series, using the least squares method to compute the straight line that best approximates the available data, and returns an array describing that line. You can also combine LINEST with other functions to compute other kinds of models that are linear in their unknown parameters, including polynomial, logarithmic, exponential, and power models. Because it returns an array of values, the function must be entered as an array formula.

    The equation for a straight line is:

    y = mx + b

    or

    y = m1x1 + m2x2 + … + b (in the case of several ranges of x-values),

    where the dependent value y is a function of the independent value x, the m values are the coefficients corresponding to each independent variable x, and b is a constant. Note that y, x and m can be vectors. The LINEST function returns the array {mn; mn-1; …; m1; b}. LINEST can also return additional regression statistics.

    LINEST(known_y_values; known_x_values; const; statistics)

    Known_y_values – the set of y-values that are already known for the relation y = mx + b.

    If the known_y_values ​​array has one column, then each column in the known_x_values ​​array is treated as a separate variable.

    If the known_y_values ​​array has one row, then each row in the known_x_values ​​array is treated as a separate variable.

    Known_x_values – an optional set of x-values that are already known for the relation y = mx + b.

    The array known_x_values ​​can contain one or more sets of variables. If only one variable is used, then the known_y_values ​​and known_x_values ​​arrays can have any shape - as long as they have the same dimension. If more than one variable is used, then known_y_values ​​must be a vector (that is, an interval one row high or one column wide).

    If array_known_x_values ​​is omitted, then the array (1;2;3;...) is assumed to be the same size as array_known_values_y.

    Const is a boolean value that specifies whether the constant b is required to be equal to 0.

    If the argument "const" is TRUE or omitted, then the constant b is evaluated as usual.

    If the "const" argument is FALSE, then b is set to 0 and the m values are chosen so that the relation y = mx is satisfied.

    Statistics - A boolean value that indicates whether additional regression statistics should be returned.

    If statistics is TRUE, LINEST returns additional regression statistics. The returned array has the form {mn; mn-1; ...; m1; b : sen; sen-1; ...; se1; seb : r2; sey : F; df : ssreg; ssresid}.

    If statistics is FALSE or omitted, LINEST returns only the coefficients m and the constant b.

    Additional regression statistics.

    Magnitude Description
    se1, se2, ..., sen Standard error values for the coefficients m1, m2, ..., mn.
    seb Standard error value for the constant b (seb = #N/A if const is FALSE).
    r2 Coefficient of determination. The actual values of y are compared with the values obtained from the equation of the line; based on this comparison, the coefficient of determination, normalized from 0 to 1, is calculated. If it equals 1, there is complete correlation with the model, i.e. there is no difference between the actual and estimated values of y. If it equals 0, there is no point in using the regression equation to predict the values of y. For more information about how r2 is calculated, see the “Notes” at the end of this section.
    sey Standard error for the estimate of y.
    F F-statistic, or F-observed value. The F-statistic is used to determine whether the observed relationship between the dependent and independent variables is due to chance.
    df Degrees of freedom. Degrees of freedom are useful for finding F-critical values in a statistical table. To determine the confidence level of the model, compare the values in the table with the F-statistic returned by the LINEST function. For more information about calculating df, see the “Notes” at the end of this section; Example 4 below shows the use of the F and df values.
    ssreg Regression sum of squares.
    ssresid Residual sum of squares. For more information about calculating ssreg and ssresid, see the “Notes” at the end of this section.

    The figure below shows the order in which additional regression statistics are returned.

    Notes:

    Any straight line can be described by its slope and its intersection with the y-axis:

    Slope (m): to determine the slope of a line, usually denoted m, take two points on the line, (x1, y1) and (x2, y2); the slope is m = (y2 − y1)/(x2 − x1).

    Y-intercept (b): the y-intercept of a line, usually denoted b, is the y-value of the point at which the line crosses the y-axis.

    The equation of the straight line is y = mx + b. Once the values of m and b are known, any point on the line can be calculated by substituting a y or x value into the equation. You can also use the TREND function.

    If there is only one independent variable x, you can obtain the slope and y-intercept directly using the following formulas:

    Slope: INDEX(LINEST(known_y_values; known_x_values); 1)

    Y-intercept: INDEX(LINEST(known_y_values; known_x_values); 2)

    The accuracy of the approximation by the straight line calculated by the LINEST function depends on the degree of scatter in the data. The closer the data is to a straight line, the more accurate the model used by LINEST. LINEST uses least squares to determine the best fit to the data. When there is only one independent variable x, m and b are calculated from the formulas:

    m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

    b = ȳ − m·x̄

    where x̄ and ȳ are the sample means, for example x̄ = AVERAGE(known_x_values) and ȳ = AVERAGE(known_y_values).
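A sketch of these two formulas in Python, on made-up points lying exactly on the line y = 2x + 1:

```python
# Slope and intercept via the deviation-from-the-mean formulas quoted above.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
b = y_bar - m * x_bar
# m == 2.0 and b == 1.0 for this data
```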

    The LINEST and LOGEST fitting functions can calculate the straight line or exponential curve that best fits the data. However, they do not answer the question of which of the two results better suits the problem. You can also evaluate the TREND(known_y_values; known_x_values) function for a straight line, or the GROWTH(known_y_values; known_x_values) function for an exponential curve. These functions, unless new_x_values are specified, return an array of calculated y-values for the actual x-values along the line or curve. You can then compare the calculated values with the actual values. You can also create charts for visual comparison.

    When performing regression analysis, Microsoft Excel calculates, for each point, the square of the difference between the predicted y value and the actual y value. The sum of these squared differences is called the residual sum of squares (ssresid). Microsoft Excel then calculates the total sum of squares (sstotal). If const = TRUE or the value of this argument is not specified, the total sum of squares will be equal to the sum of the squares of the differences between the actual y values ​​and the average y values. When const = FALSE, the total sum of squares will be equal to the sum of squares of the real y values ​​(without subtracting the average y value from the partial y value). The regression sum of squares can then be calculated as follows: ssreg = sstotal - ssresid. The smaller the residual sum of squares, the greater the value of the coefficient of determination r2, which shows how well the equation obtained using regression analysis explains the relationships between variables. The coefficient r2 is equal to ssreg/sstotal.
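The bookkeeping just described can be sketched in Python (the fitted values are invented for illustration; this is the const = TRUE case, so sstotal is measured around the mean of y):

```python
# How ssresid, sstotal, ssreg and r2 relate, following the text above.
y     = [3.0, 5.0, 6.0, 10.0]
y_hat = [3.5, 4.5, 6.5,  9.5]   # pretend fitted values from a regression
y_bar = sum(y) / len(y)

ssresid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual SS
sstotal = sum((yi - y_bar) ** 2 for yi in y)               # total SS (const = TRUE)
ssreg   = sstotal - ssresid                                # regression SS
r2      = ssreg / sstotal                                  # coefficient of determination
```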

    In some cases, one or more of the X columns (assume the Y and X values are in columns) has no additional predictive value given the other X columns; in other words, removing one or more X columns could leave the calculated Y values just as precise. In that case the redundant X columns are excluded from the regression model. This phenomenon is called "collinearity", because the redundant columns of X can be represented as a sum of multiples of the non-redundant columns. The LINEST function checks for collinearity and removes any redundant X columns it detects from the regression model. Removed X columns can be recognized in LINEST output by a coefficient of 0 and an se value of 0. Removing one or more columns as redundant changes the value of df, because df depends on the number of X columns actually used for prediction. For more information on calculating df, see Example 4 below. When df changes because of the removal of redundant columns, the values of sey and F also change. In practice, collinearity should be relatively rare. However, it is more likely to arise when some X columns contain only 0 and 1 values as indicators of whether the subject of an experiment belongs to a particular group. If const = TRUE or a value for this argument is not specified, LINEST inserts an additional X column of ones to model the intercept. If there is a column with values of 1 for men and 0 for women, and also a column with values of 1 for women and 0 for men, the latter column is removed, because its values can be obtained from the "male indicator" column.

    The calculation of df for cases where X columns are not removed from the model due to collinearity occurs as follows: if there are k known_x columns and the value const = TRUE or not specified, then df = n – k – 1. If const = FALSE, then df = n - k. In both cases, removing the X columns due to collinearity increases the df value by 1.

    Formulas that return arrays must be entered as array formulas.

    When entering an array of constants as an argument, for example, known_x_values, you should use a semicolon to separate values ​​on the same line and a colon to separate lines. The separator characters may vary depending on the settings in the Language and Settings window in Control Panel.

    It should be noted that the y values ​​predicted by the regression equation may not be correct if they fall outside the range of the y values ​​that were used to define the equation.

    The basic algorithm used in the LINEST function differs from the algorithm of the SLOPE and INTERCEPT functions. The difference between the algorithms can lead to different results for indeterminate and collinear data. For example, if the data points of the known_y_values argument are 0 and the data points of the known_x_values argument are 1, then:

    LINEST returns a value equal to 0. The LINEST algorithm is designed to return reasonable values for collinear data, and in this case at least one answer can be found.

    The SLOPE and INTERCEPT functions return the #DIV/0! error. The SLOPE and INTERCEPT algorithm looks for exactly one answer, but in this case there can be several.

    In addition, LINEST can be used to calculate statistics for other types of regression by entering functions of the x and y variables as the x and y series for LINEST. For example, the following formula:

    LINEST(y_values, x_values^COLUMN($A:$C))

    works with one column of Y values and one column of X values to calculate a cubic approximation (third-degree polynomial) of the form:

    y = m1·x + m2·x² + m3·x³ + b

    The formula can be modified to calculate other types of regression, but in some cases adjustments to the output values ​​and other statistics are required.

    Statistical data processing can also be carried out using the ANALYSIS TOOLPAK add-in (Fig. 62).

    From the suggested items, select "REGRESSION" and click it with the left mouse button. Then click OK.

    A window will appear as shown in Fig. 63.

    The "REGRESSION" analysis tool fits a line to a set of observations using the least squares method. Regression is used to analyze the effect of the values of one or more independent variables on a single dependent variable. For example, several factors influence an athlete's performance, including age, height, and weight. It is possible to calculate the degree to which each of these three factors influences an athlete's performance, and then use that data to predict the performance of another athlete.

    The Regression tool uses the function LINEST.

    REGRESSION Dialog Box

    Labels Select the check box if the first row or first column of the input range contains headings. Clear this check box if there are no headers. In this case, suitable headers for the output table data will be created automatically.

    Reliability Level Select the check box to include an additional level in the output summary table. In the appropriate field, enter the confidence level that you want to apply, in addition to the default 95% level.

    Constant - zero Select the checkbox to force the regression line to pass through the origin.

    Output Range Enter the reference to the top left cell of the output range. Provide at least seven columns for the output summary table, which will include: ANOVA results, coefficients, standard error of the Y calculation, standard deviations, number of observations, standard errors for coefficients.

    New Worksheet Select this option to open a new worksheet in the workbook and paste the analysis results, starting in cell A1. If necessary, enter a name for the new sheet in the field located opposite the corresponding radio button.

    New Workbook Select this option to create a new workbook with the results added to a new worksheet.

    Residuals Select the check box to include residuals in the output table.

    Standardized Residuals Select the check box to include standardized residuals in the output table.

    Residual Plot Select the check box to plot the residuals for each independent variable.

    Fit Plot Select the check box to plot the predicted versus observed values.

    Normal probability plot Select the checkbox to plot a normal probability graph.

    Function LINEST

    To carry out calculations, select with the cursor the cell in which we want to display the average value and press the = key on the keyboard. Next, in the Name field, indicate the desired function, for example AVERAGE(Fig. 22).

    Function LINEST calculates statistics for a series using the method of least squares to calculate the straight line that best approximates the available data and then returns an array that describes the resulting straight line. You can also combine the function LINEST with other functions to compute other kinds of models that are linear in unknown parameters (whose unknown parameters are linear), including polynomial, logarithmic, exponential, and power series. Because it returns an array of values, the function must be specified as an array formula.

    The equation for a straight line is:

    y=m 1 x 1 +m 2 x 2 +…+b (in case of several ranges of x values),

    where the dependent value y is a function of the independent value x, the m values ​​are the coefficients corresponding to each independent variable x, and b is a constant. Note that y, x and m can be vectors. Function LINEST returns array(mn;mn-1;…;m 1 ;b). LINEST may also return additional regression statistics.

    LINEST(known_values_y; known_values_x; const; statistics)

    Known_y_values ​​- a set of y-values ​​that are already known for the relation y=mx+b.

    If the known_y_values ​​array has one column, then each column in the known_x_values ​​array is treated as a separate variable.

    If the known_y_values ​​array has one row, then each row in the known_x_values ​​array is treated as a separate variable.

    Known_x-values ​​are an optional set of x-values ​​that are already known for the relationship y=mx+b.

    The array known_x_values ​​can contain one or more sets of variables. If only one variable is used, then the known_y_values ​​and known_x_values ​​arrays can have any shape - as long as they have the same dimension. If more than one variable is used, then known_y_values ​​must be a vector (that is, an interval one row high or one column wide).

    If array_known_x_values ​​is omitted, then the array (1;2;3;...) is assumed to be the same size as array_known_values_y.

    Const is a logical value that specifies whether the constant b is forced to equal 0.

    If const is TRUE or omitted, the constant b is calculated as usual.

    If const is FALSE, b is set to 0 and the m values are chosen so that the relationship y = mx holds.

    Statistics is a logical value that indicates whether additional regression statistics should be returned.

    If statistics is TRUE, LINEST returns the additional regression statistics; the returned array has the form {mn; mn-1; …; m1; b : sen; sen-1; …; se1; seb : r2; sey : F; df : ssreg; ssresid}.

    If statistics is FALSE or omitted, LINEST returns only the coefficients m and the constant b.
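    As a cross-check outside Excel, the coefficient part of this behaviour can be sketched in a few lines of Python. This is not Excel's code: the linest_simple helper and the sample numbers are our own, only the single-variable case is covered, and the const flag is mimicked as described above.

```python
# A sketch of what LINEST(known_y; known_x; const) computes for a single
# x variable. Not Excel's code: linest_simple and the data are our own.

def linest_simple(ys, xs, const=True):
    """Return (m, b) for y = m*x + b fitted by least squares.

    With const=False the intercept b is forced to 0, mirroring
    LINEST's "const" argument.
    """
    n = len(xs)
    if const:
        mx = sum(xs) / n
        my = sum(ys) / n
        m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
        b = my - m * mx
    else:
        # Minimizing sum (y - m*x)^2 gives m = sum(x*y) / sum(x^2)
        m = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        b = 0.0
    return m, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                      # exactly y = 2x + 1
print(linest_simple(ys, xs))           # (2.0, 1.0)
print(linest_simple(ys, xs, const=False))
```

    With const=False the fitted slope changes, because the line is forced through the origin even though the data do not pass through it.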

    Additional regression statistics (Table 17)

    Value — Description
    se1, se2, …, sen — Standard error values for the coefficients m1, m2, …, mn.
    seb — Standard error value for the constant b (seb = #N/A if const is FALSE).
    r2 — Coefficient of determination. The actual y-values are compared with the values obtained from the equation of the line; based on this comparison, the coefficient of determination is calculated, normalized from 0 to 1. If it equals 1, there is a perfect correlation with the model, i.e. there is no difference between the actual and estimated y-values. In the opposite case, if the coefficient of determination is 0, there is no point in using the regression equation to predict y-values. For more information about how r2 is calculated, see the "Notes" at the end of this section.
    sey — Standard error for the estimate of y.
    F — The F-statistic, or F-observed value. The F-statistic is used to determine whether the observed relationship between the dependent and independent variables is due to chance.
    df — Degrees of freedom. Degrees of freedom are useful for finding F-critical values in a statistical table. To determine the confidence level of the model, compare the values in the table with the F-statistic returned by LINEST. For more information about calculating df, see the "Notes" at the end of this section. Example 4 below shows the use of the F and df values.
    ssreg — The regression sum of squares.
    ssresid — The residual sum of squares. For more information about calculating ssreg and ssresid, see the "Notes" at the end of this section.
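    The statistics in Table 17 can be reproduced for the single-variable, const = TRUE case with the standard least-squares formulas. The sketch below is an illustration, not Excel's implementation; the linest_stats helper and the sample data are invented.

```python
import math

# An illustration (not Excel's implementation) of the additional statistics
# from Table 17 for one x variable with const = TRUE. The function name
# linest_stats and the sample data are invented.

def linest_stats(ys, xs):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - m * mx
    pred = [m * x + b for x in xs]
    ssresid = sum((y - p) ** 2 for y, p in zip(ys, pred))
    sstotal = sum((y - my) ** 2 for y in ys)
    ssreg = sstotal - ssresid
    df = n - 2                               # n - k - 1 with k = 1
    r2 = ssreg / sstotal                     # coefficient of determination
    sey = math.sqrt(ssresid / df)            # standard error of the y estimate
    se1 = sey / math.sqrt(sxx)               # standard error of the slope m
    seb = sey * math.sqrt(1 / n + mx ** 2 / sxx)  # standard error of b
    F = ssreg / (ssresid / df)               # F-observed value
    return {"m": m, "b": b, "se1": se1, "seb": seb, "r2": r2,
            "sey": sey, "F": F, "df": df,
            "ssreg": ssreg, "ssresid": ssresid}

print(linest_stats([2.1, 3.9, 6.2, 7.8, 10.1], [1, 2, 3, 4, 5]))
```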

    The figure below shows the order in which additional regression statistics are returned (Figure 64).

    Notes:

    Any straight line can be described by its slope and its y-intercept:

    Slope (m): to determine the slope of a line, usually denoted m, take two points on the line, (x1, y1) and (x2, y2); the slope equals (y2 - y1)/(x2 - x1).

    Y-intercept (b): the y-intercept of a line, usually denoted b, is the y-value of the point at which the line crosses the y-axis.

    The equation of the straight line is y = mx + b. Once the values of m and b are known, any point on the line can be calculated by substituting a y or x value into the equation. You can also use the TREND function.

    If there is only one independent variable x, you can obtain the slope and y-intercept directly with the following formulas:

    Slope: INDEX(LINEST(known_y_values; known_x_values); 1)

    Y-intercept: INDEX(LINEST(known_y_values; known_x_values); 2)

    The accuracy of the approximation by the straight line calculated by LINEST depends on the degree of scatter in the data: the closer the data lie to a straight line, the more accurate the LINEST model. LINEST uses the method of least squares to determine the best fit to the data. When there is only one independent variable x, m and b are calculated from the formulas

    m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²,

    b = ȳ − m·x̄,

    where x̄ and ȳ are the sample means, for example x̄ = AVERAGE(known_x_values) and ȳ = AVERAGE(known_y_values).

    The fitting functions LINEST and LOGEST can calculate the straight line or the exponential curve that best fits the data, but they do not answer the question of which of the two results better suits the problem. You can also evaluate TREND(known_y_values; known_x_values) for a straight line, or GROWTH(known_y_values; known_x_values) for an exponential curve. These functions, if new_x_values are not specified, return an array of y-values calculated along the line or curve at the actual x-values. The calculated values can then be compared with the actual values, and charts can be built for a visual comparison.

    When performing regression analysis, Microsoft Excel calculates, for each point, the square of the difference between the predicted y-value and the actual y-value. The sum of these squared differences is called the residual sum of squares, ssresid. Microsoft Excel then calculates the total sum of squares, sstotal. If const = TRUE or the argument is omitted, the total sum of squares equals the sum of the squared differences between the actual y-values and the mean of the y-values. If const = FALSE, the total sum of squares equals the sum of the squares of the actual y-values (without subtracting the mean y-value from each individual y-value). The regression sum of squares is then ssreg = sstotal − ssresid. The smaller the residual sum of squares, the larger the coefficient of determination r2, which shows how well the equation obtained by regression analysis explains the relationship between the variables: r2 = ssreg/sstotal.
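    The dependence of sstotal on the const setting is easy to verify with a few lines of Python; the y-values here are made up purely for illustration.

```python
# A quick check of how sstotal depends on the "const" setting described
# above. The y-values are made up for illustration.
ys = [2.0, 4.0, 6.0]
mean_y = sum(ys) / len(ys)

sstotal_const_true = sum((y - mean_y) ** 2 for y in ys)   # deviations from the mean
sstotal_const_false = sum(y ** 2 for y in ys)             # raw squares, no centering

print(sstotal_const_true)    # 8.0
print(sstotal_const_false)   # 56.0
```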

    In some cases, one or more of the X columns (assume the Y and X values are in columns) have no additional predictive value in the presence of the other X columns; in other words, removing one or more X columns may leave the predicted Y values just as accurate. In that case the redundant X columns are excluded from the regression model. This phenomenon is called "collinearity", because a redundant X column can be expressed as a sum of multiples of the non-redundant X columns. LINEST checks for collinearity and removes any redundant X columns it finds from the regression model. Removed X columns can be recognized in the LINEST output by a coefficient of 0 and an se value of 0.

    Removing one or more columns as redundant changes df, because df depends on the number of X columns actually used for prediction. For more information on calculating df, see Example 4 below. When df changes because redundant columns were removed, the sey and F values also change. In practice, collinearity should be relatively rare. One case where it is likely, however, is when some X columns contain only the values 0 and 1 as indicators of whether the subject of the experiment belongs to a particular group. If const = TRUE or the argument is omitted, LINEST inserts an additional X column to model the intercept. If there is a column with 1 for each male and 0 for each female, and also a column with 1 for each female and 0 for each male, the latter column is removed, because its values can be obtained from the "male indicator" column.

    When no X columns are removed from the model due to collinearity, df is calculated as follows: if there are k columns of known_x_values and const = TRUE or omitted, then df = n − k − 1; if const = FALSE, then df = n − k. In both cases, each X column removed due to collinearity increases df by 1.
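    As a small illustration of this rule, here is a hypothetical helper (the name linest_df is ours, not Excel's), for the case where no X columns are removed:

```python
# A hypothetical helper (the name linest_df is ours) mirroring the df rule
# quoted above, for the case where no X columns are removed.
def linest_df(n, k, const=True):
    """n observations, k known_x columns."""
    return n - k - 1 if const else n - k

print(linest_df(10, 2))                # 7
print(linest_df(10, 2, const=False))   # 8
```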

    Formulas that return arrays must be entered as array formulas.

    When entering an array of constants as an argument, for example known_x_values, use a semicolon to separate values within the same row and a colon to separate rows. The separator characters may differ depending on the settings in the Language and Regional Settings window of Control Panel.

    It should be noted that the y-values predicted by the regression equation may not be valid if they fall outside the range of the y-values that were used to determine the equation.

    The basic algorithm used in LINEST differs from the algorithm of the SLOPE and INTERCEPT functions. The difference between the algorithms can lead to different results with undetermined and collinear data. For example, if the data points of the known_y_values argument are all 0 and the data points of the known_x_values argument are all 1, then:

    LINEST returns a value equal to 0. The LINEST algorithm is designed to return suitable values for collinear data, and in this case at least one answer can be found.

    SLOPE and INTERCEPT return the #DIV/0! error. The SLOPE and INTERCEPT algorithm looks for exactly one answer, and in this case there can be several.

    In addition to calculating statistics for other regression types, LINEST can be used to fit other regression types by entering functions of the x and y variables as the x and y series for LINEST. For example, the following formula:

    LINEST(y_values, x_values^COLUMN($A:$C))

    works when there is a single column of y-values and a single column of x-values, and calculates a cubic approximation (a third-degree polynomial) of the form:

    y = m1x + m2x^2 + m3x^3 + b

    The formula can be modified to calculate other types of regression, but in some cases adjustments to the output values ​​and other statistics are required.
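    The same cubic fit can be reproduced outside Excel by regressing y on the powers x, x^2, x^3. The sketch below is an illustration only: the polyfit helper, the plain Gaussian-elimination approach, and the sample data are our own, not Excel's algorithm.

```python
# A sketch of the cubic fit computed by the LINEST formula above:
# regress y on x, x^2, x^3 by least squares. The polyfit helper, the
# plain Gaussian elimination, and the data are our own, not Excel's.

def polyfit(xs, ys, degree):
    """Return coefficients [b, m1, ..., m_degree] of the LS polynomial."""
    rows = len(xs)
    n = degree + 1
    # Design matrix columns: x^0, x^1, ..., x^degree
    A = [[x ** p for p in range(n)] for x in xs]
    # Normal equations: (A^T A) c = A^T y
    M = [[sum(A[r][i] * A[r][j] for r in range(rows)) for j in range(n)]
         for i in range(n)]
    v = [sum(A[r][i] * ys[r] for r in range(rows)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (v[i] - s) / M[i][i]
    return coeffs

xs = [-2, -1, 0, 1, 2, 3]
ys = [x ** 3 - 2 * x + 5 for x in xs]   # exact cubic: b=5, m1=-2, m2=0, m3=1
print(polyfit(xs, ys, 3))
```

    Because the sample data lie exactly on a cubic, the recovered coefficients match the generating polynomial up to floating-point rounding.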

    Regression analysis shows the influence of certain values (the independent variables) on the dependent variable. For example: how does the number of the economically active population depend on the number of enterprises, the level of wages, and other parameters? Or: how do foreign investment, energy prices, and so on affect the level of GDP?

    The result of the analysis allows you to highlight priorities. And based on the main factors, predict, plan the development of priority areas, and make management decisions.

    Regression can be:

    · linear (y = a + bx);

    · parabolic (y = a + bx + cx^2);

    · exponential (y = a * exp(bx));

    · power (y = a*x^b);

    · hyperbolic (y = b/x + a);

    · logarithmic (y = b * ln(x) + a);

    · exponential with base b (y = a * b^x).
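    Several of the non-linear forms in this list can be fitted with linear machinery after a transformation. For example, taking logarithms of y = a * exp(bx) gives ln(y) = ln(a) + b*x, an ordinary straight-line fit. Below is a sketch with an invented helper name and made-up data; it is valid only for y > 0.

```python
import math

# One common way to fit the exponential form y = a * exp(bx) from the list
# above: take logarithms so that ln(y) = ln(a) + b*x, then do an ordinary
# straight-line fit. The helper name and data are invented; requires y > 0.

def fit_exponential(xs, ys):
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    mly = sum(ln_ys) / n
    b = sum((x - mx) * (ly - mly) for x, ly in zip(xs, ln_ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = math.exp(mly - b * mx)
    return a, b

xs = [0, 1, 2, 3]
ys = [2.0 * math.exp(0.5 * x) for x in xs]   # generated with a=2, b=0.5
print(fit_exponential(xs, ys))
```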

    Let's look at an example of building a regression model in Excel and interpreting the results. Let's take the linear type of regression.

    Task. The average monthly salary and the number of employees who quit were analyzed at 6 enterprises. It is necessary to determine how the number of employees who quit depends on the average salary.

    The linear regression model looks like this:

    Y = a0 + a1x1 + … + akxk,

    where a are the regression coefficients, x are the influencing variables, and k is the number of factors.

    In our example, Y is the number of employees who quit. The influencing factor is the salary (x).

    Excel has built-in functions that can help you calculate the parameters of a linear regression model. But the “Analysis Package” add-on will do this faster.

    We activate a powerful analytical tool:

    1. Click the “Office” button and go to “Excel Options” → “Add-Ins”.

    2. At the bottom, under the drop-down list, the “Manage” field should show “Excel Add-ins” (if it does not, select it from the list on the right). Then click the “Go” button.

    3. A list of available add-ins opens. Select “Analysis Package” and click OK.

    Once activated, the add-on will be available in the Data tab.

    Now let's do the regression analysis itself.

    1. Open the menu of the “Data Analysis” tool. Select "Regression".



    2. A menu will open for selecting the input values and the output options (where to display the result). In the input fields, specify the range of the parameter being described (Y) and of the factor influencing it (X). The other fields can be left blank.

    3. After clicking OK, the program will display the calculations on a new sheet (you can select an interval to display on the current sheet or assign output to a new workbook).

    First of all, we pay attention to R-squared and coefficients.

    R-squared is the coefficient of determination. In our example it is 0.755, or 75.5%. This means that the model explains 75.5% of the variation in the parameter under study. The higher the coefficient of determination, the better the model: above 0.8 is good, below 0.5 is poor (such an analysis can hardly be considered reasonable). In our example it is “not bad”.

    The coefficient 64.1428 shows what Y will be if all the variables in the model are equal to 0. That is, the value of the analyzed parameter is also influenced by factors not described in the model.

    The coefficient -0.16285 shows the weight of the influence of variable X on Y. That is, within this model the average monthly salary affects the number of employees who quit with a weight of -0.16285 (a small degree of influence). The “-” sign indicates a negative effect: the higher the salary, the fewer people quit. Which is fair.
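    The same quantities (slope, intercept and R-squared) can be recomputed by hand. The example's own six data points are not reproduced in the text, so the sketch below uses made-up (salary, quits) pairs; it illustrates the computation, not the exact values 64.1428 and -0.16285.

```python
# Recomputing slope, intercept and R-squared for the salary/quits setup.
# The text's own six data points are not given, so these values are made up;
# the coefficients will differ from 64.1428 and -0.16285.

def simple_regression(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    pred = [slope * x + intercept for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

salary = [250, 300, 350, 400, 450, 500]   # hypothetical average salaries
quits = [30, 25, 22, 18, 15, 14]          # hypothetical numbers of leavers
slope, intercept, r2 = simple_regression(salary, quits)
print(slope, intercept, r2)   # negative slope: higher pay, fewer quits
```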