• How to build a regression equation in Excel. Mathematical methods in psychology

    IN Excel there is an even faster and convenient way build a graph linear regression(and even the main types of nonlinear regressions, as discussed below). This can be done as follows:

    1) select columns with data X And Y(they should be in that order!);

    2) call Chart Wizard and select in the group TypeSpot and immediately press Ready;

    3) without deselecting the diagram, select the main menu item that appears Diagram, in which you should select the item Add a trend line;

    4) in the dialog box that appears Trend line in the tab Type choose Linear;

    5) in the tab Options you can activate the switch Show equation in diagram, which will allow you to see the linear regression equation (4.4), in which the coefficients (4.5) will be calculated.

    6) In the same tab you can activate the switch Place the approximation reliability value (R^2) on the diagram. This value is the square of the correlation coefficient (4.3) and it shows how well the calculated equation describes the experimental dependence. If R 2 is close to unity, then the theoretical regression equation describes the experimental dependence well (the theory agrees well with experiment), and if R 2 is close to zero, then given equation not suitable for describing the experimental dependence (theory does not agree with experiment).

    As a result of performing the described actions, you will get a diagram with a regression graph and its equation.

    §4.3. Main types nonlinear regression

    Parabolic and polynomial regression.

    Parabolic dependence of the value Y from the size X is called a dependence expressed by a quadratic function (2nd order parabola):

    This equation is called parabolic regression equation Y on X. Options A, b, With are called parabolic regression coefficients. Calculating parabolic regression coefficients is always cumbersome, so it is recommended to use a computer for calculations.

    Equation (4.8) of parabolic regression is a special case of a more general regression called polynomial. Polynomial dependence of the value Y from the size X is called a dependence expressed by a polynomial n-th order:

    where are the numbers and i (i=0,1,…, n) are called polynomial regression coefficients.

    Power regression.

    Power dependence of the value Y from the size X is called a dependency of the form:

    This equation is called power regression equation Y on X. Options A And b are called power regression coefficients.

    ln =ln a+b ln x. (4.11)

    This equation describes a straight line on a plane with logarithmic coordinate axes ln x and ln. Therefore, the criterion for the applicability of power regression is the requirement that the points of logarithms of empirical data ln x i and ln y i were closest to the straight line (4.11).

    Exponential regression.

    Indicative(or exponential) dependence of the value Y from the size X is called a dependency of the form:

    (or ). (4.12)

    This equation is called exponential equation(or exponential) regression Y on X. Options A(or k) And b are called exponential coefficients(or exponential) regression.

    If we take the logarithm of both sides of the power regression equation, we get the equation

    ln = x ln a+ln b(or ln = k x+ln b). (4.13)

    This equation describes linear dependence logarithm of one quantity ln from another quantity x. Therefore, the criterion for the applicability of power regression is the requirement that empirical data points of the same value x i and logarithms of another quantity ln y i were closest to the straight line (4.13).

    Logarithmic regression.

    Logarithmic dependence of the value Y from the size X is called a dependency of the form:

    =a+b ln x. (4.14)

    This equation is called logarithmic regression equation Y on X. Options A And b are called logarithmic regression coefficients.

    Hyperbolic regression.

    Hyperbolic dependence of the value Y from the size X is called a dependency of the form:

    This equation is called hyperbolic regression equation Y on X. Options A And b are called hyperbolic regression coefficients and are determined by the least squares method. Application of this method leads to the formulas:

    In formulas (4.16-4.17), the summation is carried out over the index i from one to the number of observations n.

    Unfortunately, in Excel there are no functions that calculate hyperbolic regression coefficients. In cases where it is not known that the measured quantities are related by inverse proportionality, it is recommended to look for a power regression equation instead of the hyperbolic regression equation, so in Excel there is a procedure for finding it. If a hyperbolic dependence is assumed between the measured quantities, then its regression coefficients will have to be calculated using auxiliary calculation tables and summation operations using formulas (4.16-4.17).

    The construction of linear regression, evaluation of its parameters and their significance can be performed much faster when using the Excel analysis package (Regression). Let us consider the interpretation of the results obtained in the general case ( k explanatory variables) according to example 3.6.

    In the table regression statistics the following values ​​are given:

    Multiple R – multiple correlation coefficient;

    R- square– coefficient of determination R 2 ;

    Normalized R - square– adjusted R 2 adjusted for the number of degrees of freedom;

    Standard error– regression standard error S;

    Observations – number of observations n.

    In the table Analysis of variance are given:

    1. Column df - number of degrees of freedom equal to

    for string Regression df = k;

    for string Remainderdf = nk – 1;

    for string Totaldf = n– 1.

    2. Column SS – the sum of squared deviations equal to

    for string Regression ;

    for string Remainder ;

    for string Total .

    3. Column MS variances determined by the formula MS = SS/df:

    for string Regression– factor dispersion;

    for string Remainder– residual variance.

    4. Column F – calculated value F-criterion calculated by the formula

    F = MS(regression)/ MS(remainder).

    5. Column Significance F – significance level value corresponding to the calculated F-statistics .

    Significance F= FDIST( F- statistics, df(regression), df(remainder)).

    If significance F < стандартного уровня значимости, то R 2 is statistically significant.

    Odds Standard error t-statistics P-value Bottom 95% Top 95%
    Y 65,92 11,74 5,61 0,00080 38,16 93,68
    X 0,107 0,014 7,32 0,00016 0,0728 0,142

    This table shows:

    1. Odds– coefficient values a, b.

    2. Standard error– standard errors of regression coefficients S a, S b.



    3. t- statistics– calculated values t -criteria calculated by the formula:

    t-statistic = Coefficients/Standard error.

    4.R-value (significance t) is the significance level value corresponding to the calculated t- statistics.

    R-value = STUDIDIST(t-statistics, df(remainder)).

    If R-meaning< стандартного уровня значимости, то соответствующий коэффициент статистически значим.

    5. Bottom 95% and Top 95%– lower and upper limits of 95% confidence intervals for the coefficients of the theoretical linear regression equation.

    WITHDRAWAL OF THE REST
    Observation Predicted y Residues e
    72,70 -29,70
    82,91 -20,91
    94,53 -4,53
    105,72 5,27
    117,56 12,44
    129,70 19,29
    144,22 20,77
    166,49 24,50
    268,13 -27,13

    In the table WITHDRAWAL OF THE REST indicated:

    in column Observation– observation number;

    in column Foretold y – calculated values ​​of the dependent variable;

    in column Leftovers e – the difference between the observed and calculated values ​​of the dependent variable.

    Example 3.6. There are data (conventional units) on food costs y and per capita income x for nine groups of families:

    x
    y

    Using the results of the Excel analysis package (Regression), we will analyze the dependence of food costs on per capita income.

    Results regression analysis It is customary to write it in the form:

    where the standard errors of the regression coefficients are indicated in parentheses.

    Regression coefficients A = 65,92 and b= 0.107. Direction of communication between y And x determines the sign of the regression coefficient b= 0.107, i.e. the connection is direct and positive. Coefficient b= 0.107 shows that with an increase in per capita income by 1 conventional. units food costs increase by 0.107 conventional units. units

    Let us evaluate the significance of the coefficients of the resulting model. Significance of coefficients ( a, b) is checked by t-test:

    P-value ( a) = 0,00080 < 0,01 < 0,05

    P-value ( b) = 0,00016 < 0,01 < 0,05,

    therefore, the coefficients ( a, b) are significant at the 1% level, and even more so at the 5% significance level. Thus, the regression coefficients are significant and the model is adequate to the original data.

    The regression estimation results are compatible not only with the obtained values ​​of the regression coefficients, but also with a certain set of them (confidence interval). With a 95% probability, confidence intervals for the coefficients are (38.16 – 93.68) for a and (0.0728 – 0.142) for b.

    The quality of the model is assessed by the coefficient of determination R 2 .

    Magnitude R 2 = 0.884 means that the per capita income factor can explain 88.4% of the variation (scatter) in food expenses.

    Significance R 2 is checked by F- test: significance F = 0,00016 < 0,01 < 0,05, следовательно, R 2 is significant at the 1% level, and even more so at the 5% significance level.

    In the case of pairwise linear regression, the correlation coefficient can be defined as . The obtained value of the correlation coefficient indicates that the relationship between food costs and per capita income is very close.

    CORRELATION AND REGRESSION ANALYSIS INMS EXCEL

    1. Create a source data file in MS Excel (for example, table 2)

    2. Construction of the correlation field

    To construct a correlation field in command line select menu Insert/Diagram. In the dialog box that appears, select the chart type: Spot; view: Scatter plot, allowing you to compare pairs of values ​​(Fig. 22).

    Figure 22 – Selecting a chart type


    Figure 23– Window view when selecting a range and rows
    Figure 25 – Window view, step 4

    2. B context menu select a team Add a trend line.

    3. In the dialog box that appears, select the graph type (linear in our example) and the equation parameters, as shown in Figure 26.


    Click OK. The result is presented in Figure 27.

    Figure 27 – Correlation field of the dependence of labor productivity on capital-labor ratio

    Similarly, we construct a correlation field for the dependence of labor productivity on the equipment shift ratio. (Figure 28).


    Figure 28 – Correlation field of labor productivity

    on the equipment replacement rate

    3. Construction of the correlation matrix.

    To build a correlation matrix in the menu Service choose Data analysis.

    Using a data analysis tool Regression, in addition to the results of regression statistics, analysis of variance and confidence intervals, you can obtain residuals and graphs of fitting the regression line, residuals and normal probability. To do this, you need to check access to the analysis package. In the main menu, select Service/Add-ons. Check the box Analysis package(Figure 29)


    Figure 30 – Dialog box Data Analysis

    After clicking OK, in the dialog box that appears, specify the input interval (in our example A2:D26), grouping (in our case by columns) and output parameters, as shown in Figure 31.


    Figure 31 – Dialog box Correlation

    The calculation results are presented in Table 4.

    Table 4 – Correlation matrix

    Column 1

    Column 2

    Column 3

    Column 1

    Column 2

    Column 3

    SINGLE-FACTOR REGRESSION ANALYSIS

    USING THE REGRESSION TOOL

    To conduct a regression analysis of the dependence of labor productivity on the capital-labor ratio in the menu Service choose Data Analysis and specify the analysis tool Regression(Figure 32).


    Figure 33 – Dialog box Regression

    Regression analysis is one of the most popular methods of statistical research. It can be used to establish the degree of influence of independent variables on the dependent variable. In functionality Microsoft Excel There are tools designed to perform this type of analysis. Let's look at what they are and how to use them.

    Connecting the analysis package

    But, in order to use the function that allows you to perform regression analysis, you first need to activate the Analysis Package. Only then the tools necessary for this procedure will appear on the Excel ribbon.

    1. Move to the “File” tab.
    2. Go to the “Settings” section.
    3. The Excel Options window opens. Go to the “Add-ons” subsection.
    4. At the very bottom of the window that opens, move the switch in the “Management” block to the “Excel Add-ins” position, if it is in a different position. Click on the “Go” button.
    5. A window of available Excel add-ins opens. Check the box next to “Analysis Package”. Click on the “OK” button.

    Now, when we go to the “Data” tab, on the ribbon in the “Analysis” tool block we will see a new button - “Data Analysis”.

    Types of Regression Analysis

    There are several types of regressions:

    • parabolic;
    • sedate;
    • logarithmic;
    • exponential;
    • demonstrative;
    • hyperbolic;
    • linear regression.

    We will talk in more detail about performing the last type of regression analysis in Excel later.

    Linear Regression in Excel

    Below, as an example, is a table showing the average daily air temperature outside and the number of store customers for the corresponding working day. Let's find out using regression analysis exactly how weather conditions in the form of air temperature can affect the attendance of a retail establishment.

    The general linear regression equation is as follows: Y = a0 + a1x1 +...+akhk. In this formula, Y means the variable on which we are trying to study the influence of factors. In our case, this is the number of buyers. The value of x is the various factors that influence the variable. The parameters a are the regression coefficients. That is, they are the ones who determine the significance of a particular factor. The index k denotes the total number of these same factors.


    Analysis results analysis

    The results of the regression analysis are displayed in the form of a table in the place specified in the settings.

    One of the main indicators is R-squared. It indicates the quality of the model. In our case, this coefficient is 0.705 or about 70.5%. This is an acceptable level of quality. Dependency less than 0.5 is bad.

    Another important indicator located in the cell at the intersection of the Y-intercept row and the Coefficients column. This indicates what value Y will have, and in our case, this is the number of buyers, with all other factors equal to zero. In this table given value equals 58.04.

    The value at the intersection of the columns “Variable X1” and “Coefficients” shows the level of dependence of Y on X. In our case, this is the level of dependence of the number of store customers on temperature. A coefficient of 1.31 is considered a fairly high influence indicator.

    As we can see, using Microsoft programs Excel is quite easy to create a regression analysis table. But only a trained person can work with the output data and understand its essence.

    We are glad that we were able to help you solve the problem.

    Ask your question in the comments, describing the essence of the problem in detail. Our specialists will try to answer as quickly as possible.

    Did this article help you?

    The linear regression method allows us to describe a straight line that best fits a series of ordered pairs (x, y). The equation for a straight line, known as the linear equation, is given below:

    ŷ - expected value of y at set value X,

    x - independent variable,

    a - segment on the y-axis for a straight line,

    b is the slope of the straight line.

    The figure below illustrates this concept graphically:

    The figure above shows the line described by the equation ŷ =2+0.5x. The y-intercept is the point at which the line intersects the y-axis; in our case, a = 2. The slope of the line, b, the ratio of the rise of the line to the length of the line, has a value of 0.5. A positive slope means the line rises from left to right. If b = 0, the line is horizontal, which means there is no relationship between the dependent and independent variables. In other words, changing the value of x does not affect the value of y.

    ŷ and y are often confused. The graph shows 6 ordered pairs of points and a line, according to the given equation

    This figure shows the point corresponding to the ordered pair x = 2 and y = 4. Note that the expected value of y according to the line at X= 2 is ŷ. We can confirm this with the following equation:

    ŷ = 2 + 0.5х =2 +0.5(2) =3.

    The y value represents the actual point and the ŷ value is the expected y value using linear equation for a given value of x.

    The next step is to determine the linear equation that best matches the set of ordered pairs, we talked about this in the previous article, where we determined the form of the equation using the least squares method.

    Using Excel to Define Linear Regression

    In order to use the regression analysis tool built into Excel, you must activate the add-in Analysis package. You can find it by clicking on the tab File -> Options(2007+), in the dialog box that appears OptionsExcel go to the tab Add-ons. In the field Control choose Add-onsExcel and click Go. In the window that appears, check the box next to Analysis package, click OK.

    In the tab Data in the group Analysis will appear new button Data analysis.

    To demonstrate how the add-in works, let's use data from a previous article, where a guy and a girl share a table in the bathroom. Enter the data from our bathroom example in Columns A and B of the blank sheet.

    Go to the tab Data, in the group Analysis click Data analysis. In the window that appears Data Analysis select Regression as shown in the figure and click OK.

    Set the necessary regression parameters in the window Regression as shown in the picture:

    Click OK. The figure below shows the results obtained:

    These results are consistent with those we obtained by doing our own calculations in the previous article.

    Regression analysis is statistical method research that allows you to show the dependence of a particular parameter on one or more independent variables. In the pre-computer era, its use was quite difficult, especially when it came to large volumes of data. Today, having learned how to build regression in Excel, you can solve complex statistical problems in just a couple of minutes. Below are specific examples from the field of economics.

    Types of Regression

    This concept itself was introduced into mathematics by Francis Galton in 1886. Regression happens:

    • linear;
    • parabolic;
    • sedate;
    • exponential;
    • hyperbolic;
    • demonstrative;
    • logarithmic.

    Example 1

    Let's consider the problem of determining the dependence of the number of team members who quit on the average salary at 6 industrial enterprises.

    Task. At six enterprises, the average monthly salary and the number of employees who quit due to at will. IN tabular form we have:

    For the task of determining the dependence of the number of quitting workers on the average salary at 6 enterprises, the regression model has the form of the equation Y = a0 + a1×1 +…+аkxk, where хi are the influencing variables, ai are the regression coefficients, and k is the number of factors.

    For this task, Y is the indicator of employees who quit, and the influencing factor is salary, which we denote by X.

    Using the capabilities of the Excel spreadsheet processor

    Regression analysis in Excel must be preceded by applying built-in functions to existing tabular data. However, for these purposes it is better to use the very useful “Analysis Pack” add-on. To activate it you need:

    • from the “File” tab go to the “Options” section;
    • in the window that opens, select the line “Add-ons”;
    • click on the “Go” button located below, to the right of the “Management” line;
    • check the box next to the name “Analysis package” and confirm your actions by clicking “Ok”.

    If everything is done correctly, the required button will appear on the right side of the “Data” tab, located above the Excel worksheet.

    Linear Regression in Excel

    Now that you have everything you need at hand virtual instruments to carry out econometric calculations, we can begin to solve our problem. To do this:

    • click on the “Data Analysis” button;
    • in the window that opens, click on the “Regression” button;
    • in the tab that appears, enter the range of values ​​for Y (the number of quitting employees) and for X (their salaries);
    • We confirm our actions by pressing the “Ok” button.

    As a result, the program will automatically fill in a new sheet table processor regression analysis data. Pay attention! Excel allows you to manually set the location you prefer for this purpose. For example, it could be the same sheet where the Y and X values ​​are located, or even new book, specifically designed for storing such data.

    Analysis of regression results for R-squared

    IN Excel data obtained during processing of the data of the example under consideration have the form:

    First of all, you should pay attention to the R-squared value. It represents the coefficient of determination. IN in this example R-squared = 0.755 (75.5%), i.e., the calculated parameters of the model explain the relationship between the considered parameters by 75.5%. The higher the value of the coefficient of determination, the more suitable the selected model is for a specific task. It is considered to correctly describe the real situation when the R-square value is above 0.8. If R-squared is tcr, then the hypothesis about the insignificance of the free term of the linear equation is rejected.

    In the problem under consideration for the free term, using Excel tools, it was obtained that t = 169.20903, and p = 2.89E-12, i.e., we have zero probability that the correct hypothesis about the insignificance of the free term will be rejected. For the coefficient for the unknown t=5.79405, and p=0.001158. In other words, the probability that the correct hypothesis about the insignificance of the coefficient for an unknown will be rejected is 0.12%.

    Thus, it can be argued that the resulting linear regression equation is adequate.

    The problem of the feasibility of purchasing a block of shares

    Multiple regression in Excel is performed using the same Data Analysis tool. Let's consider a specific application problem.

    The management of the NNN company must decide on the advisability of purchasing a 20% stake in MMM JSC. The cost of the package (SP) is 70 million US dollars. NNN specialists have collected data on similar transactions. It was decided to evaluate the value of the shareholding according to such parameters, expressed in millions of US dollars, as:

    • accounts payable (VK);
    • annual turnover volume (VO);
    • accounts receivable (VD);
    • cost of fixed assets (COF).

    In addition, the parameter of the enterprise's wage arrears (V3 P) in thousands of US dollars is used.

    Solution using Excel spreadsheet processor

    First of all, you need to create a table of source data. It looks like this:

    • call the “Data Analysis” window;
    • select the “Regression” section;
    • In the “Input interval Y” box, enter the range of values ​​of the dependent variables from column G;
    • click on the red arrow icon to the right of the “Input Range X” window and highlight on the sheet the range of all values ​​from columns B,C,D,F.

    Mark the “New worksheet” item and click “Ok”.

    Obtain a regression analysis for a given problem.

    Study of results and conclusions

    We “collect” the regression equation from the rounded data presented above on the Excel spreadsheet:

    SP = 0.103*SOF + 0.541*VO – 0.031*VK +0.405*VD +0.691*VZP – 265.844.

    In a more familiar mathematical form, it can be written as:

    y = 0.103*x1 + 0.541*x2 – 0.031*x3 +0.405*x4 +0.691*x5 – 265.844

    Data for MMM JSC are presented in the table:

    Substituting them into the regression equation, we get a figure of 64.72 million US dollars. This means that the shares of MMM JSC are not worth purchasing, since their value of 70 million US dollars is quite inflated.

    As you can see, the use of the Excel spreadsheet and the regression equation made it possible to make an informed decision regarding the feasibility of a very specific transaction.

    Now you know what regression is. The Excel examples discussed above will help you solve practical problems in the field of econometrics.

    The MS Excel package allows you to do most of the work very quickly when constructing a linear regression equation. It is important to understand how to interpret the results obtained.

    Requires an add-on to work Analysis package, which must be enabled in the menu item Service\Add-ons

    In Excel 2007, to enable the analysis package, you need to click go to block Excel Options by pressing the button on the left top corner, and then the button Excel Options"at the bottom of the window:



    To build a regression model, you must select the item Service\Data Analysis\Regression. (In Excel 2007, this mode is in the block Data/Data Analysis/Regression). A dialog box will appear that you need to fill out:

    1) Input interval Y¾ contains a link to cells that contain the values ​​of the resulting characteristic y. The values ​​must be arranged in a column;

    2) Input interval X¾ contains a link to cells that contain factor values. The values ​​must be arranged in columns;

    3) Sign Tags set if the first cells contain explanatory text (data labels);

    4) Reliability level¾ is the confidence level, which is considered to be 95% by default. If this value does not suit you, then you need to enable this flag and enter the required value;

    5) Sign Constant-zero is included if it is necessary to construct an equation in which the free variable is ;

    6) Output Options determine where the results should be placed. By default builds mode New worksheet;

    7) Block Leftovers allows you to include the output of residuals and the construction of their graphs.

    As a result, information is displayed containing all the necessary information and grouped into three blocks: Regression statistics, Analysis of variance, Withdrawal of balance. Let's take a closer look at them.

    1. Regression statistics:

    multiple R is determined by the formula ( Pearson correlation coefficient);

    R (coefficient of determination);

    Normalized R-square is calculated by the formula (used for multiple regression);

    Standard error S calculated by the formula ;

    Observations ¾ is the amount of data n.

    2. Analysis of variance, line Regression:

    Parameter df equals m(number of factor sets x);

    Parameter SS is determined by the formula ;

    Parameter MS is determined by the formula ;

    Statistics F is determined by the formula ;

    Significance F. If the resulting number exceeds , then the hypothesis is accepted (there is no linear relationship), otherwise the hypothesis is accepted (there is a linear relationship).


    3. Analysis of variance, line Remainder:

    Parameter df equal to ;

    Parameter SS is determined by the formula ;

    Parameter MS is determined by the formula.

    4. Analysis of variance, line Total contains the sum of the first two columns.

    5. Analysis of variance, line Y-intersection contains the coefficient, standard error and t-statistics.

    P-value ¾ is the value of the significance levels corresponding to the calculated t-statisticians. Determined by the function STUDIDIST( t-statistics; ). If P-value exceeds , then the corresponding variable is statistically insignificant and can be excluded from the model.

    Bottom 95% And Top 95%¾ are the lower and upper limits of the 95 percent confidence intervals for the coefficients of the theoretical linear regression equation. If the confidence value in the data input block was left at its default value, then the last two columns will duplicate the previous ones. If the user has entered their own confidence value, the last two columns contain the lower and upper bound values ​​for the specified confidence level.

    6. Analysis of variance, the lines contain the coefficient values, standard errors, t-statistician, P-values ​​and confidence intervals for the corresponding .

    7. Block Withdrawal of balance contains the predicted values y(in our notation this is ) and residues .