• Options for exponential regression in excel. Correlation and regression analysis in Excel: execution instructions

    The regression line is a graphical reflection of the relationship between phenomena. You can very clearly construct a regression line in Excel program.

    To do this you need:

    1.Open Excel

    2.Create data columns. In our example, we will build a regression line, or relationship, between aggressiveness and self-doubt in first-graders. 30 children participated in the experiment, the data is presented in the Excel table:

    1 column - subject number

    2 column - aggressiveness in points

    3 column - self-doubt in points

    3.Then you need to select both columns (without the column name), click tab insert , choose spot , and choose the very first one from the proposed layouts dot with markers .

    4. So we have a template for the regression line - the so-called - scatter plot. To go to the regression line, click on the resulting figure and press tab constructor, find on the panel chart layouts and choose M A ket9 , it also says f(x)

    5. So, we have a regression line. The graph also shows its equation and the square of the correlation coefficient

    6. All that remains is to add the name of the graph and the name of the axes. Also, if desired, you can remove the legend, reduce the number horizontal lines grids (tab layout , then net ). Basic changes and settings are made in the tab Layout

    The regression line was constructed in MS Excel. Now you can add it to the text of the work.

    The MS Excel package allows you to build an equation linear regression do most of the work very quickly. It is important to understand how to interpret the results obtained. To build a regression model, you must select Tools\Data Analysis\Regression (in Excel 2007 this mode is in the Data/Data Analysis/Regression block). Then copy the results into a block for analysis.

    Initial data:

    Analysis results

    Include in report
    Calculation of regression equation parameters
    Theoretical material
    Regression equation on standard scale
    Multiple Correlation Coefficient (Multiple Correlation Index)
    Partial elasticity coefficients
    Comparative assessment of the influence of the analyzed factors on the resulting characteristic (d - coefficients of separate determination)

    Checking the quality of the constructed regression equation
    Significance of regression coefficients b i (t-statistics. Student's test)
    Significance of the equation as a whole (F-statistics. Fisher's test). Determination coefficient
    Partial F-tests

    Significance level 0.005 0.01 0.025 0.05 0.1 0.25 0.4

    Regression analysis is a statistical research method that allows you to show the dependence of a particular parameter on one or more independent variables. In the pre-computer era, its use was quite difficult, especially when it came to large volumes of data. Today, having learned how to build regression in Excel, you can solve complex statistical problems in just a couple of minutes. Below are specific examples from the field of economics.

    Types of Regression

    This concept itself was introduced into mathematics in 1886. Regression happens:

    • linear;
    • parabolic;
    • sedate;
    • exponential;
    • hyperbolic;
    • demonstrative;
    • logarithmic.

    Example 1

    Let's consider the problem of determining the dependence of the number of team members who quit on the average salary at 6 industrial enterprises.

    Task. At six enterprises, the average monthly salary and the number of employees who quit due to at will. IN tabular form we have:

    Number of people who quit

    Salary

    30,000 rubles

    35,000 rubles

    40,000 rubles

    45,000 rubles

    50,000 rubles

    55,000 rubles

    60,000 rubles

    For the task of determining the dependence of the number of quitting workers on the average salary at 6 enterprises, the regression model has the form of the equation Y = a 0 + a 1 x 1 +...+a k x k, where x i are the influencing variables, a i are the regression coefficients, and k is the number of factors.

    For this problem, Y is the indicator of quitting employees, and the influencing factor is salary, which we denote by X.

    Using the capabilities of the Excel spreadsheet processor

    Regression analysis in Excel must be preceded by applying built-in functions to existing tabular data. However, for these purposes it is better to use the very useful “Analysis Pack” add-on. To activate it you need:

    • from the “File” tab go to the “Options” section;
    • in the window that opens, select the line “Add-ons”;
    • click on the “Go” button located below, to the right of the “Management” line;
    • check the box next to the name “Analysis package” and confirm your actions by clicking “Ok”.

    If everything is done correctly, on the right side of the “Data” tab, located above the “Excel” worksheet, you will see desired button.

    in Excel

    Now that you have everything you need at hand virtual instruments to carry out econometric calculations, we can begin to solve our problem. To do this:

    • Click on the “Data Analysis” button;
    • in the window that opens, click on the “Regression” button;
    • in the tab that appears, enter the range of values ​​for Y (the number of quitting employees) and for X (their salaries);
    • We confirm our actions by pressing the “Ok” button.

    As a result, the program will automatically fill in a new sheet table processor regression analysis data. Pay attention! Excel allows you to manually set the location you prefer for this purpose. For example, it could be the same sheet where the Y and X values ​​are located, or even new book, specifically designed for storing such data.

    Analysis of regression results for R-squared

    IN Excel data obtained during processing of the data of the example under consideration have the form:

    First of all, you should pay attention to the R-squared value. It represents the coefficient of determination. IN in this example R-squared = 0.755 (75.5%), i.e., the calculated parameters of the model explain the relationship between the considered parameters by 75.5%. The higher the value of the coefficient of determination, the more suitable the selected model is for a specific task. It is considered to correctly describe the real situation when the R-square value is above 0.8. If R-squared<0,5, то такой анализа регрессии в Excel нельзя считать резонным.

    Odds Analysis

    The number 64.1428 shows what the value of Y will be if all the variables xi in the model we are considering are reset to zero. In other words, it can be argued that the value of the analyzed parameter is also influenced by other factors that are not described in a specific model.

    The next coefficient -0.16285, located in cell B18, shows the weight of the influence of variable X on Y. This means that the average monthly salary of employees within the model under consideration affects the number of quitters with a weight of -0.16285, i.e. the degree of its influence is completely small. The "-" sign indicates that the coefficient is negative. This is obvious, since everyone knows that the higher the salary at the enterprise, the fewer people express a desire to terminate the employment contract or quit.

    Multiple regression

    This term refers to a relationship equation with several independent variables of the form:

    y=f(x 1 +x 2 +…x m) + ε, where y is the resultant characteristic (dependent variable), and x 1, x 2,…x m are factor characteristics (independent variables).

    Parameter Estimation

    For multiple regression (MR), it is carried out using the least squares method (OLS). For linear equations of the form Y = a + b 1 x 1 +…+b m x m + ε we construct a system of normal equations (see below)

    To understand the principle of the method, consider a two-factor case. Then we have a situation described by the formula

    From here we get:

    where σ is the variance of the corresponding feature reflected in the index.

    OLS is applicable to the MR equation on a standardized scale. In this case, we get the equation:

    in which t y, t x 1, … t xm are standardized variables, for which the average values ​​are equal to 0; β i are the standardized regression coefficients, and the standard deviation is 1.

    Please note that all β i in this case are specified as normalized and centralized, therefore their comparison with each other is considered correct and acceptable. In addition, it is customary to screen out factors by discarding those with the lowest βi values.

    Problem Using Linear Regression Equation

    Suppose we have a table of price dynamics for a specific product N over the past 8 months. It is necessary to make a decision on the advisability of purchasing a batch of it at a price of 1850 rubles/t.

    month number

    month name

    product price N

    1750 rubles per ton

    1755 rubles per ton

    1767 rubles per ton

    1760 rubles per ton

    1770 rubles per ton

    1790 rubles per ton

    1810 rubles per ton

    1840 rubles per ton

    To solve this problem in the Excel spreadsheet processor, you need to use the “Data Analysis” tool, already known from the example presented above. Next, select the “Regression” section and set the parameters. It must be remembered that in the “Input interval Y” field a range of values ​​must be entered for the dependent variable (in this case, prices for goods in specific months of the year), and in the “Input interval X” - for the independent variable (month number). Confirm the action by clicking “Ok”. On a new sheet (if so indicated) we obtain data for regression.

    Using them, we construct a linear equation of the form y=ax+b, where the parameters a and b are the coefficients of the line with the name of the month number and the coefficients and lines “Y-intersection” from the sheet with the results regression analysis. Thus, the linear regression equation (LR) for task 3 is written as:

    Product price N = 11.714* month number + 1727.54.

    or in algebraic notation

    y = 11.714 x + 1727.54

    Analysis of results

    To decide whether the resulting linear regression equation is adequate, the coefficients of multiple correlation (MCC) and determination are used, as well as the Fisher test and the Student t test. In the Excel spreadsheet with regression results, they are called multiple R, R-squared, F-statistic and t-statistic, respectively.

    KMC R makes it possible to assess the closeness of the probabilistic relationship between the independent and dependent variables. Its high value indicates a fairly strong connection between the variables “Number of month” and “Price of product N in rubles per 1 ton”. However, the nature of this relationship remains unknown.

    The square of the coefficient of determination R2 (RI) is a numerical characteristic of the proportion of the total scatter and shows the scatter of which part of the experimental data, i.e. values ​​of the dependent variable corresponds to the linear regression equation. In the problem under consideration, this value is equal to 84.8%, i.e., statistical data are described with a high degree of accuracy by the resulting SD.

    The F statistic, also called Fisher's test, is used to evaluate significance linear dependence, refuting or confirming the hypothesis of its existence.

    (Student's test) helps to evaluate the significance of the coefficient with an unknown or free term of the linear relationship. If the value of the t-test > t cr, then the hypothesis about the insignificance of the free term linear equation rejected.

    In the problem under consideration for the free term, using Excel tools, it was obtained that t = 169.20903, and p = 2.89E-12, i.e., we have zero probability that the correct hypothesis about the insignificance of the free term will be rejected. For the coefficient for the unknown t=5.79405, and p=0.001158. In other words, the probability that the correct hypothesis about the insignificance of the coefficient for an unknown will be rejected is 0.12%.

    Thus, it can be argued that the resulting linear regression equation is adequate.

    The problem of the feasibility of purchasing a block of shares

    Multiple regression in Excel is performed using the same Data Analysis tool. Let's consider a specific application problem.

    The management of the NNN company must decide on the advisability of purchasing a 20% stake in MMM JSC. The cost of the package (SP) is 70 million US dollars. NNN specialists have collected data on similar transactions. It was decided to evaluate the value of the shareholding according to such parameters, expressed in millions of US dollars, as:

    • accounts payable (VK);
    • annual turnover volume (VO);
    • accounts receivable (VD);
    • cost of fixed assets (COF).

    In addition, the parameter of the enterprise's wage arrears (V3 P) in thousands of US dollars is used.

    Solution using Excel spreadsheet processor

    First of all, you need to create a table of source data. It looks like this:

    • call the “Data Analysis” window;
    • select the “Regression” section;
    • In the “Input interval Y” box, enter the range of values ​​of the dependent variables from column G;
    • click on the red arrow icon to the right of the “Input Range X” window and highlight on the sheet the range of all values ​​from columns B,C,D,F.

    Mark the “New worksheet” item and click “Ok”.

    Obtain a regression analysis for a given problem.

    Study of results and conclusions

    We “collect” the regression equation from the rounded data presented above on the Excel spreadsheet:

    SP = 0.103*SOF + 0.541*VO - 0.031*VK +0.405*VD +0.691*VZP - 265.844.

    In a more familiar mathematical form, it can be written as:

    y = 0.103*x1 + 0.541*x2 - 0.031*x3 +0.405*x4 +0.691*x5 - 265.844

    Data for MMM JSC are presented in the table:

    Substituting them into the regression equation, we get a figure of 64.72 million US dollars. This means that the shares of MMM JSC are not worth purchasing, since their value of 70 million US dollars is quite inflated.

    As you can see, the use of the Excel spreadsheet processor and the regression equation made it possible to make an informed decision regarding the feasibility of a very specific transaction.

    Now you know what regression is. The Excel examples discussed above will help you solve practical problems in the field of econometrics.

    Construction of linear regression, evaluation of its parameters and their significance can be performed much faster when using the package Excel analysis(Regression). Let us consider the interpretation of the results obtained in the general case ( k explanatory variables) according to example 3.6.

    In the table regression statistics the following values ​​are given:

    Multiple R – multiple correlation coefficient;

    R- square– coefficient of determination R 2 ;

    Normalized R - square– adjusted R 2 adjusted for the number of degrees of freedom;

    Standard error– regression standard error S;

    Observations – number of observations n.

    In the table Analysis of variance are given:

    1. Column df - number of degrees of freedom equal to

    for string Regression df = k;

    for string Remainderdf = nk – 1;

    for string Totaldf = n– 1.

    2. Column SS – the sum of squared deviations equal to

    for string Regression ;

    for string Remainder ;

    for string Total .

    3. Column MS variances determined by the formula MS = SS/df:

    for string Regression– factor dispersion;

    for string Remainder– residual variance.

    4. Column F – calculated value F-criterion calculated using the formula

    F = MS(regression)/ MS(remainder).

    5. Column Significance F – significance level value corresponding to the calculated F-statistics .

    Significance F= FDIST( F- statistics, df(regression), df(remainder)).

    If significance F < стандартного уровня значимости, то R 2 is statistically significant.

    Odds Standard error t-statistics P-value Bottom 95% Top 95%
    Y 65,92 11,74 5,61 0,00080 38,16 93,68
    X 0,107 0,014 7,32 0,00016 0,0728 0,142

    This table shows:

    1. Odds– coefficient values a, b.

    2. Standard errorstandard errors regression coefficients S a, Sb.



    3. t- statistics– calculated values t -criteria calculated by the formula:

    t-statistic = Coefficients/Standard error.

    4.R-value (significance t) is the significance level value corresponding to the calculated t- statistics.

    R-value = STUDIDIST(t-statistics, df(remainder)).

    If R-meaning< стандартного уровня значимости, то соответствующий коэффициент статистически значим.

    5. Bottom 95% and Top 95%– lower and upper limits of 95% confidence intervals for the coefficients of the theoretical linear regression equation.

    WITHDRAWAL OF THE REST
    Observation Predicted y Residues e
    72,70 -29,70
    82,91 -20,91
    94,53 -4,53
    105,72 5,27
    117,56 12,44
    129,70 19,29
    144,22 20,77
    166,49 24,50
    268,13 -27,13

    In the table WITHDRAWAL OF THE REST indicated:

    in column Observation– observation number;

    in column Foretold y – calculated values ​​of the dependent variable;

    in column Leftovers e – the difference between the observed and calculated values ​​of the dependent variable.

    Example 3.6. There are data (conventional units) on food costs y and per capita income x for nine groups of families:

    x
    y

    Using the results of the Excel analysis package (Regression), we will analyze the dependence of food costs on per capita income.

    The results of regression analysis are usually written in the form:

    where the standard errors of the regression coefficients are indicated in parentheses.

    Regression coefficients A = 65,92 and b= 0.107. Direction of communication between y And x determines the sign of the regression coefficient b= 0.107, i.e. the connection is direct and positive. Coefficient b= 0.107 shows that with an increase in per capita income by 1 conventional. units food costs increase by 0.107 conventional units. units

    Let us evaluate the significance of the coefficients of the resulting model. Significance of coefficients ( a, b) is checked by t-test:

    P-value ( a) = 0,00080 < 0,01 < 0,05

    P-value ( b) = 0,00016 < 0,01 < 0,05,

    therefore, the coefficients ( a, b) are significant at the 1% level, and even more so at the 5% significance level. Thus, the regression coefficients are significant and the model is adequate to the original data.

    The regression estimation results are compatible not only with the obtained values ​​of the regression coefficients, but also with a certain set of them (confidence interval). With a 95% probability, confidence intervals for the coefficients are (38.16 – 93.68) for a and (0.0728 – 0.142) for b.

    The quality of the model is assessed by the coefficient of determination R 2 .

    Magnitude R 2 = 0.884 means that the per capita income factor can explain 88.4% of the variation (scatter) in food expenses.

    Significance R 2 is checked by F- test: significance F = 0,00016 < 0,01 < 0,05, следовательно, R 2 is significant at the 1% level, and even more so at the 5% significance level.

    In the case of pairwise linear regression, the correlation coefficient can be defined as . The obtained value of the correlation coefficient indicates that the relationship between food expenses and per capita income is very close.

    Regression and correlation analysis – statistical methods research. These are the most common ways to show the dependence of a parameter on one or more independent variables.

    Below, using specific practical examples, we will consider these two very popular analyzes among economists. We will also give an example of obtaining results when combining them.

    Regression Analysis in Excel

    Shows the influence of some values ​​(independent, independent) on the dependent variable. For example, how does the number of economically active population depend on the number of enterprises, wages and other parameters. Or: how do foreign investments, energy prices, etc. affect the level of GDP.

    The result of the analysis allows you to highlight priorities. And based on the main factors, predict, plan the development of priority areas, and make management decisions.

    Regression happens:

    • linear (y = a + bx);
    • parabolic (y = a + bx + cx 2);
    • exponential (y = a * exp(bx));
    • power (y = a*x^b);
    • hyperbolic (y = b/x + a);
    • logarithmic (y = b * 1n(x) + a);
    • exponential (y = a * b^x).

    Let's look at an example of building a regression model in Excel and interpreting the results. Let's take the linear type of regression.

    Task. At 6 enterprises, the average monthly salary and the number of quitting employees were analyzed. It is necessary to determine the dependence of the number of quitting employees on the average salary.

    The linear regression model has the following form:

    Y = a 0 + a 1 x 1 +…+a k x k.

    Where a are regression coefficients, x are influencing variables, k is the number of factors.

    In our example, Y is the indicator of quitting employees. The influencing factor is wages (x).

    Excel has built-in functions that can help you calculate the parameters of a linear regression model. But the “Analysis Package” add-on will do this faster.

    We activate a powerful analytical tool:

    Once activated, the add-on will be available in the Data tab.

    Now let's do the regression analysis itself.



    First of all, we pay attention to R-squared and coefficients.

    R-squared is the coefficient of determination. In our example – 0.755, or 75.5%. This means that the calculated parameters of the model explain 75.5% of the relationship between the studied parameters. The higher the coefficient of determination, the better the model. Good - above 0.8. Bad – less than 0.5 (such an analysis can hardly be considered reasonable). In our example – “not bad”.

    The coefficient 64.1428 shows what Y will be if all variables in the model under consideration are equal to 0. That is, the value of the analyzed parameter is also influenced by other factors not described in the model.

    The coefficient -0.16285 shows the weight of variable X on Y. That is, the average monthly salary within this model affects the number of quitters with a weight of -0.16285 (this is a small degree of influence). The “-” sign indicates a negative impact: the higher the salary, the fewer people quit. Which is fair.

    

    Correlation Analysis in Excel

    Correlation analysis helps determine whether there is a relationship between indicators in one or two samples. For example, between the operating time of a machine and the cost of repairs, the price of equipment and the duration of operation, the height and weight of children, etc.

    If there is a connection, then does an increase in one parameter lead to an increase (positive correlation) or a decrease (negative) of the other. Correlation analysis helps the analyst determine whether the value of one indicator can be used to predict the possible value of another.

    The correlation coefficient is denoted by r. Varies from +1 to -1. The classification of correlations for different areas will be different. When the coefficient is 0, there is no linear relationship between samples.

    Let's look at how to use Excel tools find the correlation coefficient.

    To find paired coefficients, the CORREL function is used.

    Objective: Determine whether there is a relationship between operating time lathe and the cost of its maintenance.

    Place the cursor in any cell and press the fx button.

    1. In the “Statistical” category, select the CORREL function.
    2. Argument “Array 1” - the first range of values ​​– machine operating time: A2:A14.
    3. Argument “Array 2” - second range of values ​​– repair cost: B2:B14. Click OK.

    To determine the type of connection, you need to look at the absolute number of the coefficient (each field of activity has its own scale).

    For correlation analysis several parameters (more than 2), it is more convenient to use “Data Analysis” (the “Analysis Package” add-on). You need to select correlation from the list and designate the array. All.

    The resulting coefficients will be displayed in the correlation matrix. Like this:

    Correlation and regression analysis

    In practice, these two techniques are often used together.

    Example:


    Now the regression analysis data has become visible.