Kendall rank correlation coefficient

    The rank correlation coefficient characterizes the general nature of a monotonic relationship: whether the resultant attribute increases or decreases as the factor attribute increases. It measures the strength of a monotonic relationship that need not be linear.


    The coefficient proposed by Kendall is based on relations of the "greater than / less than" type, whose validity was established when the scales were constructed.
    Select a pair of objects and compare their ranks on one characteristic and on the other. If the ranks on a given characteristic are in direct order (the order of the natural numbers), the pair is assigned +1; if they are in reverse order, -1. For the selected pair, the two units (for attribute X and for attribute Y) are multiplied. The product is +1 if the ranks of the pair are in the same order on both characteristics, and -1 if they are in opposite orders.
    If the rank orders on both characteristics agree for all pairs, the sum of the units assigned to all pairs of objects is maximal and equals the number of pairs, $C_N^2 = N(N-1)/2$. If the rank orders are reversed for all pairs, the sum equals $-C_N^2$. In the general case, $C_N^2 = P + Q$, where P is the number of positive and Q the number of negative units assigned to the pairs when comparing their ranks on both characteristics.
    The value

    $$\tau = \frac{P - Q}{C_N^2} = \frac{P - Q}{N(N-1)/2}$$

    is called the Kendall coefficient.
    The formula shows that τ is the difference between the proportion of pairs of objects ordered the same way on both characteristics (relative to the number of all pairs) and the proportion of pairs whose order does not coincide.
    For example, a coefficient of 0.60 means that 80% of the pairs have the same order of objects and 20% do not (80% + 20% = 100%; 0.80 - 0.20 = 0.60). That is, τ can be interpreted as the difference between the probabilities of matching and non-matching orders on the two characteristics for a randomly selected pair of objects.
    In the general case, calculating τ (more precisely, P or Q) is cumbersome even for N of about 10.
    The calculation can be simplified as shown in the examples.
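    The pairwise definition above can be sketched directly in code. This is a minimal illustration, assuming no tied ranks; the function names and the five-object ranking are hypothetical and chosen so that 8 of the 10 pairs agree in order.

```python
from itertools import combinations


def sign(value):
    """Return +1, -1 or 0 depending on the sign of value."""
    return (value > 0) - (value < 0)


def kendall_tau_by_pairs(rank_x, rank_y):
    """Kendall's tau from the pairwise +1/-1 scores (no tied ranks assumed)."""
    n = len(rank_x)
    total = 0
    for i, j in combinations(range(n), 2):
        # +1 if the pair is ordered the same way on both attributes, -1 otherwise
        total += sign(rank_x[j] - rank_x[i]) * sign(rank_y[j] - rank_y[i])
    n_pairs = n * (n - 1) // 2
    return total / n_pairs


# Hypothetical ranks of five objects on two attributes
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 3, 5, 4]
tau = kendall_tau_by_pairs(rank_x, rank_y)
print(round(tau, 2))  # 8 concordant and 2 discordant pairs give 0.6
```

    Enumerating all pairs costs on the order of N² comparisons, which is why the tabular counting used in the examples is preferred for hand calculation.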


    Example. The relationship between the volume of industrial output and investment in fixed capital in 10 regions of one of the federal districts of the Russian Federation in 2003 is characterized by the following data:


    Calculate the Spearman and Kendall rank correlation coefficients. Check their significance at α = 0.05. Formulate a conclusion about the relationship between the volume of industrial output and investment in fixed capital for the regions of the Russian Federation under consideration.

    Solution. Let us assign ranks to feature Y and factor X.


    Let's sort the data by X.
    In the row Y to the right of 3 there are 7 ranks greater than 3, therefore, 3 will generate the term 7 in P.
    To the right of 1 are 8 ranks greater than 1 (these are 2, 4, 6, 9, 5, 10, 7, 8), i.e. P will include 8, etc. As a result, P = 37 and using the formulas we have:

    X       Y        Rank X, d_x   Rank Y, d_y   P     Q
    18.4    5.57     1             3             7     2
    20.6    2.88     2             1             8     0
    21.5    4.12     3             2             7     0
    35.7    7.24     4             4             6     0
    37.1    9.67     5             6             4     1
    39.8    10.48    6             9             1     3
    51.1    8.58     7             5             3     0
    54.4    14.79    8             10            0     2
    64.6    10.22    9             7             1     0
    90.6    10.45    10            8             0     0
    Total                                        37    8
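    The "count to the right" procedure used to fill columns P and Q can be sketched as follows. The function name is hypothetical; the ranks of Y are those of the ten regions after sorting by X, and no tied ranks are assumed.

```python
def count_matches_and_inversions(y_ranks):
    """For each rank, count later ranks that are larger (P) or smaller (Q)."""
    p_terms, q_terms = [], []
    for i, current in enumerate(y_ranks):
        later = y_ranks[i + 1:]
        p_terms.append(sum(r > current for r in later))
        q_terms.append(sum(r < current for r in later))
    return p_terms, q_terms


# Ranks of Y after sorting the ten regions by X
y_ranks = [3, 1, 2, 4, 6, 9, 5, 10, 7, 8]
p_terms, q_terms = count_matches_and_inversions(y_ranks)
P, Q = sum(p_terms), sum(q_terms)
n = len(y_ranks)
tau = (P - Q) / (n * (n - 1) / 2)
print(p_terms, q_terms)
print(P, Q, round(tau, 2))  # 37 8 0.64
```

    A quick check on the bookkeeping: P + Q must equal the number of pairs, n(n - 1)/2 = 45.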


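    Using the simplified formulas:

    $$\tau = \frac{P - Q}{n(n-1)/2} = \frac{37 - 8}{45} \approx 0.64$$

    To test, at significance level α, the null hypothesis that the population Kendall rank correlation coefficient equals zero against the competing hypothesis H1: τ ≠ 0, calculate the critical point:

    $$T_{kp} = z_{kp}\sqrt{\frac{2(2n+5)}{9n(n-1)}}$$

    where n is the sample size; z kp is the critical point of the two-sided critical region, found from the table of the Laplace function by the equality Ф(z kp) = (1 - α)/2.
    If |τ| < T kp, there is no reason to reject the null hypothesis: the rank correlation between the characteristics is insignificant. If |τ| > T kp, the null hypothesis is rejected: there is a significant rank correlation between the characteristics.
    Let's find the critical point z kp
    Ф(z kp) = (1 - α)/2 = (1 - 0.05)/2 = 0.475
    Using the Laplace table we find z kp = 1.96

    Let's find the critical point:

    $$T_{kp} = 1.96\sqrt{\frac{2(2 \cdot 10 + 5)}{9 \cdot 10 \cdot 9}} \approx 0.49$$

    Since τ > T kp, we reject the null hypothesis; the rank correlation between the volume of industrial output and investment in fixed capital is significant.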
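    The significance check can be reproduced numerically. This is a sketch, assuming the usual large-sample critical point T_kp = z_kp·sqrt(2(2n + 5)/(9n(n - 1))); the function name is hypothetical, and the Laplace-table lookup is replaced by the standard normal quantile at 1 - α/2, which is the same condition as Ф(z kp) = (1 - α)/2.

```python
from math import sqrt
from statistics import NormalDist


def kendall_critical_point(n, alpha=0.05):
    """Critical point T_kp for the two-sided test of H0: tau = 0."""
    # Laplace-table condition F(z) = (1 - alpha)/2 is equivalent to the
    # standard normal quantile at probability 1 - alpha/2.
    z_kp = NormalDist().inv_cdf(1 - alpha / 2)
    return z_kp * sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))


n, P, Q = 10, 37, 8                 # values from the ten-region example
tau = (P - Q) / (n * (n - 1) / 2)
t_kp = kendall_critical_point(n, alpha=0.05)
reject_h0 = abs(tau) > t_kp
print(round(tau, 3), round(t_kp, 3), reject_h0)
```

    For n = 10 and α = 0.05 the critical point is about 0.49, so τ ≈ 0.64 leads to rejecting the null hypothesis.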

    Example. Based on data on the volume of construction and installation work performed in-house and the number of employees in 10 construction companies in one of the cities of the Russian Federation, determine the relationship between these characteristics using the Kendall coefficient.

    Solution.
    Let us assign ranks to feature Y and factor X.
    Let's arrange the objects so that their ranks in X form the natural series. Since the score assigned to every pair in this series is positive, the "+1" values that make up P are generated only by those pairs whose ranks in Y are in direct order.
    They are easily counted by comparing, in sequence, the rank of each object in the Y row with the ranks of the objects that follow it.
    Kendall coefficient:

    $$\tau = \frac{P - Q}{n(n-1)/2}$$

    or, since P + Q = n(n - 1)/2 when there are no tied ranks,

    $$\tau = \frac{4P}{n(n-1)} - 1$$

    Let's sort the data by X.
    In the row Y to the right of 2 there are 8 ranks greater than 2, therefore, 2 will generate the term 8 in P.
    To the right of 4 are 6 ranks greater than 4 (these are 7, 5, 6, 8, 9, 10), i.e. P will include 6, etc. As a result, P = 29 and using the formulas we have:

    X     Y      Rank X, d_x   Rank Y, d_y   P     Q
    38    292    1             2             8     1
    50    302    2             4             6     2
    52    366    3             7             3     4
    54    312    4             5             4     2
    59    359    5             6             3     2
    61    398    6             8             2     2
    66    401    7             9             1     2
    70    298    8             3             1     1
    71    283    9             1             1     0
    73    413    10            10            0     0
    Total                                    29    16


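    Using the simplified formulas:

    $$\tau = \frac{P - Q}{n(n-1)/2} = \frac{29 - 16}{45} \approx 0.29$$

    To test, at significance level α, the null hypothesis that the population Kendall rank correlation coefficient equals zero against the competing hypothesis H1: τ ≠ 0, calculate the critical point:

    $$T_{kp} = z_{kp}\sqrt{\frac{2(2n+5)}{9n(n-1)}}$$

    where n is the sample size; z kp is the critical point of the two-sided critical region, found from the table of the Laplace function by the equality Ф(z kp) = (1 - α)/2.
    If |τ| < T kp, there is no reason to reject the null hypothesis: the rank correlation between the characteristics is insignificant. If |τ| > T kp, the null hypothesis is rejected: there is a significant rank correlation between the characteristics.
    Let's find the critical point z kp
    Ф(z kp) = (1 - α)/2 = (1 - 0.05)/2 = 0.475
    Using the Laplace table we find z kp = 1.96
    Let's find the critical point:

    $$T_{kp} = 1.96\sqrt{\frac{2(2 \cdot 10 + 5)}{9 \cdot 10 \cdot 9}} \approx 0.49$$

    Since τ < T kp, there is no reason to reject the null hypothesis; the rank correlation between the volume of work performed and the number of employees is not significant.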

    Kendall's correlation coefficient is used when the variables are measured on two ordinal scales, provided that there are no tied ranks. Its calculation involves counting the number of matches and inversions. Let's consider this procedure using the example of the previous problem.

    The algorithm for solving the problem is as follows:

    1. Rearrange the data of Table 8.5 so that one of the rows (in this case the row x_i) is ranked. In other words, put the pairs x and y in the required order and enter the data in columns 1 and 2 of Table 8.6.

    Table 8.6

    x_i    y_i

    2. Determine the "degree of ranking" of the second row (y_i). This procedure is carried out in the following sequence:

    a) Take the first value of the unranked series, "3". Count the number of ranks below it in the column that are greater than it. There are 9 such values (6, 7, 4, 9, 5, 11, 8, 12 and 10). Enter the number 9 in the "matches" column. Then count the number of values that are less than three. There are 2 such values (ranks 1 and 2); enter the number 2 in the "inversions" column.

    b) Discard the number 3 (it has already been processed) and repeat the procedure for the next value, "6": the number of matches is 6 (ranks 7, 9, 11, 8, 12 and 10), and the number of inversions is 4 (ranks 1, 2, 4 and 5). Enter the number 6 in the "matches" column and the number 4 in the "inversions" column.

    c) Repeat the procedure in the same way to the end of the row, remembering that each processed value is excluded from further consideration (only the ranks that lie below the current value are counted).

    Note

    To avoid mistakes in the calculations, keep in mind that with each step the sum of matches and inversions decreases by one, because each time one value is excluded from consideration.

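    3. The sum of matches (P) and the sum of inversions (Q) are calculated; the data are entered into any one of the three interchangeable formulas for the Kendall coefficient (8.10), and the corresponding calculations are carried out.

    $$\tau = \frac{P - Q}{N(N-1)/2} = \frac{4P}{N(N-1)} - 1 = 1 - \frac{4Q}{N(N-1)} \qquad (8.10)$$

    In our case, with N = 12 and P - Q = 36:

    $$\tau = \frac{36}{12 \cdot 11/2} = \frac{36}{66} \approx 0.55$$

    Table XIV of the Appendix gives the critical values of the coefficient for this sample size: τ cr = 0.45 and 0.59 for the first and second significance levels, respectively. The empirically obtained value is compared with the tabulated ones.

    Conclusion

    τ = 0.55 > τ cr = 0.45. The correlation is statistically significant at the first significance level.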

    Note:

    If necessary (for example, if there is no table of critical values), the statistical significance of Kendall's τ can be determined by the following formula:

    $$z = \frac{S^*}{\sqrt{N(N-1)(2N+5)/18}} \qquad (8.11)$$

    where S* = P - Q + 1 if P < Q, and S* = P - Q - 1 if P > Q.

    The critical values of z for the chosen significance level are the same as for the Pearson measure and are found in the corresponding tables (not included in the appendix). For the standard significance levels, z cr = 1.96 (for β1 = 0.95) and 2.58 (for β2 = 0.99). Kendall's correlation coefficient is statistically significant if |z| > z cr.

    In our case S* = P - Q - 1 = 35 and z = 2.40, i.e. the initial conclusion is confirmed: the correlation between the characteristics is statistically significant at the first significance level.
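    The z check can be reproduced with a few lines of code. This sketch assumes the normal approximation z = S*/sqrt(N(N - 1)(2N + 5)/18), which gives the quoted z = 2.40; the function name is hypothetical, and P = 51, Q = 15 are derived here from P - Q = 36 and P + Q = 66 (N = 12, no ties) rather than quoted from the text.

```python
from math import sqrt


def kendall_z(p, q, n):
    """z statistic for Kendall's tau with the +/-1 continuity correction."""
    s = p - q
    if s > 0:
        s_star = s - 1
    elif s < 0:
        s_star = s + 1
    else:
        s_star = 0  # assumption: no correction is applied when P equals Q
    return s_star / sqrt(n * (n - 1) * (2 * n + 5) / 18)


# N = 12; P and Q derived from P - Q = 36 and P + Q = 12 * 11 / 2 = 66
z = kendall_z(51, 15, 12)
print(round(z, 2))
```

    The result lies between 1.96 and 2.58, which agrees with significance at the first level but not at the second.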

    When ranking, the expert must arrange the evaluated elements in ascending (descending) order of their preference and assign ranks to each of them in the form of natural numbers. In direct ranking, the most preferred element has rank 1 (sometimes 0), and the least preferred element has rank m.

    If the expert cannot carry out a strict ranking because, in his opinion, some elements are the same in preference, then it is permissible to assign the same ranks to such elements. To ensure that the sum of ranks is equal to the sum of places of ranked elements, so-called standardized ranks are used. The standardized rank is the arithmetic mean of the numbers of elements in a ranked series that are the same in preference.

    Example 2.6. The expert ranked the six items by preference as follows:

    Then the standardized ranks of these elements will be

    Thus, the sum of the ranks assigned to the elements will be equal to the sum of the numbers in the natural series.
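    Standardized ranks can be computed mechanically. The sketch below is an illustration only: the function name and the six-item ranking are hypothetical and are not the data of Example 2.6.

```python
def standardized_ranks(places):
    """Replace shared places by the mean of the positions they occupy."""
    order = sorted(range(len(places)), key=lambda i: places[i])
    ranks = [0.0] * len(places)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next item shares the same place
        while end + 1 < len(order) and places[order[end + 1]] == places[order[start]]:
            end += 1
        mean_position = (start + 1 + end + 1) / 2  # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks


# Hypothetical ranking of six items: two share place 2, three share place 3
places = [1, 2, 2, 3, 3, 3]
ranks = standardized_ranks(places)
print(ranks)       # [1.0, 2.5, 2.5, 5.0, 5.0, 5.0]
print(sum(ranks))  # 21.0, the sum of the natural series 1..6
```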

    The accuracy with which ranking expresses preference depends significantly on the size of the set of items presented. The ranking procedure gives the most reliable results (in terms of closeness between the revealed preference and the "true" one) when no more than 10 elements are evaluated. The set of items presented should contain no more than 20 elements.

    Rankings are processed and analyzed in order to construct a group preference relation from individual preferences. The following tasks can be set: a) determining how closely the rankings of two experts agree on the elements of the set presented; b) determining the relationship between two elements according to the individual opinions of group members on the various characteristics of these elements; c) assessing the consistency of expert opinions in a group of more than two experts.

    In the first two cases, a rank correlation coefficient is used as the measure of closeness. Depending on whether only strict ranking or also non-strict ranking is allowed, either Kendall's or Spearman's rank correlation coefficient is used.

    Kendall's rank correlation coefficient for problem (a) is

    $$\tau = \frac{2}{m(m-1)} \sum_{i<j} \operatorname{sign}(r_{1i} - r_{1j})\,\operatorname{sign}(r_{2i} - r_{2j}) \qquad (2.5)$$

    where m is the number of elements; r_{1i} is the rank assigned by the first expert to the i-th element; r_{2i} is the same for the second expert.

    For problem (b), the components of (2.5) have the following meaning: m is the number of characteristics of the two elements being assessed; r_{1i} (r_{2i}) is the rank of the i-th characteristic in the ranking of the first (second) element, set by a group of experts.

    For strict ranking, Spearman's rank correlation coefficient ρ is used:

    $$\rho = 1 - \frac{6\sum_{i=1}^{m}(r_{1i} - r_{2i})^2}{m(m^2 - 1)} \qquad (2.6)$$

    whose components have the same meaning as in (2.5).

    The correlation coefficients (2.5) and (2.6) vary from -1 to +1. A coefficient of +1 means that the rankings are the same; a coefficient of -1 means that they are opposite to each other. A coefficient of zero means that the rankings are linearly independent (uncorrelated).

    Since with this approach (the expert is a "measurer" with a random error) individual rankings are treated as random, the significance of the resulting correlation coefficient has to be tested statistically. The Neyman-Pearson criterion is used: the significance level α of the criterion is set and, from the known distribution of the correlation coefficient, the threshold value c_α is determined, with which the resulting value of the correlation coefficient is compared. The critical region is right-sided (in practice, the value of the criterion is usually calculated first, the significance level is determined from it, and that level is compared with the threshold level α).

    For m > 10, Kendall's rank correlation coefficient τ has a distribution close to normal with the parameters:

    $$M[\tau] = 0, \qquad D[\tau] = \frac{2(2m+5)}{9m(m-1)}$$

    where M[τ] is the mathematical expectation and D[τ] is the variance.

    In this case, tables of the standard normal distribution function are used:

    $$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$$

    and the boundary τ_α of the critical region is defined as the root of the equation

    $$1 - \Phi\!\left(\frac{\tau_\alpha}{\sqrt{D[\tau]}}\right) = \alpha$$

    If the calculated value of the coefficient satisfies τ ≥ τ_α, the rankings are considered to be in significant agreement. Typically, α is chosen in the range 0.01-0.05. For m ≤ 10, the distribution of τ is given in Table 2.1.

    The significance of the agreement of two rankings measured by the Spearman coefficient ρ is checked in the same way, using Student distribution tables for m > 10.

    In this case the value

    $$t = \rho\sqrt{\frac{m-2}{1-\rho^2}}$$

    has a distribution well approximated by the Student distribution with m - 2 degrees of freedom. For m > 30 the distribution of ρ agrees well with the normal one, with M[ρ] = 0 and D[ρ] = 1/(m - 1).

    For m ≤ 10, the significance of ρ is checked using Table 2.2.
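    Spearman's coefficient and its Student statistic can be sketched as follows. The code assumes the usual formulas ρ = 1 - 6Σd²/(m(m² - 1)) and t = ρ·sqrt((m - 2)/(1 - ρ²)) for strict rankings; the function names are hypothetical, and the ranks are those of the ten-region example from the beginning of the text.

```python
from math import sqrt


def spearman_rho(rank_1, rank_2):
    """Spearman's rho for two strict rankings (no tied ranks)."""
    m = len(rank_1)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
    return 1 - 6 * d_squared / (m * (m ** 2 - 1))


def spearman_t(rho, m):
    """Statistic compared with Student's distribution, m - 2 degrees of freedom."""
    return rho * sqrt((m - 2) / (1 - rho ** 2))


# Ranks of X and Y for the ten regions
rank_x = list(range(1, 11))
rank_y = [3, 1, 2, 4, 6, 9, 5, 10, 7, 8]
rho = spearman_rho(rank_x, rank_y)
t_value = spearman_t(rho, len(rank_x))
print(round(rho, 3), round(t_value, 2))
```

    The resulting t would then be compared with the tabulated Student critical value for m - 2 = 8 degrees of freedom at the chosen α.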

    If the rankings are not strict, then the Spearman coefficient

    where ρ – is calculated according to (2.6);

    where k_1, k_2 are the numbers of distinct groups of tied ranks in the first and second rankings, respectively; l_i is the number of identical ranks in the i-th group.

    When using the rank correlation coefficients ρ (Spearman) and τ (Kendall) in practice, keep in mind that the coefficient ρ provides a more accurate result in the sense of minimum variance.

    Table 2.1. Distribution of Kendall's rank correlation coefficient

    Brief theory

    Kendall's correlation coefficient is used when the variables are measured on two ordinal scales, provided that there are no tied ranks. Its calculation involves counting the number of matches and inversions.

    This coefficient varies within the limits -1 ≤ τ ≤ 1 and is calculated using the formula:

    $$\tau = \frac{P - Q}{n(n-1)/2}$$

    For the calculation, all units are ranked according to characteristic x; in the row of the other characteristic y, for each rank the number of subsequent ranks exceeding it is counted (denoted P), along with the number of subsequent ranks below it (denoted Q).

    It can be shown that

    $$P + Q = \frac{n(n-1)}{2}$$

    and Kendall's rank correlation coefficient can be written as

    $$\tau = \frac{4P}{n(n-1)} - 1 = 1 - \frac{4Q}{n(n-1)}$$

    To test, at significance level α, the null hypothesis that the population Kendall rank correlation coefficient equals zero against the competing hypothesis H1: τ ≠ 0, calculate the critical point:

    $$T_{kp} = z_{kp}\sqrt{\frac{2(2n+5)}{9n(n-1)}}$$

    where n is the sample size; z kp is the critical point of the two-sided critical region, found from the table of the Laplace function by the equality Ф(z kp) = (1 - α)/2.

    If |τ| < T kp, there is no reason to reject the null hypothesis. The rank correlation between the characteristics is insignificant.

    If |τ| > T kp, the null hypothesis is rejected. There is a significant rank correlation between the characteristics.

    Example of problem solution

    Problem condition

    During the recruitment process, seven candidates for vacant positions were given two tests. The test results (in points) are shown in the table:

    Test    Candidate
            1     2     3     4     5     6     7
    1       31    82    25    26    53    30    29
    2       21    55    8     27    32    42    26

    Calculate the Kendall rank correlation coefficient between the results of the two tests and evaluate its significance at the given significance level.

    Problem solution

    Let's calculate the Kendall coefficient.

    The ranks of the factor characteristic are arranged strictly in ascending order, and the corresponding ranks of the resultant characteristic are recorded alongside. For each rank, among the ranks following it, the number of larger ranks is counted (entered in column P) and the number of smaller ranks (entered in column Q).

    Rank X   Rank Y   P     Q
    1        1        6     0
    2        4        3     2
    3        3        3     1
    4        6        1     2
    5        2        2     0
    6        5        1     0
    7        7        0     0
    Sum               16    5
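    The table can be reproduced from the raw test scores. The sketch below ranks both tests, sorts the candidates by their rank on test 1, and counts P and Q; the helper name is hypothetical, and the scores are assumed to contain no ties (true for this data).

```python
def ranks_of(values):
    """Rank distinct values from 1 (smallest) upward; ties are not handled."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


test_1 = [31, 82, 25, 26, 53, 30, 29]
test_2 = [21, 55, 8, 27, 32, 42, 26]

# Pair the ranks and sort the candidates by their rank on test 1
pairs = sorted(zip(ranks_of(test_1), ranks_of(test_2)))
y_ranks = [rank_y for _, rank_y in pairs]

P = Q = 0
for i, current in enumerate(y_ranks):
    later = y_ranks[i + 1:]
    P += sum(r > current for r in later)   # larger ranks that follow
    Q += sum(r < current for r in later)   # smaller ranks that follow

n = len(y_ranks)
tau = (P - Q) / (n * (n - 1) / 2)
print(y_ranks, P, Q, round(tau, 2))  # [1, 4, 3, 6, 2, 5, 7] 16 5 0.52
```

    With P = 16 and Q = 5 over 21 pairs, τ = 11/21 ≈ 0.52; its significance would then be judged against the critical point for n = 7 at the chosen α.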

    Presentation and pre-processing of expert assessments

    Several types of assessments are used in practice:

    - qualitative (often/rarely, worse/better, yes/no),

    - scale ratings (value intervals 50-75, 76-90, 91-120, etc.),

    - points from a given interval (from 2 to 5, from 1 to 10), mutually independent,

    - ranked (objects are arranged by the expert in a certain order, and each is assigned a serial number, its rank),

    - comparative, obtained by one of the comparison methods:

      - sequential comparison method,

      - method of pairwise comparison of factors.

    At the next step of processing expert opinions, the degree of agreement between these opinions has to be evaluated.

    Ratings received from experts can be treated as a random variable whose distribution reflects the experts' opinions about the probability of a particular choice of event (factor). Therefore, to analyze the spread and consistency of expert assessments, generalized statistical characteristics are used, namely averages and measures of spread:

    - standard deviation,

    - variation range (min to max),

    - coefficient of variation V = standard deviation / arithmetic mean (suitable for any type of assessment):

    V_i = σ_i / x_i avg

    To evaluate the similarity of the opinions of each pair of experts, a variety of measures can be used:

    - association coefficients, which take into account the number of matching and mismatching answers,

    - inconsistency coefficients of expert opinions.

    All these measures can be used either to compare the opinions of two experts or to analyze the relationship between series of assessments on two characteristics.

    Spearman's paired rank correlation coefficient:

    where n is the number of experts,

    c k – the difference between the estimates of the i-th and j-th experts for all T factors

    Kendall's coefficient of concordance W gives an overall assessment of the agreement among the opinions of all experts on all factors, but only when rank estimates were used.

    It has been proven that the value of S, when all experts give the same assessments of all factors, reaches its maximum, equal to

    $$S_{max} = \frac{m^2(n^3 - n)}{12}$$

    where n is the number of factors,

    m is the number of experts.

    The concordance coefficient is equal to the ratio

    $$W = \frac{S}{S_{max}} = \frac{12S}{m^2(n^3 - n)}$$

    If W is close to 1, all experts gave fairly consistent estimates; otherwise their opinions are not consistent.

    The formula for calculating S is given below:

    $$S = \sum_{i=1}^{n}\left(\sum_{j=1}^{m} r_{ij} - r_{avg}\right)^2$$

    where r_ij is the rank given to the i-th factor by the j-th expert,

    r_avg is the average sum of ranks per factor over the entire assessment matrix and is equal to

    $$r_{avg} = \frac{m(n+1)}{2}$$

    Therefore the formula for calculating S can take the form:

    $$S = \sum_{i=1}^{n}\left(\sum_{j=1}^{m} r_{ij} - \frac{m(n+1)}{2}\right)^2$$

    If some assessments from one expert coincide and were standardized during processing, another formula is used to calculate the concordance coefficient:

    $$W = \frac{12S}{m^2(n^3 - n) - m\sum_{j=1}^{m} T_j} \qquad (**)$$

    where T_j is calculated for each expert (if his assessments were repeated for different objects), taking the repetitions into account according to the following rule:

    $$T_j = \sum_{k=1}^{t_j}\left(h_k^3 - h_k\right)$$

    where t_j is the number of groups of equal ranks for the j-th expert, and

    h_k is the number of equal ranks in the k-th group of tied ranks of the j-th expert.
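    The concordance calculation can be sketched in code. This is an illustration under stated assumptions: it uses the usual definitions S = Σ(column sum - m(n + 1)/2)², T_j = Σ(h³ - h) over tied groups, and W = 12S/(m²(n³ - n) - mΣT_j); the function name and the 3-expert, 4-factor rank matrix are hypothetical.

```python
def concordance_w(rank_matrix):
    """Kendall's W for a matrix with one row of (standardized) ranks per expert."""
    m = len(rank_matrix)         # number of experts
    n = len(rank_matrix[0])      # number of factors
    column_sums = [sum(row[i] for row in rank_matrix) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((c - mean_sum) ** 2 for c in column_sums)

    # Tie correction: sum of (h^3 - h) over every group of equal ranks
    tie_total = 0
    for row in rank_matrix:
        for value in set(row):
            h = row.count(value)
            tie_total += h ** 3 - h   # contributes zero when h = 1

    return 12 * s / (m ** 2 * (n ** 3 - n) - m * tie_total)


# Hypothetical standardized ranks: 3 experts, 4 factors
ranks = [
    [1, 2.5, 2.5, 4],
    [1, 2, 3, 4],
    [2, 1, 3, 4],
]
w = concordance_w(ranks)
print(round(w, 3))
```

    When no expert has tied ranks the correction term is zero and the expression reduces to W = 12S/(m²(n³ - n)); identical rankings give W = 1.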

    EXAMPLE. Let 5 experts on six factors answer the ranking as shown in Table 3:

    Table 3 - Experts' answers

    Experts O1 O2 O3 O4 O5 O6 Sum of ranks by expert
    E1
    E2
    E3
    E4
    E5

    Because we did not obtain a strict ranking (the experts' assessments repeat, and the sums of ranks are not equal), we transform the assessments and obtain tied (standardized) ranks (Table 4):

    Table 4 - Tied ranks of expert assessments

    Experts O1 O2 O3 O4 O5 O6 Sum of ranks by expert
    E1 2,5 2,5
    E2
    E3 1,5 1,5 4,5 4,5
    E4 2,5 2,5 4,5 4,5
    E5 5,5 5,5
    Sum of ranks for an object 7,5 9,5 23,5 29,5

    Now let's determine the degree of agreement between the expert opinions using the concordance coefficient. Since the ranks are tied, we calculate W using formula (**).

    Then r_avg = 7*5/2 = 17.5

    $$S = 10^2 + 8^2 + 4.5^2 + 4.5^2 + 6^2 + 12^2 = 384.5$$

    Let's move on to the calculation of W. To do this, calculate the values of T_j separately. In the example, the ratings are specially selected so that each expert has repeated ratings: the first has two identical ratings, the second has three, the third and fourth each have two groups of two ratings, and the fifth has two identical ratings. From here:

    T_1 = 2^3 - 2 = 6, T_5 = 6

    T_2 = 3^3 - 3 = 24

    T_3 = (2^3 - 2) + (2^3 - 2) = 12, T_4 = 12

    With m = 5 experts, n = 6 factors and ΣT_j = 6 + 24 + 12 + 12 + 6 = 60, the tie-corrected concordance coefficient is

    $$W = \frac{12 \cdot 384.5}{5^2(6^3 - 6) - 5 \cdot 60} = \frac{4614}{4950} \approx 0.93$$

    We see that the consistency of expert opinions is quite high, and we can move on to the next stage of the study: justifying and adopting the solution alternative recommended by the experts.

    Otherwise, you must return to steps 4-8.