• Testing simple hypotheses using the Pearson chi-square test in MS EXCEL. Pearson goodness-of-fit test (χ²)

    Definition. Empirical frequencies are the frequencies actually observed.

    TESTING THE HYPOTHESIS ABOUT THE DISTRIBUTION OF THE POPULATION. PEARSON CRITERION

    As noted earlier, assumptions about the type of distribution can be made based on theoretical premises. However, no matter how well the theoretical distribution law is chosen, discrepancies between the empirical and theoretical distributions are inevitable. The question naturally arises: are these discrepancies explained only by random circumstances associated with the limited number of observations, or are they significant and due to a poorly chosen theoretical distribution law? A goodness-of-fit criterion is used to answer this question.

    Definition. A goodness-of-fit criterion is a criterion for testing a hypothesis about the assumed law of an unknown distribution.

    For each criterion, i.e. for the corresponding distribution, tables are usually compiled from which the critical point k_cr is found (see appendices). After the critical point is found, the observed value of the criterion, K_obs, is calculated from the sample data. If K_obs > k_cr, the null hypothesis is rejected; otherwise it is not rejected.

    Let us describe the application of the Pearson criterion to testing the hypothesis that the population is normally distributed. The Pearson criterion answers the question of whether the discrepancy between the empirical and theoretical frequencies is due to chance.

    The Pearson criterion, like any criterion, does not prove the validity of the hypothesis, but only establishes, at the accepted level of significance, its agreement or disagreement with observational data.

    So, let an empirical distribution be obtained from a sample of size n. At significance level α, it is necessary to test the null hypothesis: the population is normally distributed.

    The random variable

    $$\chi^2 = \sum_{i=1}^{m} \frac{(n_i - n_i')^2}{n_i'}$$

    is taken as the criterion for testing the null hypothesis, where $n_i$ are the empirical frequencies and $n_i'$ are the theoretical frequencies.

    This random variable has a χ² distribution with k degrees of freedom. The number of degrees of freedom is found from the equality k = m - r - 1, where m is the number of partial sampling intervals and r is the number of distribution parameters. For a normal distribution r = 2 (a and σ), so k = m - 3.

    To test, at a given significance level, the null hypothesis that the population is normally distributed, you need to:

    1. Calculate the sample mean $\bar{x}$ and the sample standard deviation $s$.

    2. Calculate the theoretical frequencies

    $$n_i' = \frac{n h}{s}\,\varphi(u_i), \qquad u_i = \frac{x_i - \bar{x}}{s},$$

    where n is the sample size; h is the step (the difference between two adjacent values $x_i$); $\varphi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$, and the values of this function are taken from the appendix table.

    3. Compare the empirical and theoretical frequencies using the Pearson criterion. To do this:

    a) find the observed value of the criterion, $\chi^2_{obs}$;

    b) from the table of critical points of the χ² distribution, using the given significance level α and the number of degrees of freedom k, find the critical point $\chi^2_{cr}(\alpha; k)$.

    If $\chi^2_{obs} < \chi^2_{cr}$, there is no reason to reject the null hypothesis. If $\chi^2_{obs} > \chi^2_{cr}$, the null hypothesis is rejected.

    Comment. Small frequencies (<5) should be combined; in that case the corresponding theoretical frequencies must also be added together. If frequencies were combined, then when determining the number of degrees of freedom, m should be taken as the number of sample groups remaining after the frequencies were combined.
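    The three steps above can be sketched in Python. This is only a minimal illustration: the grouped data are hypothetical (not taken from the text), the function name is invented for the example, the theoretical frequencies use the usual density-based approximation n h φ(u) / s, and the merging of small frequencies described in the comment is not performed.

```python
import math

def normal_fit_chi2(midpoints, freqs):
    """Observed Pearson statistic for H0: the population is normal (grouped data).

    midpoints: equally spaced values x_i (class midpoints)
    freqs:     empirical frequencies n_i
    """
    n = sum(freqs)
    h = midpoints[1] - midpoints[0]              # step between adjacent values
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    s = math.sqrt(var)                           # sample standard deviation

    def phi(u):                                  # standard normal density
        return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

    # Step 2: theoretical frequencies n'_i = (n * h / s) * phi(u_i)
    theor = [n * h / s * phi((x - mean) / s) for x in midpoints]
    # Step 3a: observed value of the criterion
    chi2_obs = sum((f - t) ** 2 / t for f, t in zip(freqs, theor))
    k = len(freqs) - 3                           # k = m - r - 1 with r = 2
    return chi2_obs, k, theor

# Hypothetical grouped sample, for illustration only
x = [5, 7, 9, 11, 13, 15, 17, 19, 21]
n_i = [15, 26, 25, 30, 26, 21, 24, 20, 13]
chi2_obs, k, theor = normal_fit_chi2(x, n_i)
print(f"chi2_obs = {chi2_obs:.2f}, degrees of freedom k = {k}")
```

    The value of chi2_obs would then be compared with the tabulated critical point for the chosen α and k, as in step 3b.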

    Purpose of the criterion

    The χ² test is used for two purposes:

    1) to compare the empirical distribution of a characteristic with a theoretical one (uniform, normal or some other);

    2) to compare two, three or more empirical distributions of the same characteristic.

    Description of criterion

    The χ² test answers the question of whether different values of a characteristic occur with equal frequency in the empirical and theoretical distributions, or in two or more empirical distributions.

    The advantage of the method is that it allows you to compare distributions of characteristics measured on any scale, starting from the nominal scale (see paragraph 1.2). Even in the simplest case of an alternative distribution (“yes - no”, “allowed a defect - did not allow a defect”, “solved the problem - did not solve the problem”, etc.) we can already apply the χ² test.

    Let's say some observer records the number of pedestrians who chose the right or left of two symmetrical paths on the way from point A to point B (see Fig. 4.3).

    Suppose that, as a result of 70 observations, it is established that 51 people chose the right path and only 19 chose the left. Using the χ² test we can determine whether this distribution of choices differs from a uniform distribution, in which both paths would be chosen with the same frequency. This is the case of comparing an obtained empirical distribution with a theoretical one. Such a task may arise, for example, in applied psychological research related to design in architecture, communication systems, etc.
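    As a rough illustration, the comparison with a uniform distribution for the 51:19 split can be computed as follows. The function name is invented for the example, 3.84 is the standard 5% critical value for one degree of freedom, and no continuity correction is applied in this sketch.

```python
def chi2_goodness_of_fit(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [51, 19]                  # right path, left path
n = sum(observed)                    # 70 observations
expected = [n / 2, n / 2]            # uniform distribution: 35 and 35

chi2 = chi2_goodness_of_fit(observed, expected)
CRITICAL_5PCT_DF1 = 3.84             # 5% critical value, 1 degree of freedom

print(f"chi2 = {chi2:.2f}")
print("differs from uniform" if chi2 > CRITICAL_5PCT_DF1 else "no evidence of a difference")
```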

    But let us imagine that the observer is solving a completely different problem: he is occupied with problems of bilateral regulation. Whether the resulting distribution coincides with a uniform one interests him much less than whether his data coincide with or diverge from the data of other researchers. He knows that people with right-foot dominance tend to circle counterclockwise, and people with left-foot dominance tend to circle clockwise, and that in a study by colleagues left-foot dominance was found in 26 people out of 100 examined.

    Using the χ² method, he can compare two empirical distributions: a ratio of 51:19 in his own sample and a ratio of 74:26 in the sample of the other researchers.

    This is the case of comparing two empirical distributions on the simplest alternative characteristic (simplest from a mathematical point of view, of course, and by no means from a psychological one).

    Similarly, we can compare distributions of choices among three or more alternatives. For example, if in a sample of 50 people 30 chose answer (a), 15 chose answer (b) and 5 chose answer (c), then we can use the χ² method to check whether this distribution differs from a uniform distribution, or from the distribution of answers in another sample, where answer (a) was chosen by 10 people, answer (b) by 25 and answer (c) by 15.

    In cases where a characteristic is measured quantitatively, say in points, seconds or millimeters, we may have to group the whole range of values into several categories. For example, if the time to solve a problem varies from 10 to 300 seconds, we can introduce 10 or 5 categories, depending on the sample size, for example: 0-50 seconds; 51-100 seconds; 101-150 seconds, etc. Then, using the χ² method, we compare the frequencies of occurrence of the different categories; otherwise the basic scheme does not change.

    By comparing the empirical distribution with the theoretical one, we determine the degree of discrepancy between the empirical and theoretical frequencies.

    By comparing two empirical distributions, we determine the degree of discrepancy between the empirical frequencies and the theoretical frequencies that would be observed if the two empirical distributions coincided. Formulas for calculating theoretical frequencies will be specifically given for each comparison option.

    The greater the discrepancy between the two compared distributions, the greater the empirical value of χ².
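    For the comparison of two empirical distributions, one common way to obtain the theoretical frequencies is to multiply the row total by the column total and divide by the overall number of observations. The sketch below applies this to the two samples of 50 answers mentioned above (30/15/5 and 10/25/15); it assumes this standard contingency-table formula, and the function names are illustrative.

```python
def expected_from_margins(table):
    """Theoretical frequencies if all rows shared one common distribution."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi2_stat(table):
    """Sum of (O - E)^2 / E over all cells of the table."""
    expected = expected_from_margins(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Answers (a), (b), (c) in the two samples from the text
table = [[30, 15, 5],
         [10, 25, 15]]

expected = expected_from_margins(table)   # 20, 20, 10 in each row
chi2 = chi2_stat(table)
print(expected, chi2)
```

    The more the two rows differ, the larger each (O - E)² term becomes, which is the sense in which a greater discrepancy gives a greater χ².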

    Hypotheses

    Several variants of hypotheses are possible, depending on the tasks we set for ourselves.

    First option:

    H 0: The resulting empirical distribution of the characteristic does not differ from the theoretical (for example, uniform) distribution.

    H 1: The obtained empirical distribution of the characteristic differs from the theoretical distribution.

    Second option:

    H 0: Empirical distribution 1 is no different from empirical distribution 2.

    H 1: Empirical distribution 1 is different from empirical distribution 2.

    Third option:

    H 0: Empirical distributions 1, 2, 3, ... do not differ from each other.

    H 1: Empirical distributions 1, 2, 3, ... differ from each other.

    The χ 2 criterion allows us to test all three hypotheses.

    Graphical representation of the criterion

    Let us illustrate the example with the choice of the right or left path on the way from point A to point B. In Fig. 4.4, the frequency of choosing the left path is represented by the left column of the histogram, and the frequency of choosing the right path by the right column. The y-axis shows the relative frequencies of choice, that is, the frequencies of choosing a particular path divided by the total number of observations. For the left path the relative frequency is 19/70, that is, 0.27, and for the right path it is 51/70, that is, 0.73.

    If both paths were chosen with equal probability, then half of the subjects would choose the right path, and half would choose the left path. The probability of choosing each of the paths would be 0.50.

    We see that the deviations of the empirical frequencies from this value are quite substantial. The differences between the empirical and theoretical distributions may well turn out to be significant.

    Fig. 4.5 actually presents two histograms, but the bars are grouped so that the frequencies of preference for the left path in our observer's sample (1) and in the sample of T.A. Dobrokhotova and N.N. Bragina (2) are compared on the left, and the frequencies of preference for the right path in the same two samples are compared on the right.

    We see that the differences between the samples are very small. The χ² test will most likely confirm that the two distributions coincide.

    Limitations of the criterion

    1. The sample size must be large enough: n ≥ 30. For n < 30, the χ² test gives very approximate values. The accuracy of the test increases as n grows.

    2. The theoretical frequency for each table cell should not be less than 5: f ≥ 5. This means that if the number of categories is predetermined and cannot be changed, we cannot apply the χ² method without accumulating a certain minimum number of observations. If, for example, we want to test our assumption that the frequency of calls to the Trust telephone service is unevenly distributed over the 7 days of the week, we will need at least 5 * 7 = 35 calls. Thus, if the number of categories (k) is fixed in advance, as in this case, the minimum number of observations (n min) is determined by the formula: n min = k * 5.

    3. The selected categories must exhaust the entire distribution, that is, cover the entire range of variability of the characteristic. The grouping into categories must be the same in all the distributions being compared.

    4. A “continuity correction” must be made when comparing distributions of characteristics that take only 2 values. The correction decreases the value of χ² (see the example with the continuity correction).

    5. The categories must be non-overlapping: if an observation is assigned to one category, it can no longer be assigned to any other category.

    The sum of observations over the categories must always be equal to the total number of observations.
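    Limitation 4 can be illustrated with the path-choice data (51 and 19 out of 70). The sketch below assumes the common form of the continuity correction, in which 0.5 is subtracted from each absolute deviation before squaring; the function names are illustrative.

```python
def chi2_plain(observed, expected):
    """Uncorrected statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_continuity_corrected(observed, expected):
    """Statistic with the continuity correction: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

observed = [51, 19]          # right path, left path
expected = [35.0, 35.0]      # uniform distribution over 70 observations

plain = chi2_plain(observed, expected)
corrected = chi2_continuity_corrected(observed, expected)
print(f"without correction: {plain:.2f}, with correction: {corrected:.2f}")
```

    As stated in limitation 4, the corrected value is the smaller of the two.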

    A legitimate question is what should be considered the number of observations - the number of choices, reactions, actions or the number of subjects who make a choice, exhibit reactions or perform actions. If a subject exhibits several reactions, and all of them are recorded, then the number of subjects will not match the number of reactions. We can sum up the reactions of each subject, as, for example, this is done in the Heckhausen method for studying achievement motivation or in the S. Rosenzweig Frustration Tolerance Test, and compare the distributions of individual sums of reactions in several samples.

    In this case, the number of observations will be the number of subjects. If we count the frequency of reactions of a certain type in the sample as a whole, we obtain a distribution of reactions of different types, and in this case the number of observations will be the total number of recorded reactions, and not the number of subjects.

    From a mathematical point of view, the rule that categories do not overlap is observed in both cases: each observation belongs to one and only one category of the distribution.

    We can also imagine a variant of the study where we study the distribution of choices of one subject. In cognitive behavioral therapy, for example, the client is asked to record each time the exact time of occurrence of an undesirable reaction, for example, attacks of fear, depression, outbursts of anger, self-deprecating thoughts, etc. Subsequently, the psychotherapist analyzes the data obtained, identifying the hours during which unfavorable symptoms appear more often, and helps the client build an individual program to prevent adverse reactions.

    Is it possible to prove, using the χ² test, that some hours occur more often in this individual distribution and others less often? All the observations are dependent, since they relate to the same subject; at the same time, all the categories are non-overlapping, since one and the same attack belongs to one and only one category (in this case, an hour of the day). Apparently, applying the χ² method would be something of a simplification here. Attacks of fear, anger or depression may occur repeatedly during the day, and it may turn out that, say, the early-morning 6 o'clock and late-evening 12 o'clock attacks usually occur together on the same day, while a daytime 3 o'clock attack appears no earlier than one day after the previous attack and no less than two days before the next one, and so on. Apparently, this calls for a complex mathematical model, or for something that cannot be “verified by algebra” at all. Nevertheless, for practical purposes it may be useful to apply the test in order to identify systematic unevenness in the occurrence of any significant events, choices, preferences, etc. in one and the same person.

    So, the same observation should belong to only one category. But whether to consider each subject or each reaction of the subject as an observation is a question the solution of which depends on the goals of the study (see, for example, Ganzen V.A., Balin V.D., 1991, p. 10).

    The main “limitation” of the χ² test is that it seems frighteningly complex to most researchers.

    Let us try to overcome the myth of the incomprehensible difficulty of the χ² test. To enliven the presentation, consider a humorous literary example.

    Let us consider the application of the Pearson chi-square test in MS EXCEL for testing simple hypotheses.

    After experimental data have been obtained (i.e. when there is a sample), one usually chooses the distribution law that best describes the random variable represented by that sample. How well the experimental data are described by the chosen theoretical distribution law is checked using goodness-of-fit tests. The null hypothesis is usually the hypothesis that the distribution of the random variable coincides with some theoretical law.

    Let us first look at the application of Pearson's goodness-of-fit test X² (chi-square) to simple hypotheses (the parameters of the theoretical distribution are considered known). Then we turn to composite hypotheses, where only the form of the distribution is specified, and the parameters of this distribution and the value of the X² statistic are estimated/calculated from the same sample.

    Note: In English-language literature, the procedure for applying Pearson's goodness-of-fit test X² is called the chi-square goodness of fit test.

    Let us recall the procedure for testing hypotheses:

    • the value of a statistic corresponding to the type of hypothesis being tested is calculated from the sample. For example, a t-statistic is used for a hypothesis about a mean when the standard deviation is not known;
    • provided the null hypothesis is true, the distribution of this statistic is known and can be used to calculate probabilities (for example, for the t-statistic this is Student's t-distribution);
    • the value of the statistic calculated from the sample is compared with the critical value for the given significance level;
    • the null hypothesis is rejected if the value of the statistic is greater than the critical value (or, equivalently, if the probability of obtaining this value of the statistic, the p-value, is less than the significance level).

    Let's carry out hypothesis testing for various distributions.

    Discrete case

    Suppose two people are playing dice. Each player has his own set of dice. The players take turns rolling 3 dice at once. Each round is won by whoever rolls the most sixes. The results are recorded. After 100 rounds, one of the players began to suspect that his opponent's dice were asymmetrical, because the opponent won often (he often rolled sixes). He decided to analyze how likely such a number of outcomes in the opponent's favor was.

    Note: Since there are 3 dice, 0, 1, 2 or 3 sixes can be rolled at a time, i.e. the random variable can take 4 values.

    From probability theory we know that if the dice are symmetrical, the number of sixes rolled follows a binomial distribution. Therefore, the expected frequencies of sixes after 100 rounds can be calculated using the formula
    =BINOM.DIST(A7,3,1/6,FALSE)*100

    The formula assumes that cell A7 contains the corresponding number of sixes rolled in one round.
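    For readers working outside Excel, the same expected frequencies can be reproduced with a few lines of Python. This is a sketch equivalent to the BINOM.DIST formula above with the cumulative flag set to FALSE; the function name and its parameters are illustrative.

```python
from math import comb

def expected_sixes(rounds=100, dice=3, p=1 / 6):
    """Expected number of rounds with k sixes, for k = 0..dice (binomial law)."""
    return [rounds * comb(dice, k) * p ** k * (1 - p) ** (dice - k)
            for k in range(dice + 1)]

expected = expected_sixes()
for k, e in enumerate(expected):
    print(f"{k} sixes: expected in {e:.2f} rounds out of 100")
```

    Note that the expected frequency of three sixes is well below 1, which matters for the rule on small frequencies discussed further in the text.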

    Note: The calculations are given in the example file on the Discrete sheet.

    To compare the observed (Observed) and theoretical (Expected) frequencies, it is convenient to use a histogram.

    If the observed frequencies deviate significantly from the theoretical distribution, the null hypothesis that the random variable follows the theoretical law should be rejected. That is, if the opponent's dice are asymmetrical, the observed frequencies will be “significantly different” from the binomial distribution.

    In our case the frequencies are, at first glance, quite close, and without calculations it is difficult to draw an unambiguous conclusion. We apply Pearson's goodness-of-fit test X², so that instead of the subjective statement “significantly different”, which can be made by comparing histograms, we can use a mathematically rigorous one.

    We use the fact that, by the law of large numbers, the observed relative frequency tends, as the sample size n increases, to the probability given by the theoretical law (in our case, the binomial law). In our case the sample size n is 100.

    Let us introduce a test statistic, which we denote by X²:

    $$X^2 = \sum_{l=1}^{L} \frac{(O_l - E_l)^2}{E_l}$$

    where O_l is the observed frequency of the event that the random variable takes the l-th admissible value, E_l is the corresponding theoretical frequency (Expected), and L is the number of values that the random variable can take (in our case, 4).

    As can be seen from the formula, this statistic is a measure of the closeness of the observed frequencies to the theoretical ones, i.e. it can be used to estimate the “distances” between these frequencies. If the sum of these “distances” is “too large”, then the frequencies are “significantly different”. Clearly, if the dice are symmetrical (i.e. the binomial law applies), the probability that the sum of “distances” is “too large” will be small. To calculate this probability we need to know the distribution of the X² statistic (the X² statistic is calculated from a random sample, so it is a random variable and therefore has its own probability distribution).

    From the multidimensional analogue of the de Moivre-Laplace integral theorem it is known that, as n → ∞, our random variable X² asymptotically follows a χ² distribution with L - 1 degrees of freedom.

    So if the calculated value of the X² statistic (the sum of the “distances” between frequencies) is greater than a certain limiting value, we have reason to reject the null hypothesis. As when testing parametric hypotheses, the limiting value is set through the significance level. If the probability that the X² statistic takes a value greater than or equal to the calculated one (the p-value) is less than the significance level, the null hypothesis can be rejected.

    In our case, the value of the statistic is 22.757. The probability that the X² statistic takes a value greater than or equal to 22.757 is very small (0.000045) and can be calculated using the formulas
    =CHISQ.DIST.RT(22.757,4-1) or
    =CHISQ.TEST(Observed,Expected)
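    The right-tail probability quoted above can be cross-checked without Excel. The sketch below relies on the closed-form expression for the upper tail of the χ² distribution with exactly 3 degrees of freedom; the function name is illustrative and the formula is not valid for other numbers of degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """P(X2 >= x) for a chi-square distribution with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chi2_sf_df3(22.757)
print(f"p-value = {p_value:.6f}")   # about 0.000045, as in the text
```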

    Note: The CHISQ.TEST() function is specifically designed to test the relationship between two categorical variables.

    The probability 0.000045 is much smaller than the usual significance level of 0.05. So the player has every reason to suspect his opponent of dishonesty (the null hypothesis of his honesty is rejected).

    When using the X² test, it is necessary to ensure that the sample size n is large enough; otherwise the approximation of the distribution of the X² statistic is not valid. It is usually considered sufficient that the observed frequencies (Observed) are greater than 5. If this is not the case, small frequencies are combined into one or added to other frequencies, the combined value is assigned the total probability, and the number of degrees of freedom of the χ² distribution is reduced accordingly.

    To improve the quality of application of the X² test, the partition intervals should be made smaller (increasing L and, accordingly, the number of degrees of freedom); however, this is limited by the requirement on the number of observations falling into each interval (it must be >5).

    Continuous case

    Pearson's goodness-of-fit test X² can also be applied in the case of continuous distributions.

    Let us consider a sample consisting of 200 values. The null hypothesis states that the sample was drawn from the standard normal distribution.

    Note: The random values in the example file on the Continuous sheet are generated using the formula =NORM.S.INV(RAND()). Therefore, new sample values are generated each time the sheet is recalculated.

    Whether the data set is consistent with this distribution can be assessed visually.

    As can be seen from the diagram, the sample values lie quite close to the straight line. However, as in the discrete case, we apply Pearson's X² goodness-of-fit test to test the hypothesis.

    To do this, we divide the range of the random variable into intervals with a step of 0.5 and calculate the observed and theoretical frequencies. The observed frequencies are calculated using the FREQUENCY() function, and the theoretical ones using the NORM.S.DIST() function.
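    The same binning can be sketched in Python. The interval boundaries below (-2.0 to 2.0 in steps of 0.5, with two open-ended tail intervals, giving 10 intervals in total) are an assumption, since the exact boundaries used in the example file are not stated; the seed is fixed only to make the sketch reproducible.

```python
import random
from bisect import bisect_left
from statistics import NormalDist

random.seed(0)
n = 200
sample = [random.gauss(0, 1) for _ in range(n)]   # stands in for NORM.S.INV(RAND())

# Assumed boundaries: -2.0, -1.5, ..., 2.0 -> 10 intervals including both tails
edges = [-2 + 0.5 * i for i in range(9)]

# Observed frequencies: like FREQUENCY(), a value equal to an edge
# is counted in the interval that ends at that edge
observed = [0] * (len(edges) + 1)
for v in sample:
    observed[bisect_left(edges, v)] += 1

# Theoretical frequencies from the standard normal CDF
cdf = NormalDist().cdf
bounds = [0.0] + [cdf(e) for e in edges] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(bounds, bounds[1:])]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"X2 = {x2:.2f} with {len(observed) - 1} degrees of freedom")
```

    With these assumed boundaries the expected frequency in each tail interval is about 4.6, slightly under 5, so in practice the tails might be merged with their neighbours.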

    Note: As in the discrete case, it is necessary to ensure that the sample is large enough and that each interval contains >5 values.

    Let us calculate the X² statistic and compare it with the critical value for the given significance level (0.05). Since we divided the range of the random variable into 10 intervals, the number of degrees of freedom is 9. The critical value can be calculated using the formula
    =CHISQ.INV.RT(0.05,9) or
    =CHISQ.INV(1-0.05,9)
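    As a cross-check outside Excel, the critical value can be found by inverting the upper-tail probability numerically. The sketch below uses the finite-sum closed forms that exist for integer degrees of freedom and a simple bisection; both function names are illustrative, not library calls.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X2 >= x) for integer degrees of freedom df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def chi2_critical(alpha, df, lo=0.0, hi=1000.0):
    """Value x with P(X2 >= x) = alpha, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical = chi2_critical(0.05, 9)
print(f"critical value for alpha = 0.05, df = 9: {critical:.3f}")
```

    A statistic below this critical value does not lead to rejection of the null hypothesis at the 5% level.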

    The chart above shows that the value of the statistic is 8.19, which is well below the critical value, so the null hypothesis is not rejected.

    Below is a case where the sample happened to take unlikely values and, based on Pearson's X² goodness-of-fit test, the null hypothesis was rejected (even though the random values were generated using the formula =NORM.S.INV(RAND()), which produces a sample from the standard normal distribution).

    The null hypothesis is rejected, although visually the data lie quite close to the straight line.

    Let us also take as an example a sample from U(-3; 3). In this case it is obvious even from the graph that the null hypothesis should be rejected.

    Pearson's X² goodness-of-fit test also confirms that the null hypothesis should be rejected.

    The method discussed above works well if the qualitative characteristic of interest takes two values (thrombosis present or absent, the Martian is green or pink). Moreover, since the method is a direct analogue of Student's test, the number of samples being compared must also be two.

    Clearly, both the number of values of the characteristic and the number of samples may be greater than two. To analyze such cases we need another method, analogous to analysis of variance. In appearance this method, which we are about to present, is very different from the z test, but in fact the two have much in common.

    So as not to go far for an example, let us start with the problem of shunt thrombosis just discussed. Now we will consider not the proportion but the number of patients with thrombosis. Let us enter the trial results in a table (Table 5.1). For each group we indicate the number of patients with and without thrombosis. We have two characteristics: the drug (aspirin or placebo) and thrombosis (yes or no); the table shows all their possible combinations, which is why such a table is called a contingency table. In this case the table size is 2x2.

    Let's look at the cells located on a diagonal going from the upper left to the lower right corner. The numbers in them are noticeably larger than the numbers in other cells of the table. This suggests an association between aspirin use and the risk of thrombosis.

    Now let us look at Table 5.2. This is a table of the expected numbers that we would obtain if aspirin had no effect on the risk of thrombosis. We will discuss how to calculate the expected numbers a little later; for now let us note the external features of the table. Apart from the slightly scary fractional numbers in the cells, there is another difference from Table 5.1: the totals for the groups in the right-hand column and for thrombosis in the bottom row. The lower right corner contains the total number of patients in the trial.

    Please note that although the numbers in the cells of Tables 5.1 and 5.2 are different, the row and column sums are the same.

    How are the expected numbers calculated? Placebo was received by 25 people and aspirin by 19. Shunt thrombosis occurred in 24 of the 44 patients examined, that is, in 54.55% of cases, and did not occur in 20 of 44, that is, in 45.45% of cases. Let us accept the null hypothesis that aspirin has no effect on the risk of thrombosis. Then thrombosis should be observed with the same frequency of 54.55% in the placebo and aspirin groups. Calculating 54.55% of 25 and of 19, we get 13.64 and 10.36, respectively. These are the expected numbers of patients with thrombosis in the placebo and aspirin groups. In the same way we obtain the expected number of patients without thrombosis: 45.45% of 25, that is, 11.36, in the placebo group, and 45.45% of 19, that is, 8.64, in the aspirin group. Note that the expected numbers are calculated to the second decimal place; this precision will be needed in further calculations.
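    The arithmetic above amounts to multiplying each group total by the overall proportion of the outcome. A short Python sketch using only the totals given in the text (the dictionary keys are illustrative labels):

```python
# Totals stated in the text
group_totals = {"placebo": 25, "aspirin": 19}
outcome_totals = {"thrombosis": 24, "no thrombosis": 20}
n = sum(group_totals.values())          # 44 patients in total

# Expected number = group total * outcome total / n
expected = {
    (group, outcome): g_total * o_total / n
    for group, g_total in group_totals.items()
    for outcome, o_total in outcome_totals.items()
}

for (group, outcome), value in expected.items():
    print(f"{group:8s} {outcome:14s} {value:.2f}")
```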

    Let us compare Tables 5.1 and 5.2. The numbers in the cells differ quite substantially. Therefore, the actual picture differs from the one that would have been observed if aspirin had no effect on the risk of thrombosis. Now all that remains is to construct a criterion that characterizes these differences with a single number, and then to find its critical value, that is, to do the same as in the case of the F, t or z criteria.

    However, first let us recall another example already familiar to us: Conahan's work comparing halothane and morphine, namely the part where operative mortality was compared. The corresponding data are given in Table 5.3. The form of the table is the same as that of Table 5.1. In turn, Table 5.4, like Table 5.2, contains the expected numbers, that is, the numbers calculated under the assumption that mortality does not depend on the anesthetic. Of all 128 patients operated on, 110 survived, that is, 85.94%. If the choice of anesthesia had no effect on mortality, the proportion of survivors would be the same in both groups, and the number of survivors would be 85.94% of 61, that is, 52.42, in the halothane group and 85.94% of 67, that is, 57.58, in the morphine group. The expected number of deaths can be obtained in the same way. Let us compare Tables 5.3 and 5.4. Unlike the previous example, the differences between the expected and observed values are very small. As we found out earlier, there is no difference in mortality. It looks like we are on the right track.

    χ² test for a 2x2 table

    The χ² test (read “chi-square”) does not require any assumptions about the parameters of the population from which the samples are drawn; it is the first of the non-parametric tests that we encounter. Let us start constructing it. First, as always, the test must yield a single number that serves as a measure of the difference between the observed data and the expected ones, that is, in this case, the difference between the tables of observed and expected numbers. Second, the test must take into account that a difference of, say, one patient matters more when the expected number is small than when it is large.

    Let us define the χ² statistic as follows:

    $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

    where O is the observed number in a cell of the contingency table and E is the expected number in the same cell. The summation is carried out over all cells of the table. As can be seen from the formula, the greater the difference between the observed and expected numbers, the greater the contribution of the cell to the value of χ². Cells with a small expected number make a larger contribution. Thus, the criterion satisfies both requirements: first, it measures the differences and, second, it takes into account their magnitude relative to the expected numbers.

    Let us apply the χ² test to the data on shunt thrombosis. Table 5.1 shows the observed numbers, and Table 5.2 the expected ones. Summing over the four cells gives χ² = 7.10.

    The z test applied earlier to the same data led to a similar result. It can be shown that for 2x2 contingency tables the equality χ² = z² holds.

    The critical value of χ² can be found in a way that is well known to us. Fig. 5.7 shows the distribution of possible values of χ² for 2x2 contingency tables for the case when there is no association between the characteristics being studied. The value of χ² exceeds 3.84 in only 5% of cases. Thus, 3.84 is the critical value for the 5% significance level. In the shunt thrombosis example we obtained a value of 7.10, so we reject the hypothesis that there is no association between aspirin use and thrombosis. By contrast, the data in Table 5.3 agree well with the hypothesis that halothane and morphine have the same effect on operative mortality.
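    The value 7.10 can be traced with a short calculation. Table 5.1 is not reproduced in this text, so the observed counts below (18, 7, 6, 13) are an assumption: they are inferred from the stated row and column totals (25, 19, 24, 20) and from the reported χ² of 7.10, and should be checked against the original table. The sketch computes the statistic both with exact expected numbers and with expected numbers rounded to two decimals, as the text does.

```python
# Observed counts: ASSUMED, inferred from the totals and the reported chi-square
observed = [[18, 7],     # placebo: thrombosis, no thrombosis
            [6, 13]]     # aspirin: thrombosis, no thrombosis

def expected_counts(table):
    """Expected numbers: row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi2(table, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

exp_exact = expected_counts(observed)
exp_rounded = [[round(e, 2) for e in row] for row in exp_exact]   # 13.64, 11.36, ...

chi2_exact = chi2(observed, exp_exact)
chi2_rounded = chi2(observed, exp_rounded)
print(f"exact expected numbers:   chi2 = {chi2_exact:.2f}")
print(f"rounded expected numbers: chi2 = {chi2_rounded:.2f}")
print("exceeds the 5% critical value 3.84:", chi2_rounded > 3.84)
```

    Under this assumption the rounded calculation reproduces the 7.10 quoted in the text, while exact arithmetic gives a slightly larger value.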

    Of course, like all significance tests, χ² gives a probabilistic assessment of the truth of a hypothesis. In reality, aspirin may have no effect on the risk of thrombosis. In reality, halothane and morphine may have different effects on operative mortality. But, as the test showed, both are unlikely.

    The χ² test may be applied if the expected number in every cell is greater than or equal to 5. This condition is similar to the condition for the applicability of the z test.

    The critical value of χ² depends on the size of the contingency table, that is, on the number of treatments being compared (table rows) and the number of possible outcomes (table columns). The size of the table is expressed by the number of degrees of freedom ν:

    $$\nu = (r - 1)(c - 1),$$

    where r is the number of rows and c is the number of columns. For 2x2 tables we have ν = (2 - 1)(2 - 1) = 1. The critical values of χ² for different ν are given in Table 5.7.

    The previously given formula for χ² in the case of a 2×2 table (that is, with 1 degree of freedom) gives slightly inflated values (the situation was similar with the z criterion). This is because the theoretical distribution of χ² is continuous, whereas the set of calculated χ² values is discrete. In practice, this results in the null hypothesis being rejected too often. To compensate for this effect, the Yates correction is introduced into the formula:

    \chi^2 = \sum \frac{\left(|O - E| - \tfrac{1}{2}\right)^2}{E}

    Note that the Yates correction applies only when ν = 1, that is, for 2×2 tables.

    Let's apply the Yates correction to the study of the relationship between taking aspirin and shunt thrombosis (Tables 5.1 and 5.2):


    As you remember, without the Yates correction the χ² value was 7.10. The corrected χ² value is less than 6.635, the critical value for the 1% significance level, but still exceeds 5.024, the critical value for the 2.5% significance level.
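    The effect of the Yates correction is easy to see on a small example. The following is a minimal sketch with hypothetical counts (Tables 5.1 and 5.2 are not reproduced here, so the shunt thrombosis numbers are not used); the function name `chi2_2x2` is chosen only for this illustration.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square for the 2x2 table [[a, b], [c, d]], optionally with the Yates correction."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff -= 0.5  # continuity correction: shrink every |O - E| by 1/2
            chi2 += diff ** 2 / expected
    return chi2

# Hypothetical 2x2 table
plain = chi2_2x2(15, 10, 5, 20)
corrected = chi2_2x2(15, 10, 5, 20, yates=True)
print(f"without correction: {plain:.2f}, with Yates correction: {corrected:.2f}")
```

    The corrected value is always smaller than the uncorrected one, so the corrected test rejects the null hypothesis less readily.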

    χ² criterion for an arbitrary contingency table

    Now consider the case when the contingency table has more rows or columns than two. Note that the z test is not applicable in such cases.

    In Chapter 3 we showed that running reduces the number of menstruations*. Do these changes prompt women to see a doctor? Table 5.5 shows the results of a survey of study participants. Do these data support the hypothesis that running does not affect the likelihood of seeing a doctor for menstrual irregularity?

    Of the 165 women examined, 69 (i.e. 42%) consulted a doctor and the remaining 96 (i.e. 58%) did not. If jogging does not affect the likelihood of seeing a doctor, then in each of the groups 42% of women should have seen a doctor. Table 5.6 shows the corresponding expected values. Are the real data very different from them?

    * There, for simplicity of calculation, we assumed the sizes of all three groups (control, female athletes and sportswomen) to be the same. Now we will use the real data.

    To answer this question, let's calculate χ²:

    \chi^2 = \frac{(14 - 22.58)^2}{22.58} + \frac{(40 - 31.42)^2}{31.42} + \frac{(9 - 9.62)^2}{9.62} + \frac{(14 - 13.38)^2}{13.38} + \frac{(46 - 36.80)^2}{36.80} + \frac{(42 - 51.20)^2}{51.20} \approx 9.63

    The contingency table has three rows and two columns, so the number of degrees of freedom is ν = (3 - 1)(2 - 1) = 2. If the hypothesis about the absence of intergroup differences is correct, then, as can be seen from Table 5.7, the value of χ² will exceed 9.21 in no more than 1% of cases. The value obtained is greater. Thus, at a significance level of 0.01 we can reject the hypothesis that there is no connection between running and visits to the doctor about menstruation. However, having found that a connection exists, we still cannot say which group (or groups) differs from the rest.

    So, we have become acquainted with the χ² criterion. Here is the procedure for using it.

    Construct a contingency table based on the available data.

    Count the number of objects in each row and in each column and find what proportion of the total number of objects these values make up.

    Knowing these proportions, calculate the expected numbers to two decimal places, that is, the numbers of objects that would fall into each cell of the table if there were no relationship between rows and columns.

    Find the value of χ², which characterizes the differences between the observed and expected values. If the contingency table is 2×2, apply the Yates correction.

    Calculate the number of degrees of freedom, select the significance level and, using Table 5.7, determine the critical value of χ². Compare it with the value obtained for your table.
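    The procedure can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the observed counts are the six numbers that appear in the χ² sum for the survey on visits to the doctor, the dictionary `CRITICAL` holds only the critical values quoted in the text, and the expected numbers are not rounded to two decimal places, so the last digits may differ slightly from a hand calculation.

```python
def chi_square(table):
    """Return (chi2, degrees of freedom) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    is_2x2 = len(table) == 2 and len(table[0]) == 2
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected number if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if is_2x2:
                diff -= 0.5  # Yates correction, applied only to 2x2 tables
            chi2 += diff ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Critical values quoted in the text, keyed by (degrees of freedom, significance level)
CRITICAL = {(1, 0.05): 3.84, (1, 0.025): 5.024, (1, 0.01): 6.635, (2, 0.01): 9.21}

# Rows: the three groups; columns: consulted a doctor / did not
observed = [[14, 40], [9, 14], [46, 42]]
chi2, df = chi_square(observed)
alpha = 0.01
critical = CRITICAL[(df, alpha)]
print(f"chi2 = {chi2:.2f}, df = {df}, critical value = {critical}")
print("reject H0" if chi2 > critical else "do not reject H0")
```

    For these counts the sketch gives a χ² of about 9.63 with 2 degrees of freedom, in line with the calculation above.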

    As you remember, for 2×2 contingency tables the χ² criterion is applicable only when all expected numbers are not less than 5. What about larger tables? In this case the χ² criterion is applicable if all expected numbers are not less than 1 and the proportion of cells with expected numbers less than 5 does not exceed 20%. If these conditions are not met, the χ² criterion may give false results. One can then collect additional data, but this is not always feasible. There is an easier way: to combine several rows or columns. Below we will show how to do this.
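    These applicability conditions can be checked mechanically before the test is run. The sketch below assumes the rules exactly as stated above; the function names and the second table are hypothetical and serve only as an illustration.

```python
def expected_counts(table):
    """Expected numbers for every cell under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_applicable(table):
    """Check the applicability conditions for the chi-square criterion."""
    cells = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        return all(e >= 5 for e in cells)       # 2x2: every expected number at least 5
    share_small = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_small <= 0.20

survey = [[14, 40], [9, 14], [46, 42]]   # counts from the survey example
sparse = [[1, 9], [2, 8], [3, 7]]        # hypothetical table with small expected numbers
print(chi_square_applicable(survey), chi_square_applicable(sparse))
```

    When the check fails, rows or columns can be combined and the check repeated on the smaller table.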

    Conversion of contingency tables

    In the previous section, we established the existence of a connection between running and visits to the doctor about menstruation, or, what is the same, the existence of differences between groups in the frequency of visits to the doctor. However, we could not determine which groups differed from each other and which did not. We encountered a similar situation in analysis of variance. When comparing several groups, analysis of variance detects the very fact that differences exist but does not indicate which groups stand out. The latter can be done using multiple comparison procedures, which we discussed in Chapter 4. Something similar can be done with contingency tables.

    Looking at Table 5.5, one can assume that female athletes and sportswomen consulted a doctor more often than women from the control group. The difference between female athletes and sportswomen seems insignificant.

    Let's test the hypothesis that female athletes and sportswomen visit the doctor equally often. To do this, we select from the original table a subtable containing the data for these two groups. Table 5.8 shows the observed and expected numbers; they are quite close.

    Critical values of χ²

    ν     Significance level
          0.50    0.25    0.10    0.05    0.025   0.01    0.005   0.001
    41    40.335  46.692  52.949  56.942  60.561  64.950  68.053  74.745
    42    41.335  47.766  54.090  58.124  61.777  66.206  69.336  76.084
    43    42.335  48.840  55.230  59.304  62.990  67.459  70.616  77.419
    44    43.335  49.913  56.369  60.481  64.201  68.710  71.893  78.750
    45    44.335  50.985  57.505  61.656  65.410  69.957  73.166  80.077
    46    45.335  52.056  58.641  62.830  66.617  71.201  74.437  81.400
    47    46.335  53.127  59.774  64.001  67.821  72.443  75.704  82.720
    48    47.335  54.196  60.907  65.171  69.023  73.683  76.969  84.037
    49    48.335  55.265  62.038  66.339  70.222  74.919  78.231  85.351
    50    49.335  56.334  63.167  67.505  71.420  76.154  79.490  86.661

    J. H. Zar, Biostatistical Analysis, 2d ed, Prentice-Hall, Englewood Cliffs, N.J., 1984.

    Lecture 6. Analysis of two samples

    6.1 Parametric criteria

    6.1.2 Student's t test (t-test)

    6.1.3 F - Fisher test

    6.2 Nonparametric tests

    6.2.1 Sign criterion (G-criterion)

    The next task of statistical analysis, solved after determining the main (sample) characteristics and analyzing one sample, is the joint analysis of several samples. The most important question that arises when analyzing two samples is whether there are differences between the samples. Usually, this is done by testing statistical hypotheses about the belonging of both samples to the same general population or about the equality of means.

    If the type of distribution or the distribution function of the sample is known, the problem of assessing the differences between two groups of independent observations can be solved using parametric statistical criteria: either Student's test (t), if the samples are compared by their means (X and Y), or Fisher's criterion (F), if the samples are compared by their variances.

    Using parametric statistical criteria without first checking the type of distribution can lead to errors when testing the working hypothesis.

    To overcome these difficulties in the practice of pedagogical research, one should use nonparametric statistical criteria, such as the sign test, the two-sample Wilcoxon test, the Van der Waerden test and the Spearman test. Although these do not require large samples or knowledge of the type of distribution, the choice among them still depends on a number of conditions.

    Nonparametric statistical tests are free from assumptions about the distribution law of the samples and are based on the assumption of independence of observations.

    6.1 Parametric criteria

    The group of parametric criteria of mathematical statistics includes methods for calculating descriptive statistics, plotting graphs to check the normality of a distribution, and testing hypotheses about whether two samples belong to the same population. These methods are based on the assumption that the sample follows a normal (Gaussian) distribution law. Among the parametric criteria, we will consider the Student and Fisher tests.

    6.1.1 Methods for testing samples for normality

    To determine whether we are dealing with a normal distribution, the following methods can be used:

    1) a frequency polygon (the empirical distribution) and a bell curve can be drawn on the same axes from the research data. By comparing the shapes of the normal distribution curve and the graph of the empirical distribution, one can find the parameters in which the latter differs from the former;

    2) the mean, median and mode are calculated, and the deviation from the normal distribution is judged from them. If the mode, median and arithmetic mean do not differ significantly from each other, we are dealing with a normal distribution. If the median differs significantly from the mean, we are dealing with an asymmetric sample;

    3) the kurtosis of the distribution curve must be equal to 0. Curves with positive kurtosis are noticeably more peaked than the normal distribution curve. Curves with negative kurtosis are flatter than the normal distribution curve;

    4) after determining the average value of the frequency distribution and standard deviation, find the following four distribution intervals and compare them with the actual data of the series:

    a) - the interval should include about 25% of the population frequency,

    b) - the interval should include about 50% of the population frequency,

    c) - the interval should include about 75% of the population frequency,

    d) - the interval should include about 100% of the population frequency.
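    These checks can be sketched with the Python standard library. The data are the eleven scores of the experimental group from Example 1 (Table 1). The interval half-widths are not taken from the text: they are an assumption, computed from the normal distribution so that an interval of the form mean ± k·s should cover the stated share of observations. Kurtosis here means excess kurtosis, which is 0 for a normal distribution.

```python
from statistics import NormalDist, mean, median, multimode, stdev

def shape(data):
    """Return (skewness, excess kurtosis) computed from central moments."""
    m, n = mean(data), len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def share_within(data, k):
    """Share of observations lying within k standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)

scores = [12, 14, 13, 16, 11, 9, 13, 15, 15, 18, 14]
print("mean:", round(mean(scores), 3), "median:", median(scores), "modes:", multimode(scores))

skewness, excess_kurtosis = shape(scores)
print("skewness:", round(skewness, 3), "excess kurtosis:", round(excess_kurtosis, 3))

for target in (0.25, 0.50, 0.75):
    # Half-width k such that a normal variable falls within mean +/- k*sigma with probability `target`
    k = NormalDist().inv_cdf((1 + target) / 2)
    print(f"expected {target:.0%}, observed {share_within(scores, k):.0%} within {k:.2f} s")
print(f"expected about 100%, observed {share_within(scores, 3):.0%} within 3 s")
```

    With only eleven observations the observed shares are rough; the comparison becomes informative for larger samples.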

    6.1.2 Student's t test ( t-test)

    The test allows you to find the probability that both means in the sample belong to the same population. This criterion is most often used to test the hypothesis: “The means of two samples belong to the same population.”

    When using the criterion, two cases can be distinguished. In the first case, it is used to test the hypothesis about the equality of the population means of two independent, unrelated samples (the so-called two-sample t-test). In this case, there is a control group and an experimental group; the number of subjects in the groups may be different.

    In the second case, when the same group of objects generates the numerical material for testing hypotheses about means, the so-called paired t-test is used. The samples are then called dependent, or related.

    a) case of independent samples

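    The test statistic for the case of unrelated, independent samples is:

    t = \frac{\bar{x} - \bar{y}}{s_{\bar{x} - \bar{y}}}, (1)

    where \bar{x} and \bar{y} are the arithmetic averages in the experimental and control groups, and s_{\bar{x} - \bar{y}} is the standard error of the difference between the arithmetic means. It is found from the formula:

    s_{\bar{x} - \bar{y}} = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2} \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, (2)

    where n1 and n2 are the sizes of the first and second samples, respectively.

    If n1 = n2, the standard error of the difference between the arithmetic means is calculated by the formula:

    s_{\bar{x} - \bar{y}} = \sqrt{\frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{(n - 1) n}}, (3)

    where n is the sample size.

    The number of degrees of freedom is calculated by the formula:

    k = n1 + n2 - 2. (4)

    If the samples are of equal size, k = 2n - 2.

    Next, compare the obtained t emp value with the theoretical value of the Student t-distribution (see the appendices of statistics textbooks). If t emp < t crit, the null hypothesis is accepted; otherwise the alternative is accepted.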

    Let's look at an example of using Student's t-test for unrelated samples of unequal size.

    Example 1. In two groups of students - experimental and control - the following results were obtained in the academic subject (test scores; see Table 1).

    Table 1. Experiment results

    First group (experimental), N1 = 11 people: 12, 14, 13, 16, 11, 9, 13, 15, 15, 18, 14

    Second group (control), N2 = 9 people

    Total number of sample members: n 1 =11, n 2 =9.

    Calculation of arithmetic averages: X av =13.636; Y av =9.444

    Standard deviation: s x =2.460; s y =2.186

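    Using formula (2), we calculate the standard error of the difference between the arithmetic means, taking \sum (x_i - \bar{x})^2 = (n_1 - 1) s_x^2 and \sum (y_i - \bar{y})^2 = (n_2 - 1) s_y^2:

    s_{\bar{x} - \bar{y}} = \sqrt{\frac{10 \cdot 2.460^2 + 8 \cdot 2.186^2}{18} \left(\frac{1}{11} + \frac{1}{9}\right)} \approx 1.053

    We calculate the criterion statistic:

    t = \frac{13.636 - 9.444}{1.053} \approx 3.981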

    We compare the t value obtained in the experiment with the table value, taking into account the degrees of freedom equal, according to formula (4), to the number of subjects minus two (18).

    The tabulated value of t crit is equal to 2.1, assuming the risk of making an erroneous judgment in five cases out of a hundred (significance level = 5% or 0.05).

    If the empirical t value obtained in the experiment exceeds the tabulated one, then there is reason to accept the alternative hypothesis (H 1) that students in the experimental group show, on average, a higher level of knowledge. In the experiment t=3.981, table t=2.10, 3.981>2.10, which leads to the conclusion about the advantage of experimental learning.
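    The calculation of Example 1 can be reproduced from the summary statistics alone. The sketch below assumes that s x and s y are sample standard deviations computed with n - 1 in the denominator and that the standard error is the pooled one; the function name `pooled_t` is chosen only for this illustration.

```python
from math import sqrt

def pooled_t(mean_x, mean_y, sd_x, sd_y, n_x, n_y):
    """Two-sample Student t statistic from means, standard deviations and sample sizes."""
    ss_x = (n_x - 1) * sd_x ** 2   # sum of squared deviations in the first sample
    ss_y = (n_y - 1) * sd_y ** 2   # sum of squared deviations in the second sample
    df = n_x + n_y - 2
    se = sqrt((ss_x + ss_y) / df * (1 / n_x + 1 / n_y))
    return (mean_x - mean_y) / se, se, df

T_CRIT = 2.10  # tabulated value for 18 degrees of freedom at the 5% level, from the text

t, se, k = pooled_t(13.636, 9.444, 2.460, 2.186, 11, 9)
print(f"standard error = {se:.3f}, t = {t:.3f}, degrees of freedom = {k}")
print("accept H1" if abs(t) > T_CRIT else "accept H0")
```

    Swapping the two groups only changes the sign of t, which is why the absolute value is compared with the tabulated one.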

    The following questions may arise here:

    1. What if the t value obtained in the experiment turns out to be less than the tabulated one? Then we must accept the null hypothesis.

    2. Has the advantage of the experimental method been proven? It is not so much proven as it is shown, because from the very beginning there is a risk of being mistaken in five cases out of a hundred (p = 0.05). Our experiment could be one of these five cases. But 95% of possible cases speak in favor of the alternative hypothesis, and this is a fairly convincing argument in statistical proof.

    3. What if the control group performs better than the experimental group? For example, let's swap the groups, making 9.444 the arithmetic mean of the experimental group and 13.636 that of the control group:

    t = \frac{9.444 - 13.636}{1.053} \approx -3.981

    From this it follows that the new method has not yet proven itself, perhaps for various reasons. Since the absolute value 3.981 > 2.1, the second alternative hypothesis (H2) about the advantage of the traditional method is accepted.

    b) case of related (paired) samples

    In the case of related samples with an equal number of measurements in each, a simpler formula for Student's t-test can be used.

    The t value is calculated using the formula:

    t = \frac{\bar{d}}{S_d}, (5)

    where d_i are the differences between the corresponding values of the variable X and the variable Y, and \bar{d} is the average of these differences;

    S_d is calculated using the following formula:

    S_d = \sqrt{\frac{\sum d_i^2 - \frac{(\sum d_i)^2}{n}}{n (n - 1)}}. (6)

    The number of degrees of freedom k is determined by the formula k = n - 1. If t emp < t crit, the null hypothesis is accepted; otherwise the alternative is accepted.

    Let's consider an example of using Student's t-test for related samples, which are necessarily of equal size.

    Example 2. The level of students' orientation towards artistic and aesthetic values was studied. In order to intensify the formation of this orientation, conversations were held in the experimental group, exhibitions of children's drawings were held, visits to museums and art galleries were organized, and meetings were held with musicians, artists, etc. The question naturally arises: how effective was the work done? To check this, a test was given before and after the experiment. For methodological reasons, Table 2 shows the results of a small number of subjects.

    Table 2. Experimental results

    Columns: students (n = 10); points before the start of the experiment (X); points at the end of the experiment (Y); auxiliary calculations d and d².

    Students: Ivanov, Novikov, Sidorov, Pirogov, Agapov, Suvorov, Ryzhikov, Serov, Toporov, Bystrov.

    Average: X = 14.8, Y = 21.1

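    First, let's calculate the average difference from the column averages:

    \bar{d} = 21.1 - 14.8 = 6.3

    Then we apply formula (6) to obtain S_d; the t value reported below corresponds to S_d \approx 0.9434.

    And finally, formula (5) should be applied. We get:

    t = \frac{\bar{d}}{S_d} = \frac{6.3}{0.9434} \approx 6.678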

    Number of degrees of freedom: k = 10 - 1 = 9. From the table in Appendix 1 we find t crit = 2.262; the experimental value is t = 6.678, which allows us to accept the alternative hypothesis (H1) about significant differences in the arithmetic means, i.e. to conclude that the experimental intervention was effective.

    In terms of statistical hypotheses, the result obtained will sound like this: at the 5% level, the hypothesis H 0 is rejected and the hypothesis H 1 is accepted.
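    The paired calculation can be sketched as follows. The individual scores of Table 2 are not available here, so the two lists below are hypothetical and serve only to show the mechanics; the standard error is assumed to be the usual standard error of the mean difference, and the differences are taken as "after minus before".

```python
from math import sqrt

def paired_t(before, after):
    """Paired Student t statistic and degrees of freedom for two related samples."""
    d = [a - b for b, a in zip(before, after)]   # differences, after minus before
    n = len(d)
    mean_d = sum(d) / n
    # Standard error of the mean difference
    s_d = sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n * (n - 1)))
    return mean_d / s_d, n - 1

# Hypothetical scores of five students before and after an intervention
before = [10, 12, 9, 14, 11]
after = [13, 15, 10, 18, 12]
t, k = paired_t(before, after)
print(f"t = {t:.3f}, degrees of freedom = {k}")
```

    The resulting t is then compared with the tabulated value for k = n - 1 degrees of freedom, exactly as in Example 2.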

    6.1.3 F - Fisher test

    The Fisher criterion allows you to compare the variances of two independent samples. To calculate F emp, find the ratio of the variances of the two samples so that the larger variance is in the numerator and the smaller one in the denominator. The formula for calculating the Fisher criterion is:

    F = \frac{s_1^2}{s_2^2}, (8)

    where s_1^2 and s_2^2 are the variances of the first and second samples, respectively.

    Since, according to the conditions of the criterion, the numerator must be greater than or equal to the denominator, the value of F emp will always be greater than or equal to one.

    The number of degrees of freedom is also determined simply:

    k1 = n1 - 1 for the first sample (i.e. the sample whose variance is larger) and k2 = n2 - 1 for the second sample.

    In Appendix 1, the critical values of the Fisher criterion are found from the values of k1 (top line of the table) and k2 (left column of the table).

    If F emp > F crit, the null hypothesis is rejected and the alternative is accepted; otherwise the null hypothesis is accepted.

    Example 3. In two third grades, ten students were tested for mental development using the TURMSH test. The obtained average values ​​did not differ significantly, but the psychologist is interested in the question of whether there are differences in the degree of homogeneity of mental development indicators between classes.

    Solution. For Fisher's test, it is necessary to compare the variances of test scores in both classes. The test results are presented in the table:

    Table 3.

    Columns: student nos.; first class; second class. Rows: individual scores, sums, averages.

    Average: first class 60.6, second class 63.6

    Having calculated the variances for variables X and Y, we obtain:

    s x 2 =572.83; s y 2 =174.04

    Then, using formula (8) for Fisher's F criterion, we find:

    F emp = \frac{572.83}{174.04} \approx 3.29

    According to the table from Appendix 1 for the F criterion with degrees of freedom in both cases equal to k = 10 - 1 = 9, we find F crit = 3.18 (< 3.29). Therefore, in terms of statistical hypotheses, H0 (the hypothesis of similarity) can be rejected at the 5% level and hypothesis H1 is accepted. The researcher can state that the samples from the two classes differ in the degree of homogeneity of an indicator such as mental development.
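    Example 3 reduces to a single ratio, which the following sketch reproduces from the variances given above. The function name is chosen for this illustration, and the critical value 3.18 is the tabulated one quoted in the text.

```python
def fisher_f(var_a, var_b):
    """F statistic with the larger variance in the numerator."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller

F_CRIT = 3.18  # tabulated value for k1 = k2 = 9 at the 5% level, from the text

f_emp = fisher_f(572.83, 174.04)
print(f"F emp = {f_emp:.2f}, F crit = {F_CRIT}")
print("reject H0: variances differ" if f_emp > F_CRIT else "accept H0")
```

    Because the larger variance always goes into the numerator, the order of the two arguments does not matter.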

    6.2 Nonparametric tests

    By comparing by eye (by percentage) the results before and after any impact, the researcher comes to the conclusion that if differences are observed, then there is a difference in the samples being compared. This approach is categorically unacceptable, since for percentages it is impossible to determine the level of reliability in the differences. Percentages taken by themselves do not make it possible to draw statistically reliable conclusions. To prove the effectiveness of any intervention, it is necessary to identify a statistically significant trend in the bias (shift) of indicators. To solve such problems, the researcher can use a number of difference criteria. Nonparametric criteria will be considered below: the sign test and the chi-square test.

    6.2.1 Sign criterion ( G-test)

    The criterion is intended to compare the state of some property among members of two dependent samples based on measurements made on a scale not lower than the ordinal one.

    There are two series of observations of the random variables X and Y, obtained by considering two dependent samples. From them N pairs of the form (x_i, y_i) are composed, where x_i, y_i are the results of measuring the same property twice for the same object.

    In pedagogical research, the objects of study can be students, teachers, and school administrators. In this case x_i, y_i can be, for example, the marks given by a teacher for the same or different work performed twice by the same group of students, before and after the use of some pedagogical tool.

    The elements of each pair x_i, y_i are compared in magnitude, and the pair is assigned the sign “+” if x_i < y_i, the sign “-” if x_i > y_i, and “0” if x_i = y_i.

    The null hypothesis is formulated as follows: there are no significant differences in the state of the property being studied between the first and second measurements. The alternative hypothesis: the distribution laws of X and Y are different, that is, the states of the property being studied differ significantly in the same population between the first and second measurements.

    The criterion statistic (T) is defined as follows.

    Suppose that among the N pairs (x_i, y_i) there are several in which the values x_i and y_i are equal. Such pairs are marked with the sign “0” and are not taken into account when calculating T. Suppose that after subtracting the number of pairs marked “0” from N, n pairs remain. Among the remaining n pairs we count the pairs marked with the sign “+”, that is, the pairs in which x_i < y_i. The value of T is equal to the number of pairs with a plus sign.

    The null hypothesis is accepted at a significance level of 0.05 if the observed value T < n - t_α, where the value n - t_α is determined from the statistical tables for the sign criterion in Appendix 2.

    Example 4. Students completed a test aimed at testing their understanding of a certain concept. Fifteen students were then given an e-learning guide designed to develop the concept among students with low learning ability. After studying the manual, the students again completed the same test, which was graded on a five-point scale.

    The results of performing the work twice are measured on an ordinal scale (five-point scale). Under these conditions, the sign criterion can be used to identify trends in the state of students' knowledge after studying the manual, since all the assumptions of this criterion are met.

    We will write down the results of completing the work twice (in points) by the 15 students in table form (see Table 4).

    Table 4.

    Columns: students (No.); first execution; second execution; sign of the difference in marks.

    Hypothesis being tested H 0 : The students’ knowledge did not improve after studying the manual. Alternative hypothesis: students' knowledge increased after studying the manual.

    Let's calculate the value of the criterion statistic T, equal to the number of positive differences in the marks received by the students. According to the data in Table 4, T = 10 and n = 12.

    To determine the critical value n - t_α of the criterion statistic, we use the table in Appendix 2. For the significance level α = 0.05 and n = 12, the value is n - t_α = 9. Therefore, the inequality T > n - t_α (10 > 9) is satisfied. In accordance with the decision rule, the null hypothesis is rejected at a significance level of 0.05 and the alternative hypothesis is accepted, which allows us to conclude that the students' knowledge improved after they studied the manual independently.

    Example 5. It is assumed that studying a mathematics course contributes to the formation in students of one of the techniques of logical thinking (for example, the technique of generalization) even if its formation is not pursued purposefully. To test this assumption, the following experiment was carried out.

    Students of class VII were given 5 problems whose solution was based on the use of this thinking technique. A student was considered to have mastered the technique if he gave the correct answer to 3 or more problems.

    The following measurement scale was developed: 1 or 2 problems solved correctly - score “0”; 3 problems solved correctly - score “1”; 4 problems solved correctly - score “2”; 5 problems solved correctly - score “3”.

    The work was carried out twice: at the end of September and the end of May of the following year. It was written by 35 of the same students, randomly selected from 7 different schools. We will write down the results of performing the work twice in the form of a table (see Table 5).

    In accordance with the goals of the experiment, we formulate the null hypothesis as follows: H 0 - studying mathematics does not contribute to the formation of the studied method of thinking. Then the alternative hypothesis will look like: H 1 - studying mathematics contributes to the mastery of this method of thinking.

    Table 5.

    According to the data in Table 5, the value of the statistic is T = 15, the number of differences with the “+” sign. Of the 35 pairs, 12 have the sign “0”; hence n = 35 - 12 = 23.

    According to the table in Appendix 2 for n = 23 and a significance level of 0.025, we find the critical value of the test statistic, equal to 16. Therefore, the inequality T < n - t_α (15 < 16) is true.

    Therefore, in accordance with the decision rule, we have to conclude that the results obtained do not provide sufficient grounds for rejecting the null hypothesis, i.e. we do not have sufficient grounds for rejecting the statement that the study of mathematics does not by itself contribute to the mastery of the selected method of thinking.
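    The sign criterion is simple enough to sketch in full. The tables of Appendix 2 are not reproduced here, so the critical value below is computed under an assumption: that the tabulated value is the one-sided cutoff of a binomial distribution with probability 1/2. This assumption reproduces both values quoted in Examples 4 and 5 (9 for n = 12 at the 0.05 level and 16 for n = 23 at the 0.025 level). The marks in the usage example are hypothetical.

```python
from math import comb

def sign_statistic(first, second):
    """Return (T, n): the number of '+' pairs (first < second) and the number of non-tied pairs."""
    plus = sum(x < y for x, y in zip(first, second))
    ties = sum(x == y for x, y in zip(first, second))
    return plus, len(first) - ties

def upper_tail(n, t):
    """P(at least t plus signs among n pairs) when '+' and '-' are equally likely."""
    return sum(comb(n, k) for k in range(t, n + 1)) / 2 ** n

def critical_value(n, alpha):
    """Smallest c such that P(T > c) <= alpha; H0 is rejected when T > c."""
    for c in range(n + 1):
        if upper_tail(n, c + 1) <= alpha:
            return c

# Hypothetical marks of four students for the first and second execution
T, n = sign_statistic([3, 4, 2, 5], [4, 4, 3, 4])
print("T =", T, "n =", n)

# Values from Examples 4 and 5
print(critical_value(12, 0.05), critical_value(23, 0.025))
for t_obs, n_obs, alpha in ((10, 12, 0.05), (15, 23, 0.025)):
    decision = "reject H0" if t_obs > critical_value(n_obs, alpha) else "do not reject H0"
    print(f"T = {t_obs}, n = {n_obs}: {decision}")
```

    Computing the binomial tail directly also gives an exact significance level, which a table of critical values cannot provide.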

    6.2.2 χ2 test (chi-square)

    The χ² (chi-square) test is used to compare the distributions of objects in two populations based on measurements on a nominal scale in two independent samples.

    Let us assume that the state of the property being studied (for example, the performance of a certain task) is measured for each object on a nominal scale that has only two mutually exclusive categories (for example: done correctly, done incorrectly). From the results of measuring the state of the property for the objects of the two samples, a four-cell 2×2 table is compiled (see Table 6).

    Table 6.

    In this table, O_ij is the number of objects in the i-th sample that fall into the j-th category according to the state of the property being studied; i = 1, 2 is the sample number; j = 1, 2 is the category number; N is the total number of observations, equal to O11 + O12 + O21 + O22 or n1 + n2.

    Then, based on the data in the 2×2 table (see Table 6), it is possible to test the null hypothesis that objects of the first and second populations have equal probabilities of falling into the first (second) category of the measurement scale of the property being tested, for example, the hypothesis that students in control and experimental classes have equal probabilities of correctly completing a certain task.

    When testing null hypotheses, the probability values p1 and p2 need not be known, since the hypotheses only establish certain relationships between them (equality, greater than, or less than).

    To test the null hypotheses discussed above, the value of the criterion statistic T is calculated from the data in the 2×2 table (see Table 6) according to the following general formula:

    (9)

    where n1, n2 are the sample sizes and N = n1 + n2 is the total number of observations.

    The hypothesis H0: p1 ≤ p2 is tested against the alternative H1: p1 > p2. Let α be the accepted significance level. Then the value of the statistic T obtained from the experimental data is compared with the critical value x_{1-2α}, which is determined from the table of the χ² distribution with one degree of freedom (see Appendix 2) for the selected value of α. If the inequality T > x_{1-2α} holds, the null hypothesis is rejected at level α. If this inequality is not satisfied, we do not have sufficient grounds to reject the null hypothesis.

    Because replacing the exact distribution of the statistic T with the χ² distribution with one degree of freedom gives a fairly good approximation only for large samples, the use of the criterion is limited by certain conditions. The criterion should not be used if:

    1) the sum of the sizes of the two samples is less than 20;

    2) at least one of the absolute frequencies in the 2×2 table compiled from the experimental data is less than 5.

    Example 6. An experiment was conducted to identify the better of the textbooks written by two teams of authors in accordance with the goals of teaching geometry and the content of the class IX program. For the experiment, two districts were selected at random, most of whose schools were classified as rural by location. Students of the first district (20 classes) studied using textbook No. 1, and students of the second district (15 classes) studied using textbook No. 2.

    Let's consider the methodology for comparing the answers of teachers in the experimental schools of the two districts to one of the survey questions: “Is the textbook as a whole accessible for independent reading, and does it help students learn material that the teacher did not explain in class?” (Answer: yes or no.)

    The attitude of teachers towards the studied property of textbooks is measured on a scale of names, which has two categories: yes, no. Both samples of teachers are random and independent.

    We will divide the answers of the 20 teachers of the first district and the 15 teachers of the second district into two categories and write them down in the form of a 2×2 table (Table 7).

    Table 7.

    All values in Table 7 are not less than 5; therefore, in accordance with the conditions for using the χ² criterion, the criterion statistic is calculated using formula (9).

    According to the table from Appendix 2 for one degree of freedom (ν = 1) and significance level α = 0.05, we find x_{1-α} = T critical = 3.84. Hence the inequality T observed < T critical (1.86 < 3.84) holds. According to the decision rule for the χ² criterion, the result obtained does not provide sufficient grounds for rejecting the null hypothesis, i.e. the results of the survey of teachers in the two experimental districts do not provide sufficient grounds for rejecting the assumption that textbooks 1 and 2 are equally accessible for students to read independently.

    The use of the chi-square test is also possible in the case when the objects of two samples from two populations, according to the state of the property being studied, are distributed into more than two categories. For example, students in experimental and control classes are divided into four categories in accordance with the marks (in points: 2, 3, 4, 5) received by the students for completing some test work.

    The results of measuring the state of the property being studied for the objects of each sample are distributed into C categories. From these data a 2×C table is compiled, with two rows (according to the number of populations under consideration) and C columns (according to the number of different categories of the state of the property adopted in the study).

    Table 8.

    Based on the data in Table 8, you can test the null hypothesis that objects of the first and second populations have equal probabilities of falling into each of the C categories (i = 1, 2, ..., C), i.e. check that all the following equalities hold: p11 = p21, p12 = p22, …, p1C = p2C. It is possible, for example, to test the hypothesis that students in control and experimental classes have equal probabilities of receiving the grades “5”, “4”, “3” and “2” for a certain task.

    To test the null hypothesis with the χ² criterion, the value of the criterion statistic T is calculated from the data in the 2×C table according to the following formula:

    (10)

    where n_1 and n_2 are the sample sizes.
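
    For orientation, the form usually given for the two-sample χ² statistic on a 2×C table is shown below. This is an assumption about what formula (10) contains, not a reproduction of it; O_{1i} and O_{2i} are assumed notation for the numbers of objects of the first and second samples that fall into category i, and may differ from the symbols used in Table 8.

```latex
T = \frac{1}{n_1 n_2} \sum_{i=1}^{C} \frac{\left(n_1 O_{2i} - n_2 O_{1i}\right)^2}{O_{1i} + O_{2i}},
\qquad n_1 = \sum_{i=1}^{C} O_{1i}, \quad n_2 = \sum_{i=1}^{C} O_{2i}.
```

    This expression is algebraically the same as the familiar sum of (observed - expected)^2 / expected over all 2C cells, with expected counts computed from the pooled category proportions.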

    The value of T obtained from the experimental data is compared with the critical value x_{1-α}, which is found from the χ² table with k = C - 1 degrees of freedom for the chosen significance level α. When the inequality T > x_{1-α} holds, the null hypothesis is rejected at level α and the alternative hypothesis is accepted. This means that the distributions of objects over the C categories of the property being studied differ in the two populations under consideration.
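
    The procedure just described can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the statistic is assumed to have the standard two-sample form T = (1 / (n_1 n_2)) Σ (n_1 O_2i - n_2 O_1i)² / (O_1i + O_2i), the function names are hypothetical, the counts are invented, and only α = 0.05 is supported through a short table of well-known critical values.

```python
# Tabulated critical values of chi-square for alpha = 0.05, keyed by degrees of freedom
CHI2_CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def two_sample_chi2(row_1, row_2):
    """Statistic T for a 2xC table (assumed standard homogeneity form)."""
    if len(row_1) != len(row_2):
        raise ValueError("both rows must have the same number of categories")
    n1, n2 = sum(row_1), sum(row_2)
    total = 0.0
    for o1, o2 in zip(row_1, row_2):
        if o1 + o2 == 0:
            continue  # a category empty in both samples contributes nothing
        total += (n1 * o2 - n2 * o1) ** 2 / (o1 + o2)
    return total / (n1 * n2)

def compare_two_samples(row_1, row_2):
    """Return (T, degrees of freedom, critical value, reject H0?) at alpha = 0.05."""
    t = two_sample_chi2(row_1, row_2)
    k = len(row_1) - 1
    critical = CHI2_CRITICAL_005[k]
    return t, k, critical, t > critical

# Hypothetical counts for three categories, 50 objects in each sample
t, k, critical, reject = compare_two_samples([10, 20, 20], [20, 20, 10])
print(round(t, 3), k, critical, reject)
```

    With these invented counts T exceeds the critical value for k = 2, so the null hypothesis of identical distributions would be rejected at α = 0.05.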

    Example 7. Let us consider a method for comparing the results of a written test that checked how well students of the first and second districts had mastered one section of the course.

    Using random selection, a sample of 50 people was drawn from the students of the first district who wrote the test, and a sample of 50 people from the students of the second district. In accordance with specially developed assessment criteria, each student could fall into one of four categories: bad, mediocre, good, excellent. We use the results obtained by the two samples of students to test the hypothesis that textbook No. 1 promotes better mastery of the tested section of the course, i.e. that students in the first experimental district will, on average, receive higher marks than students in the second district.

    We record the results obtained by the students of both samples in the form of a 2×4 table (Table 9).

    Table 9.

    In accordance with the conditions for applying the χ² test, the test statistic is calculated using the adjusted formula (10).

    In accordance with the conditions for applying the two-sided chi-square test, according to the table in Appendix 2 for one degree of freedom (k

    Grabar M.I., Krasnyanskaya K.A. Application of mathematical statistics in educational research. Nonparametric methods. M., “Pedagogy”, 1977, p. 54

    Grabar M.I., Krasnyanskaya K.A. Application of mathematical statistics in educational research. Nonparametric methods. M., “Pedagogy”, 1977, p. 57