To test $H_0$, that there is no relationship between the row and
column classifications, a statistic called the chi-square
statistic is used. This statistic compares the sample counts with
their expected values. Specifically, we take the difference between
each sample count and its expected count, square it, divide by the
expected count, and then sum over all entries. That is, to
compare the sample and expected counts we use a statistic, $X^2$,
called the chi-square statistic. It is calculated from the following
formula:
$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
where "observed" represents the sample counts,
"expected" represents the expected counts, and the sum is over all
entries in the sample or expected count tables.
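The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the example table are hypothetical, and the expected counts are assumed to come from the usual formula for two-way tables (row total times column total, divided by the table total).

```python
def expected_counts(table):
    """Expected counts for a two-way table of observed counts.

    Assumes the usual formula: row total * column total / table total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all entries."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of sample counts (made up for illustration).
sample = [[10, 20], [20, 10]]
print(chi_square_statistic(sample))
```

In this made-up table every expected count is 15, so each of the four entries contributes (5)^2 / 15 to the sum.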
To test $H_0$, we need a distribution to compare $X^2$
to, under the
assumption that $H_0$ is true. This leads us to
the chi-square distribution.
The $\chi^2$ distribution is described by a single parameter, its
degrees of freedom. Furthermore, the $\chi^2$ distribution is skewed
to the right.
The data for an $r \times c$ table can be obtained by random sampling
as described by either of the two models previously discussed.
The null hypothesis to be tested is that the row and column classifications are independent (first model) or that the row classification proportions for the c populations are all equal (second model). The alternative hypothesis is that the null hypothesis is not true.
The test statistic is the $X^2$ statistic
$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}.$$
If $H_0$ is true, the statistic $X^2$
has approximately a $\chi^2$
distribution with $(r - 1)(c - 1)$ degrees of freedom.
The p-value for the test is
$$P(\chi^2 \ge X^2),$$
where $\chi^2$ is a random variable having the $\chi^2$
distribution with $(r - 1)(c - 1)$ degrees of freedom. The $\chi^2$
approximation is based on having a large sample. The sample is judged
large enough if the average of the expected counts is 5 or more, and
the smallest expected count is 1 or more.
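The whole procedure (statistic, degrees of freedom, p-value, and the sample-size check) can be sketched as follows. This is an illustrative sketch under stated assumptions: the names chi2_sf and chi_square_test are hypothetical, the expected counts use the usual row total times column total over table total formula, and the upper-tail probability is computed with a simple series for the incomplete gamma function using only the standard library. A statistics package would normally be used for that step.

```python
import math


def chi2_sf(x, df):
    """P(chi-square variable with df degrees of freedom >= x).

    Uses a power series for the regularized lower incomplete gamma
    function. Fine for a sketch; it loses relative accuracy when the
    p-value is extremely small.
    """
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(half_x) - half_x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_test(table):
    """Return (statistic, df, p_value, large_enough) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Assumed formula for expected counts: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    flat = [e for row in expected for e in row]
    # Rule from the text: average expected count >= 5, smallest >= 1.
    large_enough = sum(flat) / len(flat) >= 5 and min(flat) >= 1
    return statistic, df, chi2_sf(statistic, df), large_enough


# Hypothetical 2 x 3 table of sample counts (made up for illustration).
statistic, df, p_value, large_enough = chi_square_test([[12, 18, 30], [28, 22, 10]])
print(statistic, df, p_value, large_enough)
```

A small p-value is evidence against the null hypothesis; the large_enough flag indicates whether the approximation can be trusted according to the rule stated above.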