Next: Inference for Two-Way Tables Up: Categorical Data Analysis Previous: One Categorical Variable

Goodness of Fit Testing

An example of the one-categorical-variable situation arises when we have a random sample $X_1, \ldots, X_n$ of size n from a continuous population and we want to test whether the population has a particular distribution (such as the normal). In this case we can divide the range of the data into K intervals and count how many of the X's fall into each interval. For example, if the hypothesis is that the X's come from a uniform distribution on the interval [0,1], we could divide [0,1] into the 10 intervals [0,.1), [.1,.2), and so on, and then count how many X's fall in each interval. Again we call the observed counts $O_i$.
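The counting step for the uniform example can be sketched in a few lines of Python. This is only an illustration: the function name `interval_counts`, the sample size of 200, and the random seed are assumptions made for the example, not part of the notes.

```python
import random

def interval_counts(xs, k=10):
    """Count how many values in [0, 1] fall into each of k equal-width intervals."""
    counts = [0] * k
    for x in xs:
        # int(x * k) is the index of the interval [i/k, (i+1)/k) containing x;
        # min() puts the boundary value x == 1.0 into the last interval.
        idx = min(int(x * k), k - 1)
        counts[idx] += 1
    return counts

random.seed(0)                              # arbitrary seed, for reproducibility only
xs = [random.random() for _ in range(200)]  # hypothetical sample of size n = 200
observed = interval_counts(xs, k=10)
print(observed)
```

Every observation lands in exactly one interval, so the observed counts always add up to n.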

From the hypothesized distribution, we can calculate how many X's should be in each interval (for the uniform example, 10% of the X's should fall in each interval). Again we have $E_i = n p_i$, where $p_i$ is the probability that an X falls in the ith interval.
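As a rough sketch of this calculation, the expected counts can be obtained from the cumulative distribution function of the hypothesized distribution, since the probability of an interval is the difference of the CDF at its two endpoints. The helper names `expected_counts` and `uniform_cdf` and the sample size of 200 are assumptions made for illustration.

```python
def expected_counts(n, edges, cdf):
    """Expected count per interval: n times (CDF at right edge minus CDF at left edge)."""
    return [n * (cdf(b) - cdf(a)) for a, b in zip(edges[:-1], edges[1:])]

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

n = 200                                # hypothetical sample size
edges = [i / 10 for i in range(11)]    # 0, .1, .2, ..., 1 gives 10 intervals
expected = expected_counts(n, edges, uniform_cdf)
print(expected)
```

For the uniform hypothesis each interval has probability .1, so each expected count is 20 (up to floating-point rounding). Passing a different CDF gives the expected counts for another hypothesized distribution.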

In some cases, we need to estimate the parameters of the hypothesized distribution. In testing for normality, for example, we need to know the mean and variance of the population in order to find the $p_i$'s. If we use $\bar{X}$ and $s^2$ as estimates of the true mean and variance, then we must further reduce the degrees of freedom of the $\chi^2$ statistic by 2 (one for each estimated parameter), leaving K - 1 - 2 = K - 3 degrees of freedom.
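A minimal sketch of the normality case follows, using only the Python standard library. Several choices here are assumptions made for illustration and are not prescribed by the notes: the function name `normal_gof`, the use of K = 8 intervals, the choice of interval boundaries as quantiles of the fitted normal (so that every interval has probability 1/K), and the simulated data.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, mean, variance

def normal_gof(xs, k=8):
    """Chi-squared goodness-of-fit statistic for normality with estimated mean and variance."""
    n = len(xs)
    xbar = mean(xs)
    s2 = variance(xs)                    # sample variance (divisor n - 1)
    fitted = NormalDist(xbar, s2 ** 0.5)
    # Assumed design: k - 1 cut points at quantiles of the fitted normal,
    # giving k intervals (the first and last are open-ended), each with probability 1/k.
    cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]
    observed = [0] * k
    for x in xs:
        observed[bisect_right(cuts, x)] += 1
    expected = [n / k] * k
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1 - 2                       # two degrees of freedom lost to the estimated parameters
    return stat, df, observed

random.seed(1)                                        # arbitrary seed
xs = [random.gauss(5.0, 2.0) for _ in range(400)]     # hypothetical sample
stat, df, observed = normal_gof(xs, k=8)
print(stat, df, observed)
```

The statistic would then be compared with a chi-squared critical value on `df` degrees of freedom, which is 5 here rather than the 7 that would apply if the mean and variance were known.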

EXAMPLE: To see whether there is a seasonal effect for homicide, 1361 crimes were classified by season: 334 of them happened in spring, 372 in summer, 327 in fall and 328 in winter. Do we have enough evidence to show that the crime frequencies differ across seasons? Let $p_i$, $i = 1, \ldots, 4$, be the proportions of crimes for the four seasons, respectively.

  1. $H_0: p_1 = p_2 = p_3 = p_4 = .25$; $H_a$: at least one inequality exists.
  2. $E_i$ = 1361*.25 = 340.25, $i = 1, \ldots, 4$.
  3. Chi-squared test statistic = 4.03 with d.f.=4-1=3.
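  4. The rejection region is $\chi^2 > \chi^2_{\alpha}(3)$, the upper-$\alpha$ critical value of the chi-squared distribution with 3 degrees of freedom (7.815 at $\alpha = .05$).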
  5. Fail to reject $H_0$ and conclude that there is not enough evidence to show that there is a seasonal effect on the crime rate.
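The arithmetic in the steps above can be checked with a short Python sketch. The counts are those given in the example; the significance level of .05 and its critical value 7.815 (the upper 5% point of the chi-squared distribution with 3 degrees of freedom) are assumptions, since the level used is not stated in the text.

```python
# Observed homicide counts by season, as given in the example
observed = {"spring": 334, "summer": 372, "fall": 327, "winter": 328}

n = sum(observed.values())     # total number of crimes
expected = n * 0.25            # under H0 every season has probability .25

# Chi-squared statistic: sum over seasons of (observed - expected)^2 / expected
stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1         # no parameters estimated, so d.f. = 4 - 1

critical = 7.815               # assumed alpha = .05, chi-squared with 3 d.f.
reject = stat > critical

print(n, expected, round(stat, 3), df, reject)
```

Most of the statistic comes from the summer count, which exceeds its expected value by about 32; even so, the total stays well below the critical value.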


Jan Lethen
Wed Nov 13 16:20:46 CST 1996