
Residual Plots and Regression Assumptions

Recall that there are three basic assumptions about the random deviations (errors), $\epsilon_i$: the random deviations are independent, normally distributed, and have a constant variance. In simple linear regression, we also assume that Y and X are linearly related. We shall consider the use of residual plots for examining the following types of departures from the assumed model.

  1. The regression function is not linear.
  2. The error terms do not have a constant variance.
  3. The model fits all but one or a few outlying observations.
  4. The error terms are not normally distributed.
  5. The error terms are not independent.
The common graphical tools for checking these assumptions include:
  1. Residual plot: a scatter plot of the residuals against X or the fitted values.
  2. Absolute residual plot: a scatter plot of the absolute values of the residuals against X or the fitted values.
  3. Normal probability plot of the residuals.
  4. Time series plot of the residuals: a scatter plot of the residuals against time or observation index.

The time series plot of the residuals is strongly recommended whenever the data are obtained in a time sequence. Its purpose is to reveal any correlation between the error terms over time (that is, a violation of independence). When the error terms are independent, we expect the residuals to fluctuate in a more or less random pattern around the baseline 0.
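
The quantities behind each of these plots can be computed directly from a least-squares fit. The following is a minimal sketch in Python using only the standard library; the function names (fit_line, residual_diagnostics) and the small data set are hypothetical and chosen purely for illustration. The normal probability plot uses Blom's plotting positions, (k - 0.375)/(n + 0.25), which is one common convention among several. Plotting itself is left to whatever graphics tool is at hand.

```python
from statistics import NormalDist, mean


def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def residual_diagnostics(x, y):
    """Return the coordinates needed for the four diagnostic plots."""
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(resid)
    # Blom plotting positions (an assumed convention, not the only one).
    probs = [(k - 0.375) / (n + 0.25) for k in range(1, n + 1)]
    quantiles = [NormalDist().inv_cdf(p) for p in probs]
    return {
        "fitted": fitted,                                # x-axis option for plots 1 and 2
        "residuals": resid,                              # plot 1: residuals vs. X or fitted
        "abs_residuals": [abs(r) for r in resid],        # plot 2: |residuals| vs. X or fitted
        "normal_plot": list(zip(quantiles, sorted(resid))),  # plot 3
        "index_plot": list(enumerate(resid, start=1)),   # plot 4: residuals vs. index
    }


# Made-up illustrative data (not from the notes).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.7, 9.0]
diag = residual_diagnostics(x, y)
for q, r in diag["normal_plot"]:
    print(f"normal quantile {q:6.3f}   ordered residual {r:6.3f}")
```

If the points (quantile, ordered residual) fall roughly on a straight line, the normality assumption looks reasonable; a fan shape in the absolute residuals suggests a non-constant variance.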

In the following example, since the observations are from independent individuals, we use only the first three plots to check the assumptions.

EXAMPLE: A cardiology data set was collected by the University of Virginia School of Medicine. Two variables were examined: aortic valve area (AVA) and body surface area (BSA). Physiologically, as children grow, the intracardiac areas also grow. A linear model relating AVA to BSA, a proxy for physiological growth, has been widely accepted in medical science (see Gutgesell and Rembold, 1990, and the references therein). The top left, top right, bottom left, and bottom right plots in Figure (AVA vs. BSA) are, respectively, the scatter plot of AVA vs. BSA with the fitted regression line, the normal probability plot of the residuals, the residual plot with the baseline 0, and the absolute residual plot. We can see that (1) AVA and BSA seem to be linearly related, and (2) the error variance increases with BSA. Point (2) is even more obvious in the absolute residual plot. Since we now know that the error terms do not have a common variance, the normal probability plot does not provide much information here. We simply note that, in general, points that do not follow a straight line in this plot indicate non-normal errors.

In Figure (log[AVA] vs. BSA), we produce the same plots but replace AVA with log[AVA]. The scatter plot shows that log[AVA] is not linearly related to BSA. The residual plot also shows a systematic pattern (curvature), which indicates a departure from the linear model.

We now log-transform both AVA and BSA and obtain the plots in Figure (log[AVA] vs. log[BSA]). The model now fits all but one observation, in the bottom left corner. The residuals fluctuate in a more or less random pattern around the baseline 0. Also, apart from one point, the points in the normal probability plot roughly follow a straight line.
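
The effect of the log-log transformation on the error variance can be illustrated numerically. The sketch below does not use the AVA/BSA data; it builds a synthetic, deterministic data set in which the deviations are multiplicative (y equals 2x times a factor alternating between 1.2 and 1/1.2), so the spread grows with x on the raw scale. The helper names (fit_line, residuals, spread_ratio) are hypothetical, and the half-versus-half ratio of mean absolute residuals is just one crude numerical stand-in for reading the absolute residual plot.

```python
from math import log
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def residuals(x, y):
    """Residuals from the least-squares line of y on x."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, y):
    """Mean |residual| in the upper half of x over that in the lower half.

    Assumes x is sorted in increasing order. A value near 1 is consistent
    with a constant variance; a value well above 1 suggests the spread
    grows with x.
    """
    r = [abs(v) for v in residuals(x, y)]
    half = len(r) // 2
    return mean(r[half:]) / mean(r[:half])


# Synthetic data with multiplicative deviations (an assumption for illustration).
x = [0.2 + 0.1 * i for i in range(20)]
factor = [1.2 if i % 2 == 0 else 1 / 1.2 for i in range(20)]
y = [2 * xi * f for xi, f in zip(x, factor)]

raw_ratio = spread_ratio(x, y)
log_ratio = spread_ratio([log(v) for v in x], [log(v) for v in y])
print(f"raw scale:     spread ratio = {raw_ratio:.2f}")
print(f"log-log scale: spread ratio = {log_ratio:.2f}")
```

On this synthetic data the raw-scale ratio is well above 1 while the log-log ratio is close to 1, which mirrors what the absolute residual plots show before and after the transformation.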



Jan Lethen
Wed Nov 13 16:20:46 CST 1996