StatConcepts: A Visual Tour of Statistical Ideas
H. Joseph Newton and Jane L. Harvill
Published April, 1997 by
Duxbury Press
Henrik Schmiediche's Java version of the random sampling lab in
StatConcepts is also available online.
From the Preface:
Most introductory statistics courses consist of three parts: 1)
descriptive statistics: using numbers and graphs to summarize the
information in a data set; 2) inferential statistics: drawing
conclusions about numerical characteristics of entire populations of
objects from those of samples from the populations; and 3)
statistical concepts: the basic logical and mathematical ideas
underpinning descriptive and inferential statistics.
A wide variety of computer programs make it easy for students to
accomplish what is required for the first two of these parts, while
there is very little software for illustrating statistical concepts.
That is why we wrote StatConcepts: a set of ``laboratories'' for
illustrating ideas.
StatConcepts is actually a collection of programs written in the
language of StataQuest, which is a student version of a program called
Stata which is designed to do descriptive and inferential statistics.
StatConcepts is not intended as a text, but as a supplement to the
many introductory statistics texts that exist. Its main focus is on
correct interpretation and understanding of statistical concepts,
terminology, and results and not on computation for a given problem,
although there are some labs that allow students to compute results.
In many ways, the computer is the laboratory for the science of
statistics. Most of the ideas of statistics start out with the phrase
``If we did this procedure over and over again, then this is what we
would see.'' The only way to realistically do things over and over
again is on a computer. In these labs we have tried to use graphics
to show what in fact we would see if we did various things over and
over again.
We assume that instructors will not incorporate all of the labs in the
StatConcepts collection (there are 28 of them!) into a course, but
rather pick and choose those they feel would be most useful in the
course (and that they have time to cover in their already cramped
schedule).
We would hope that instructors can show the labs to the students using
some kind of projection, but each chapter of this book contains a
``guided tour'' through each lab that a student could read while at a
computer. These guided tours cannot totally replace an instructor but
they can certainly help instructors use the labs as a supplement to
their course.
While the labs and this book are intended primarily for introductory
courses, we have found them very valuable in courses at all levels.
The level has been kept as nontechnical as possible, but more advanced
students will be able to relate to the graphs and descriptions at a
more mathematical level.
Overview of the Labs
There are 28 labs in the complete collection although there are fewer
items on the Labs menu as some items have submenus containing
more than one lab. There is a chapter in this book for each item on
the Labs menu, including:
- Introduction to Concept Labs: This lab is actually just a
greeting and an invitation to look at a help file giving an overview
of the entire collection of labs. It also allows the user to specify
their own random number generator seed (see Chapter 1).
- Random Sampling Lab: This lab repeatedly shows random sampling without
replacement from a population of 100 boxes. It also previews the ideas
of sampling distributions and the central limit theorem. This Java
Applet was written by Henrik Schmiediche of the Department of
Statistics of Texas A&M University.
- Relative Frequency and Probability: This lab again illustrates
random sampling without replacement using the example of a lottery game.
It also illustrates the relative frequency interpretation of probability
by repeatedly drawing six winning numbers from the numbers 1 through 50
and keeping track of the number of draws containing at least two
consecutive numbers. Deriving the formula for the probability of
this event is beyond the scope of most courses.
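The labs themselves are StataQuest programs, but the lottery experiment is easy to imitate. Here is a rough Python sketch of the same relative-frequency idea (the function names and draw count are our own, not from the lab):

```python
import random

def has_consecutive(draw):
    """True if the draw contains at least two consecutive numbers."""
    s = sorted(draw)
    return any(b - a == 1 for a, b in zip(s, s[1:]))

def estimate_consecutive_prob(n_draws=10000, seed=0):
    """Relative frequency of 6-number draws from 1..50 that contain
    at least two consecutive numbers."""
    rng = random.Random(seed)
    hits = sum(has_consecutive(rng.sample(range(1, 51), 6))
               for _ in range(n_draws))
    return hits / n_draws

# For reference, the exact probability is 1 - C(45,6)/C(50,6), about
# 0.487; the relative frequency settles near it as n_draws grows.
print(estimate_consecutive_prob())
```

Watching the estimate stabilize as `n_draws` increases is exactly the relative-frequency interpretation the lab demonstrates.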
- How are Populations Distributed?: This lab shows students that
distributions come in all shapes and sizes and come in parametric
families. It graphs densities from 14 different families and also
generates random samples from one member of each family and
superimposes the density on the histogram of the sample, thus
illustrating variability from one sample to another.
- Sampling from 0-1 Populations: This item actually leads to
four different labs:
- Sampling With and Without Replacement: The binomial
and hypergeometric distributions are illustrated by having the user specify
the number of elements in a 0-1 population, the proportion
of 1's, and the size of a sample, and then superimposing the
probability plot of the number of 1's in the sample under the
sampling with and without replacement conditions.
- The Negative Binomial Distribution: This lab graphs the negative
binomial distribution for user-specified values of the parameters.
- Poisson Approximation to Binomial: This lab superimposes
the binomial distribution and its Poisson approximation for user specified
values of the parameters. It makes it easy to see when the Poisson
approximation works well and when it doesn't.
- Normal Approximation to Binomial: This lab superimposes the
binomial distribution and its normal approximation for user specified
values of the parameters. It makes it easy to see when the normal
approximation works well and when it doesn't.
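The with/without-replacement contrast above rests on the binomial and hypergeometric formulas, which can be sketched directly in Python (the population and sample sizes below are illustrative choices, not the lab's defaults):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k ones in a sample of n drawn WITH replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeometric_pmf(k, N, K, n):
    """P(k ones in a sample of n drawn WITHOUT replacement from a
    population of N elements, K of which are ones)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Population of N=20 with K=8 ones, samples of size n=5: sampling
# without replacement concentrates the counts more tightly.
N, K, n = 20, 8, 5
for k in range(n + 1):
    print(f"k={k}  with={binomial_pmf(k, n, K / N):.4f}  "
          f"without={hypergeometric_pmf(k, N, K, n):.4f}")
```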
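The two approximation labs compare the same binomial probabilities against Poisson and normal curves. A hypothetical Python version of that comparison, measuring the largest pointwise gap (parameter values are our own examples):

```python
from math import comb, exp, lgamma, log, pi, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # log form avoids overflowing factorial(k) for large k
    return exp(k * log(lam) - lam - lgamma(k + 1))

def normal_density(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def poisson_gap(n, p):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n*p)."""
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
               for k in range(n + 1))

def normal_gap(n, p):
    """Largest pointwise gap between Binomial(n, p) and the normal
    density with mu = n*p, sigma = sqrt(n*p*(1-p))."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(k, n, p) - normal_density(k, mu, sigma))
               for k in range(n + 1))

# Poisson needs n large and p small; normal needs n*p and n*(1-p) large:
print(poisson_gap(10, 0.3), poisson_gap(300, 0.01))
print(normal_gap(10, 0.05), normal_gap(100, 0.5))
```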
- Bivariate Descriptive Statistics: This item leads to three different
labs:
- Scatterplots I: This lab shows scatterplots of random samples
from a bivariate normal population for 20 different values of the correlation
coefficient ranging from -0.9 to 0.9.
- Scatterplots II: This lab allows the user to generate
scatterplots for any sample size and any population correlation
coefficient.
- Least Squares: This lab allows the user to generate a wide
variety of different scatterplots and then see the true line, the least
squares line, and the vertical errors that go into the residual sum of
squares.
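A least-squares sketch in the same spirit, showing a true line, a fitted line, and the residual sum of squares (the true line, noise level, and seed are arbitrary choices of ours, not the lab's):

```python
import random

def least_squares(xs, ys):
    """Intercept and slope minimizing the residual sum of squares."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# True line y = 2 + 0.5*x with standard normal noise added:
rng = random.Random(1)
xs = [i / 2 for i in range(40)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]
intercept, slope = least_squares(xs, ys)
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
print(f"fitted: y = {intercept:.2f} + {slope:.2f}x   RSS = {rss:.1f}")
```

The fitted coefficients hover near the true values 2 and 0.5; the vertical errors the lab draws are exactly the terms summed in `rss`.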
- Central Limit Theorem: This lab illustrates sample means
for repeated sampling from a user specified choice of four parent
populations (normal, exponential, uniform, and 0-1) and is
actually two labs in one:
- One-at-a-time: One sample at a time, boxes corresponding
to sample means are
placed above an axis until the tallest column of boxes fills the graph.
- 500 Samples: The histogram of the sample means for 500
samples is drawn with the approximating normal curve superimposed.
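The 500-sample version of this lab can be imitated in a few lines of Python (the exponential parent and sample size 30 are our example choices):

```python
import random
from statistics import mean, stdev

def sample_means(parent, n, reps=500, seed=0):
    """Means of `reps` samples of size n, where `parent(rng)` returns
    one random value from the parent population."""
    rng = random.Random(seed)
    return [mean(parent(rng) for _ in range(n)) for _ in range(reps)]

# An exponential parent is strongly skewed, yet the means of samples
# of size 30 are already close to normal: centered at the population
# mean 1, with spread near 1/sqrt(30).
means = sample_means(lambda rng: rng.expovariate(1.0), n=30)
print(f"mean of means = {mean(means):.3f}")
print(f"sd of means   = {stdev(means):.3f}  (theory {1 / 30**0.5:.3f})")
```

A histogram of `means` would show the approximate bell shape the lab superimposes the normal curve on.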
- Z, t, Chi-square, F: This item leads to six labs:
- Critical Values: This lab graphs rejection regions for one
and two tailed tests for any of Z, t, Chi-square, or F for
user specified significance level and, if necessary, degrees of freedom.
- Normal Curves: This lab starts by drawing the standard normal
curve and then the user can repeatedly change the mean and/or variance and
each time the
lab draws the new
normal curve on the same axes.
- Chi-square Curves: This lab starts by drawing the Chi-square
curve with 10 degrees of freedom and then the user can repeatedly
change the degrees of freedom and each time the lab draws the new Chi-square
curve on the same axes.
- F Curves: This lab starts by drawing the F
curve with 10 and 10 degrees of freedom and then the user can repeatedly
change the degrees of freedom and each time the lab draws the new F
curve on the same axes.
- t Converging to Z: This lab allows the user to superimpose
any part of the Z curve and the same
part of the t curve for increasing degrees of freedom.
- Normal Approximation to Binomial: This is the same lab
as the one under the Sampling From 0-1 Populations Lab.
- Sampling Distributions: This lab allows the user to generate
500 samples (or pairs of samples) of user specified size from one of
three parent populations (normal, uniform, and exponential) and calculate
the Z, one or two sample t, Chi-square, or F statistics and then
superimpose the histogram of the 500 statistics and the
theoretical normal theory curve. It also displays the
percentiles of the 500 statistics and the theoretical curve to
see the agreement (disagreement) of the two if assumptions
are (are not) met.
- Minimum Variance Estimation: This lab allows the user to generate
500 samples of user specified size from one of four parent
populations (N(0,1), U(-0.5,0.5), t with three degrees of freedom, and
Laplace), each symmetric about zero, and then draw the histograms
of the 500 sample means and 500 sample medians. The sample mean and
standard deviations of the 500 means and 500 medians are also displayed.
The lab shows that the sample mean is not always the best estimator.
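The mean-versus-median comparison can be sketched as follows (sample size, replication count, and seeds are illustrative, and the Laplace generator is our own stand-in):

```python
import random
from statistics import mean, median, stdev

def estimator_spread(parent, n=25, reps=500, seed=0):
    """Standard deviations of the sample mean and the sample median
    over `reps` samples of size n from `parent`."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        x = [parent(rng) for _ in range(n)]
        means.append(mean(x))
        medians.append(median(x))
    return stdev(means), stdev(medians)

def laplace(rng):
    """Laplace (double-exponential) variate, symmetric about zero."""
    u = rng.expovariate(1.0)
    return u if rng.random() < 0.5 else -u

sd_mean, sd_median = estimator_spread(laplace)
print(f"Laplace parent: sd(means)={sd_mean:.3f}  sd(medians)={sd_median:.3f}")
# For the heavy-tailed Laplace the median is the tighter estimator;
# swap in a normal parent and the mean wins.
```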
- Calculating Confidence Intervals: This lab allows the user to calculate
confidence intervals for the 11 different one and two sample inference
situations for means, variances, and proportion problems usually
covered in an introductory course.
- Interpreting Confidence Intervals: This lab allows the user to generate
50, 100, or 150 samples (or pairs of samples) of user specified size
from one of
four parent populations (normal, uniform, exponential, and 0-1)
and draw horizontal lines for the confidence intervals
(for user specified significance level)
for user specified parameter
as well as a vertical line representing the true
value of the parameter.
This allows the user to see the effect of changing significance level and
sample size on the width of intervals as well as the effect of
violation of assumptions on the confidence interval coverage probability.
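A coverage simulation in this spirit can be sketched for the one-sample t interval (our own setup; 2.093 is the standard two-sided 95% t critical value for 19 degrees of freedom):

```python
import random
from statistics import mean, stdev

def t_interval(x, tcrit):
    """Sample mean plus/minus tcrit * s / sqrt(n)."""
    n = len(x)
    half = tcrit * stdev(x) / n**0.5
    m = mean(x)
    return m - half, m + half

def coverage(parent, true_mean, n=20, reps=100, tcrit=2.093, seed=0):
    """Proportion of intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = t_interval([parent(rng) for _ in range(n)], tcrit)
        hits += lo <= true_mean <= hi
    return hits / reps

# With a normal parent the coverage hovers near 0.95:
print(coverage(lambda rng: rng.gauss(5, 2), true_mean=5))
```

Swapping in a skewed parent, a smaller sample size, or a different critical value shows the coverage drifting away from the nominal level, which is the point of the lab.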
- Calculating Tests of Hypotheses: This lab allows the user to calculate
test statistics and p-values for the same 11 situations as in the
Calculating Confidence Intervals Lab. It draws a graph with
the test statistic marked and the tail areas corresponding to the
p-value shaded in.
- Tests of Significance: This lab draws a graph of a user
specified Z, t,
Chi-square, or F curve and shades in the area corresponding to the
p-value for either a user specified or lab generated value of
a test statistic.
- Level of Significance of a Test: This lab allows the user to generate
500 samples (or pairs of samples) of user specified size from one of
three parent populations (normal, uniform, and exponential) and calculate
the Z, one or two sample t, Chi-square, or F statistics and then
superimpose the histogram of the 500 statistics and the
theoretical normal theory curve. It shades in the area under
the curve for the rejection region (for the user specified significance level and
one or two
tailed test) and displays the proportion
of times the null hypothesis is rejected, thus showing
the agreement (disagreement) with the value of significance level if assumptions
are (are not) met.
- Power of a Test: This lab is the same as the Level of Significance
of a Test Lab except now the user can specify the degree to which the
null hypothesis is actually false (including actually
being true). This allows the user to see the effect
of sample size and degree of falseness on the power of the test.
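A sketch of such a power simulation, using a two-sided Z test with known sigma (the test choice and all parameter values are ours, not the lab's):

```python
import random
from statistics import mean

def z_test_rejects(x, mu0, sigma, zcrit=1.96):
    """Two-sided Z test of H0: mean = mu0, with known sigma."""
    n = len(x)
    z = (mean(x) - mu0) / (sigma / n**0.5)
    return abs(z) > zcrit

def power(true_mean, mu0=0.0, sigma=1.0, n=25, reps=500, seed=0):
    """Proportion of rejections when the data really come from
    Normal(true_mean, sigma)."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(true_mean, sigma) for _ in range(n)],
                       mu0, sigma)
        for _ in range(reps))
    return rejections / reps

# When H0 is true the rejection rate sits near the 5% level; the
# further the true mean moves from mu0, the higher the power:
for tm in (0.0, 0.3, 0.6):
    print(f"true mean {tm}: power about {power(tm):.2f}")
```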
- Between and Within Variation: This lab starts by generating
and graphing two sets of four samples, each of user specified size. The first
set of four samples are from normal populations having differing means,
while in the second set the population means are all the same. All populations
have the same variance. Then the
lab allows the user to repeatedly change the population variance, each
time redrawing the plot and displaying the p-value of the One Way ANOVA
test of equality of means. This allows the user to see how the test
is comparing the between sample variability to the within sample variability.
- Calculating One-Way ANOVA: This lab allows the user to enter sample
sizes, means, and variances (or standard deviations) for specified number of
samples and then displays the ANOVA table.
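The summary-statistics route to the ANOVA table needs only the between and within sums of squares; a sketch (the numeric summaries below are hypothetical):

```python
def anova_table(sizes, means, variances):
    """One-way ANOVA from per-sample summary statistics.
    Returns (SSB, SSW, df_between, df_within, F)."""
    k = len(sizes)
    N = sum(sizes)
    grand = sum(n * m for n, m in zip(sizes, means)) / N
    ssb = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ssw = sum((n - 1) * v for n, v in zip(sizes, variances))
    dfb, dfw = k - 1, N - k
    f = (ssb / dfb) / (ssw / dfw)
    return ssb, ssw, dfb, dfw, f

# Hypothetical summaries for three samples:
ssb, ssw, dfb, dfw, f = anova_table(
    sizes=[10, 12, 8], means=[5.1, 6.3, 4.8], variances=[1.2, 1.5, 0.9])
print(f"SSB={ssb:.2f} (df={dfb})  SSW={ssw:.2f} (df={dfw})  F={f:.2f}")
```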
- Chi-square Goodness of Fit Test: This lab illustrates the Chi-square
goodness of fit by generating a user specified number of points in a square
and then placing a grid of (user specified number of) boxes on the points,
counting how many are in each box, and then displaying the p-value
of the resulting Chi-square test. This allows users to see how nonuniform
the placement of points can look even though they are in
fact being uniformly placed.
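That grid experiment can be mimicked as follows (a 2x2 grid is used here so the 5% critical value, 7.815 for 3 degrees of freedom, can be quoted from a standard chi-square table; point count and seed are arbitrary):

```python
import random

def grid_chisquare(n_points=200, grid=2, seed=0):
    """Scatter n_points uniformly in the unit square, count them in a
    grid x grid array of boxes, and return the counts plus the
    chi-square statistic for uniformity (equal expected counts)."""
    rng = random.Random(seed)
    counts = [[0] * grid for _ in range(grid)]
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        counts[min(int(y * grid), grid - 1)][min(int(x * grid), grid - 1)] += 1
    expected = n_points / grid**2
    stat = sum((c - expected) ** 2 / expected
               for row in counts for c in row)
    return counts, stat

counts, stat = grid_chisquare()
print(counts)
# 4 boxes give 3 degrees of freedom, so uniformity is rejected at the
# 5% level only when the statistic exceeds 7.815:
print(f"chi-square = {stat:.2f}, reject at 5%: {stat > 7.815}")
```

Rerunning with different seeds shows how uneven genuinely uniform placements can look while the test still (correctly) fails to reject most of the time.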