Project 2

STAT 605: Advanced Statistical Computations
David B. Dahl
Spring 2010

Iris Data

The iris data of Fisher or Anderson is a famous dataset. It gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

Using the iris dataset, write an R script named "iris.R" which does the following:

Your script should use a relative path to access the iris dataset that is already in your working copy of the repository. Specifically, your R script will be in "students/$USER/project/2" so it will use "../../../../html/project/2/iris.txt" to access the iris data.
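
As a rough illustration of the relative-path requirement, the sketch below reads the file from the location given above. The helper name `read_iris` is hypothetical, and the assumption that "iris.txt" is whitespace-delimited with a header row is not confirmed by this page; inspect the file and adjust the arguments if needed. The fallback to R's built-in copy of the data exists only so the sketch can run outside the repository.

```r
# Path taken from the assignment text, relative to students/$USER/project/2.
iris_path <- "../../../../html/project/2/iris.txt"

# Hypothetical helper: read the course file if it is present; otherwise fall
# back to R's built-in iris data so the sketch still runs elsewhere.
# ASSUMPTION: iris.txt is whitespace-delimited with a header row.
read_iris <- function(path) {
  if (file.exists(path)) {
    read.table(path, header = TRUE)
  } else {
    message("File not found at ", path, "; using datasets::iris instead.")
    datasets::iris
  }
}

iris_data <- read_iris(iris_path)
str(iris_data)  # check the column names and types before going further
```

Because the path is relative, the script must be run with "students/$USER/project/2" as the working directory.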

To help you complete this assignment, study the R examples from lecture and read the "Introduction to R". If you are stuck, talk with your colleagues. If you cannot get yourself unstuck after a lot of searching, reading, and experimenting, hints are available.


Let's Make A Deal

Description

This section is copied from a website by Kevin O'Bryant at UCSD.

I can only assume Monty Hall's game show Let's Make A Deal took place sometime during the seventies. Information on this particular game show has somehow eluded the Internet and my less than vivid memory sometimes fails me, but the basic setup for the game is as follows. Pretty much the entire audience dresses up like a complete loon (Raggedy Ann and Andy were fairly popular costumes) hoping that Monty Hall would select them out of the crowd and offer them a chance to win a fabulous prize. For instance, he might offer you $100 for every paper clip that you have in your possession or he might give you $500, but then ask you if you would like to keep the money or trade it for what's in a particular box. Of course there could be $1000 in the box or a single can of dog food. Anyway, I'm digressing and hopefully you get the basic gist of the game.

The particular game that we are concerned with here is where Monty Hall offers you the opportunity to win what is behind one of three doors. Typically there was a really nice prize (e.g., a car) behind one of the doors and a not-so-nice prize (e.g., a goat) behind the other two. After you selected a door, Monty would then proceed to open one of the doors you didn't select. It is important to note here that Monty would NOT open the door that concealed the car. At this point, he would then ask you if you wanted to switch to the other door before revealing what you had won.

Controversy

This section is copied from a website by Kevin O'Bryant at UCSD.

In September of 1990 a reader of Marilyn vos Savant's Sunday Parade column wrote in and asked the following question:

"Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the other doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to take the switch?"

This problem was given the name The Monty Hall Paradox in honor of the longtime host of the television game show "Let's Make a Deal." Articles about the controversy appeared in the New York Times and other papers around the country. Marilyn's answer was that the contestant should switch doors, and she received nearly 10,000 responses from readers, most of them disagreeing with her. Several were from mathematicians and scientists whose responses ranged from hostility to disappointment at the nation's lack of mathematical skills.


Your Tasks

Conduct a simulation study to determine whether Marilyn vos Savant is correct. In R, write a program that simulates the game show as faithfully as possible. For example, randomize the placement of the prize and the original door chosen by the contestant. Implement both strategies (i.e., "do not switch" versus "switch"). Give the estimated probability of winning under each strategy. Quantify your uncertainty about the estimated probabilities by forming confidence intervals on the proportions of wins. (Hint: the normal approximation to the binomial provides excellent results when the number of trials is large, as it is here.) Ensure that your script runs in less than about 30 seconds. Name your script "deal.R". For full credit, make sure your code clearly labels and displays the results rather than just printing out numbers.
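
One possible shape for such a simulation is sketched below. It is only an illustration, not the expected solution: the helper names `play_game` and `wald_ci`, the seed, the number of games, and the 95% level are all assumptions. The interval is the normal-approximation (Wald) interval, p_hat plus or minus z * sqrt(p_hat * (1 - p_hat) / n).

```r
set.seed(605)      # arbitrary seed, used only for reproducibility
n_games <- 100000  # assumed number of simulated games

# Hypothetical helper: play one game and report whether each strategy wins.
play_game <- function() {
  doors <- 1:3
  prize <- sample(doors, 1)
  choice <- sample(doors, 1)
  # Monty opens a door that is neither the contestant's choice nor the prize.
  openable <- setdiff(doors, c(choice, prize))
  # Index with sample.int() because sample(x, 1) misbehaves when x has length 1.
  opened <- openable[sample.int(length(openable), 1)]
  switched <- setdiff(doors, c(choice, opened))
  c(stay = choice == prize, switch = switched == prize)
}

# Hypothetical helper: normal-approximation (Wald) interval for a proportion.
wald_ci <- function(wins, level = 0.95) {
  n <- length(wins)
  p_hat <- mean(wins)
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(estimate = p_hat, lower = p_hat - half_width, upper = p_hat + half_width)
}

# 2 x n_games logical matrix with one row per strategy.
results <- replicate(n_games, play_game())

for (strategy in rownames(results)) {
  ci <- wald_ci(results[strategy, ])
  cat(sprintf("Strategy '%s': estimated P(win) = %.4f, 95%% CI (%.4f, %.4f)\n",
              strategy, ci["estimate"], ci["lower"], ci["upper"]))
}
```

Simulating one game per function call keeps the logic close to the game's description; a fully vectorized version would run faster but is harder to check against the rules.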

You may wish to read this theoretical argument in favor of Marilyn vos Savant.


Collect All Four

Some brands of cold cereal run promotions in which one of four free toys is included in the cereal box. The company encourages consumers to "collect all four." The typical approach is to buy one box at a time and stop as soon as the set is complete. Consider two scenarios: (1) each toy is equally likely, and (2) the toys have selection probabilities 0.10, 0.25, 0.25, and 0.40. Write an R script named "all-four.R" which conducts a Monte Carlo simulation study to answer the following questions under the two scenarios:

  1. What is the mean number of boxes that a consumer must purchase to get a complete set?
  2. What proportion of consumers will need to purchase 14 boxes or more to complete a set?

In answering the questions, be sure to assess the Monte Carlo error (through, for example, a confidence interval). Ensure that your script runs in less than about 30 seconds. Use good coding standards, such as proper indentation and good variable names. Make your script reuse as much code as possible, as opposed to having virtually duplicate code with only minor modifications. Make sure your code clearly labels and displays the results rather than just printing out numbers.
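
The following is a minimal sketch of how one function could serve both scenarios, so that only the probability vector changes. The names `boxes_needed` and `summarize_scenario`, the seed, the 20,000 replications, and the 95% level are assumptions made for illustration. As a sanity check on the first scenario, the theoretical mean with four equally likely toys is 4 * (1 + 1/2 + 1/3 + 1/4) = 25/3, or about 8.33 boxes.

```r
set.seed(605)  # arbitrary seed, used only for reproducibility

# Hypothetical helper: buy boxes until every toy has been seen; return the count.
boxes_needed <- function(probs) {
  n_toys <- length(probs)
  have <- logical(n_toys)
  boxes <- 0
  while (!all(have)) {
    toy <- sample.int(n_toys, 1, prob = probs)
    have[toy] <- TRUE
    boxes <- boxes + 1
  }
  boxes
}

# Hypothetical helper: simulate one scenario and print labeled estimates with
# normal-approximation 95% confidence intervals for the Monte Carlo error.
summarize_scenario <- function(label, probs, n_sims = 20000, threshold = 14) {
  counts <- replicate(n_sims, boxes_needed(probs))
  z <- qnorm(0.975)
  mean_boxes <- mean(counts)
  mean_half <- z * sd(counts) / sqrt(n_sims)
  p_hat <- mean(counts >= threshold)
  prop_half <- z * sqrt(p_hat * (1 - p_hat) / n_sims)
  cat(sprintf("%s\n", label))
  cat(sprintf("  Mean boxes to complete the set: %.3f, 95%% CI (%.3f, %.3f)\n",
              mean_boxes, mean_boxes - mean_half, mean_boxes + mean_half))
  cat(sprintf("  Proportion needing %d or more boxes: %.4f, 95%% CI (%.4f, %.4f)\n",
              threshold, p_hat, p_hat - prop_half, p_hat + prop_half))
  invisible(list(mean = mean_boxes, prop = p_hat))
}

scenarios <- list(
  "Scenario 1: equal probabilities" = rep(0.25, 4),
  "Scenario 2: unequal probabilities" = c(0.10, 0.25, 0.25, 0.40)
)

# The same function is applied to each scenario; nothing is duplicated.
summaries <- Map(summarize_scenario, names(scenarios), scenarios)
```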



Submission

Commit "iris.R", "deal.R", and "all-four.R" to the appropriate place in the repository (i.e., in "students/$USER/project/2"). Do not submit the files generated by your scripts.