This part of the project is designed to give you experience with:
The Mann-Whitney test is a nonparametric procedure for testing whether the observations in two samples come from the same distribution. In this assignment, you will write an R script named 'mann-whitney.R' that defines a 'mann.whitney' function to compute the Mann-Whitney test statistic and its associated p-value using Monte Carlo methods. Specifically, your function declaration should be "mann.whitney <- function(sample1,sample2,alternative="greater")", where "sample1" and "sample2" are numeric vectors of arbitrary (and possibly different) lengths. The "alternative" hypothesis can be either "less" or "greater"; you do not need to implement the "two.sided" p-value calculation. Your function should return a numeric vector of length 4 giving the test statistic, the Monte Carlo p-value, and the lower and upper bounds of a 95% confidence interval for the p-value.
Your implementation can assume there are no ties in the data. Therefore, to generate data under the null hypothesis, simply sample both datasets from the same continuous distribution. Define the test statistic as U_1, as given in the Wikipedia entry. Do not use the normal approximation.
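A minimal sketch of how such a function might look, assuming 10,000 Monte Carlo replicates and uniform draws for the null data (both are choices made here, not requirements of the assignment). The helper 'u.statistic' is hypothetical. The interval is the exact Clopper-Pearson interval from binom.test(), used to express the Monte Carlo error in the estimated p-value; the normal approximation is avoided both for the test statistic and for that interval.

```r
# mann-whitney.R (sketch): Monte Carlo Mann-Whitney test, assuming no ties.
# U_1 follows the Wikipedia definition: U_1 = R_1 - n_1 * (n_1 + 1) / 2,
# where R_1 is the sum of the ranks of sample1 within the pooled data.
# u.statistic() is a hypothetical helper, not part of the required interface.
u.statistic <- function(x, y) {
  ranks <- rank(c(x, y))
  sum(ranks[seq_along(x)]) - length(x) * (length(x) + 1) / 2
}

mann.whitney <- function(sample1, sample2, alternative = "greater") {
  if (!(alternative %in% c("less", "greater"))) {
    stop("alternative must be \"less\" or \"greater\"")
  }
  n.reps <- 10000  # number of Monte Carlo replicates (an assumed choice)
  n1 <- length(sample1)
  n2 <- length(sample2)
  observed <- u.statistic(sample1, sample2)
  # Under H0 both samples share one continuous distribution; with no ties the
  # null distribution of U_1 is the same for any such choice, so use runif().
  null.u <- replicate(n.reps, u.statistic(runif(n1), runif(n2)))
  # A large U_1 means values in sample1 tend to exceed those in sample2.
  extreme <- if (alternative == "greater") null.u >= observed else null.u <= observed
  n.extreme <- sum(extreme)
  # Exact (Clopper-Pearson) 95% interval for the true p-value (Monte Carlo error)
  ci <- as.numeric(binom.test(n.extreme, n.reps)$conf.int)
  c(statistic = observed, p.value = n.extreme / n.reps,
    lower = ci[1], upper = ci[2])
}
```

Because U_1 depends only on ranks, runif() serves as well as rnorm() for the null draws. R's built-in wilcox.test() reports the same quantity as its W statistic, which makes a convenient cross-check, e.g. comparing mann.whitney(x, y) with wilcox.test(x, y, alternative = "greater") on small samples.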
This part of the project will give you experience implementing Monte Carlo power calculations.
Using the "svn cp" command, copy "urn.R" from our recent lecture into your directory for this project. Add code at the bottom to answer the following:
1. If $\alpha$ (the probability of a Type I error) is 0.05, what is the rejection region for this problem?
2. If the true proportion of black balls is 0.6, what is the power of the test of the null hypothesis that the proportion of black balls is 0.5, versus the one-sided alternative that the proportion is greater than 0.5?
3. What if the true proportion is 0.7?
In answering these questions, be sure to assess the Monte Carlo error.
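The Monte Carlo logic behind these questions can be sketched as follows. The design of urn.R is not reproduced here, so this sketch assumes a hypothetical setup: each experiment draws 20 balls with replacement, and the test statistic is the number of black balls drawn. The helper names 'draw.black.counts' and 'estimate.power' and the replicate count are likewise illustrative. The rejection region is taken to be {count >= cutoff}, and Monte Carlo error is reported as an exact binomial 95% interval from binom.test().

```r
# Sketch of Monte Carlo size/power calculations for the urn problem.
# ASSUMPTION: each experiment draws n.draws = 20 balls *with replacement* and
# the test statistic is the number of black balls; replace draw.black.counts()
# with the simulation actually used in urn.R.
set.seed(2010)
n.draws <- 20        # hypothetical number of balls drawn per experiment
n.reps <- 100000     # Monte Carlo replicates
alpha <- 0.05

# Simulated black-ball counts (rbinom = count of black in independent draws)
draw.black.counts <- function(p) rbinom(n.reps, size = n.draws, prob = p)

# 1. Rejection region {count >= cutoff}: the smallest cutoff whose estimated
#    Type I error rate under p = 0.5 does not exceed alpha.
null.counts <- draw.black.counts(0.5)
size.at <- sapply(0:n.draws, function(k) mean(null.counts >= k))
cutoff <- (0:n.draws)[which(size.at <= alpha)[1]]
size.ci <- as.numeric(binom.test(sum(null.counts >= cutoff), n.reps)$conf.int)

# 2. and 3. Power = P(reject H0) when the true proportion is p.true,
#    with an exact 95% interval expressing the Monte Carlo error.
estimate.power <- function(p.true) {
  n.reject <- sum(draw.black.counts(p.true) >= cutoff)
  ci <- as.numeric(binom.test(n.reject, n.reps)$conf.int)
  c(power = n.reject / n.reps, lower = ci[1], upper = ci[2])
}
power.06 <- estimate.power(0.6)
power.07 <- estimate.power(0.7)

cat("Rejection region: count >=", cutoff, "\n")
cat("Estimated size: 95% CI", size.ci, "\n")
print(rbind(power.06, power.07))
```

Under these assumed settings the Monte Carlo answers can be checked against the exact binomial tail, 1 - pbinom(cutoff - 1, n.draws, p); with the real urn.R design, only draw.black.counts() should need to change.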
Slightly adapted from a problem by Jamis Parrett.
Purpose: The purpose of this assignment is twofold: 1. to give you practice searching for assistance on the internet and in the help system to learn how to do something you have not been thoroughly taught, and 2. to give you practice creating a complex graphic in R.
Problem: Consider the New York Times (Jan. 11, 1981, p. 32) graphic showing New York weather patterns for 1980:
Your assignment is to create a graph similar to that one using weather data for College Station during 2009. Create your graphic in a script called 'nytimes.R' and have it generate a camera-ready plot in a PDF file named 'nytimes.pdf'.
You need not copy the New York Times graph in minute detail, but you should produce a graph of the same nature. You also need not use exactly the same weather variables as the New York Times graph; there are several to choose from in the College Station data, and you need not use all of them.
You can obtain the data from wunderground.com. Note that, at the bottom of that page, you can download the data as a comma-delimited file.
Note that the New York graphic:
It's a good idea to use the web to learn about R graphics. One good starting place is http://cran.r-project.org/doc/contrib/Lemon-kickstart/index.html. Google can also be very helpful; for example, a search on the keywords "R par command rotate" turns up useful information on how to rotate tick-mark labels and axis text. Functions that you may want to look into:
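As a starting point, here is a rough sketch of the overall layout, a daily temperature-range panel over a precipitation panel, written to a PDF. The data in it are synthetic placeholders generated in the script, not College Station observations; in a real 'nytimes.R' they would come from the downloaded CSV (for example via read.csv(), with column names depending on that file). The par() argument las is one standard way to control tick-label rotation (1 = horizontal, 2 = perpendicular to the axis).

```r
# nytimes.R (sketch): layout for a NYT-style annual weather graphic.
# The data below are SYNTHETIC placeholders; replace them with columns read
# from the downloaded College Station 2009 CSV (e.g., via read.csv()).
set.seed(2009)
days <- seq(as.Date("2009-01-01"), as.Date("2009-12-31"), by = "day")
x <- as.numeric(days)                        # plot on a plain numeric axis
doy <- as.numeric(format(days, "%j"))        # day of year, 1..365
normal.low  <- 58 + 17 * sin(2 * pi * (doy - 110) / 365)  # made-up "normal" band
normal.high <- normal.low + 20
low  <- normal.low + rnorm(length(days), sd = 6)
high <- low + runif(length(days), 8, 22)     # guarantees high > low
precip <- ifelse(runif(length(days)) < 0.2, rexp(length(days), rate = 3), 0)

pdf("nytimes.pdf", width = 11, height = 8.5)
layout(matrix(1:2, ncol = 1), heights = c(3, 1))  # temperature above precipitation
par(mar = c(1, 5, 3, 1), las = 1)                 # las = 1: horizontal tick labels
plot(x, high, type = "n", ylim = range(low, high, normal.low, normal.high),
     xaxt = "n", xlab = "", ylab = "Temperature (F)",
     main = "College Station Weather, 2009")
polygon(c(x, rev(x)), c(normal.low, rev(normal.high)), col = "grey85", border = NA)
segments(x, low, x, high, col = "grey20")         # one bar per day: low to high
par(mar = c(4, 5, 1, 1))
plot(x, precip, type = "h", xaxt = "n", xlab = "", ylab = "Precip. (in)")
month.starts <- as.numeric(seq(as.Date("2009-01-01"), by = "month", length.out = 12))
axis(1, at = month.starts, labels = month.abb, las = 2)  # las = 2: perpendicular labels
invisible(dev.off())
```

Converting the dates to numbers up front keeps segments(), polygon() and axis() working on one plain coordinate system; axis(1, ...) then places month labels at the first day of each month.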
Commit your 'mann-whitney.R', 'urn.R', and 'nytimes.R' scripts (and any data files that your scripts need to read!) to the appropriate directory in the repository.