Project 9
STAT 605: Advanced Statistical Computations
David B. Dahl
Spring 2010
Self-Directed Project
This is a capstone project that seeks to build on the skills gained throughout the semester by having you work on a topic of your choice.
There are three variants of the project. Choose one.
You are strongly encouraged to work in teams of two or three, in which case only one project is submitted per team.
- Software: This version of the project consists of designing, coding, testing, and documenting statistical software. You can work on a new project, add to existing statistical software, or add functionality useful to statisticians to non-statistical software. You can choose whichever programming language is most appropriate for the project, including low-level languages (Fortran, C, C++, Java, etc.) as well as high-level languages (R, Ruby, Matlab, Python, SAS, vim, bash, etc.). I recommend using a high-level language if you are not already familiar with low-level languages. The scope of the project is wide open. As you choose a project, consider these examples of the types of projects that might be appropriate:
  - An R package that:
    - Performs a statistical analysis (preferably one that is not currently implemented). See the CRAN website (http://cran.r-project.org/web/packages/) for a list of existing packages.
    - Produces HTML and LaTeX tables from objects in R. See the xtable package (http://cran.r-project.org/src/contrib/Descriptions/xtable.html).
  - A Ruby package that:
    - Implements some statistical methodology, such as linear models and/or generalized linear models.
    - Provides functionality useful for statisticians.
    - Provides an environment for interactive data analysis similar to that of R.
  - Software that integrates two environments, for example:
    - Software that integrates R and Python. See RPy (http://rpy.sourceforge.net/).
    - Software that integrates R and Ruby. See RinRuby (http://rinruby.ddahl.org/).
    - Software that facilitates the use of R in some way. See http://sourceforge.net/projects/npptor/, for example.
  - A library written in C that performs some statistical calculation, for example:
    - TAMU ANOVA: ANOVA Extension to the GNU Scientific Library (http://www.stat.tamu.edu/~aredd/tamuanova/)
  There is a tendency to embark on a project that is too ambitious. I would rather see a straightforward project done well than a grandiose project that is not fully completed. Also, unique projects are preferred to those that merely reimplement existing software.
- Replicated Simulation Study: This version of the project, inspired by alumnus James Gentle, consists of:
  - Identifying an article in the scientific or statistical literature that reports a simulation (a.k.a. Monte Carlo) study evaluating a statistical method or comparing several methods,
  - Replicating at least a portion of the Monte Carlo study, and
  - Extending the Monte Carlo study in some way, for example, by including other methods in the comparison or by adding factors to the study design.
  Note that a paper that merely uses Monte Carlo or MCMC as an estimation technique is not appropriate unless it also satisfies the first criterion above. (It might, however, make an appropriate software project.) Be careful not to choose an article whose statistical methods are so complex that implementing them becomes a research project in and of itself. This is a common mistake when choosing the Monte Carlo option.
- Original Simulation Study: This version of the project involves designing, programming, executing, and reporting an original simulation study.
Initial Project Report
Please submit to the Project 9 directory a one-page PDF file called "initial-report.pdf" that gives the name of your project, the names of your team members, and a description of your project. If you are implementing a particular statistical method or replicating an existing simulation study, also commit the most relevant paper. This initial report is your opportunity to convince Dr. Dahl that you have a good project and have thought carefully about how you will go about implementing it.
Schedule for Oral Presentations
Submission
Your project will be evaluated on the oral presentation, written report/documentation, quality of the code, and difficulty. Please submit all work that you would like evaluated (including the PDF of the slides from your oral presentation and the written report) to the Project 9 directory of the Subversion repository.