Course Calendar & Materials
STAT 605: Advanced Statistical Computations
David B. Dahl
Spring 2010
Wednesday, January 20, 2010
Statisticians use a variety of computational and numerical methods. For example:
- Simulation (a.k.a., Monte Carlo) studies
- The evaluation of distributional functions for random variables
- Random number generation, including Markov chain Monte Carlo (MCMC)
- Numerical optimization and root-finding methods, including the Newton-Raphson method and the EM algorithm
- Resampling techniques, including the permutation test and the bootstrap
- Numerical treatment of linear algebra
- Algorithmic complexity
Computing world of a statistician
TIOBE Programming Community Index
Computing resources:
- Graduate computer lab (room 421) has Linux workstations: g1.stat.tamu.edu to g8.stat.tamu.edu
- General purpose Linux login servers: s0.stat.tamu.edu to s1.stat.tamu.edu
- General purpose Linux computational servers: s6.stat.tamu.edu to s9.stat.tamu.edu
- Note: Only s0 and s1 are accessible from outside the university unless you use the VPN.
- Text-based login from Windows using PuTTY
- Graphical login from Windows using NX client
- During class time, you can use NX to log into s0-s1 and s6-s9.
- Outside of class time, you should only use NX on s0, s1, and s9.
- File transfer using WinSCP
- Henrik's somewhat-dated orientation to computing infrastructure
Working at the command line:
VIM: Non-graphical text editor:
Monday, January 25, 2010
Subversion: Revision control system
Comprehensive documentation: Version Control with Subversion
Subversion clients:
- svn: Command-line client, recommended over graphical interfaces
- TortoiseSVN: Graphical interface for Subversion using Windows Explorer
- Available in KDE and GNOME desktop environments
- Available in many integrated development environments
- Subversion is used at STATA
In your home directory, check out the STAT 605 repository using: "svn co svn://dahl.stat.tamu.edu/stat605/2010a-spring stat605"
kompare4svn: Shell script to graphically show differences in revisions
Wednesday, January 27, 2010
Project Due
Computing topics:
- More on Subversion
- Configuration of VIM
- Line feeder in VIM
Simulation Studies:
- Represent a real-world process of interest on the computer to evaluate statistical properties associated with the process.
- Uses include:
- Evaluating point estimators
- Evaluating confidence intervals
- Checking the finite-sample statistical properties of estimators and testing procedures that have been motivated through asymptotics
- Testing hypotheses
- Describing distributions
- and many others...
- Especially useful when theoretical derivations are unavailable, difficult, or intractable.
- "Simulation study" and "Monte Carlo study" are synonymous.
Monte Carlo Studies in Statistics by James E. Gentle
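A minimal simulation study, sketched in Python rather than the course's R, of the "finite-sample properties of asymptotically motivated procedures" use listed above: it estimates the actual coverage of the asymptotic (Wald) 95% interval for a binomial proportion. The settings p = 0.1 and n = 20, the seed, and the function names are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Asymptotic (Wald) interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def simulate_coverage(p, n, reps, seed=605):
    """Estimate the Wald interval's coverage by repeatedly simulating binomial data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # one Binomial(n, p) draw
        lower, upper = wald_interval(x, n)
        hits += lower <= p <= upper
    coverage = hits / reps
    mc_se = math.sqrt(coverage * (1 - coverage) / reps)  # Monte Carlo standard error
    return coverage, mc_se


coverage, mc_se = simulate_coverage(p=0.1, n=20, reps=20_000)
print(f"estimated coverage: {coverage:.3f} (MC s.e. {mc_se:.4f})")
```

For these settings the estimated coverage falls well below the nominal 95%, mostly because x = 0 yields a zero-width interval. Reporting the Monte Carlo standard error alongside the estimate shows whether that gap is larger than simulation noise.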
Read Givens and Hoeting, pgs. 143-144
Notes on Monte Carlo integration
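As a small illustration of Monte Carlo integration and its Monte Carlo error, here is a Python sketch (not taken from the course notes). It estimates the integral of f over [a, b] as (b - a) times the average of f at uniform draws. The integrand exp on [0, 1], whose exact value is e - 1, and the helper name are assumptions chosen for checking.

```python
import math
import random


def mc_integrate(f, a, b, n, seed=1):
    """Estimate the integral of f over [a, b] as (b - a) * mean of f(U), U ~ Uniform(a, b)."""
    rng = random.Random(seed)
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    estimate = (b - a) * mean
    std_error = (b - a) * math.sqrt(var / n)  # Monte Carlo error of the estimate
    return estimate, std_error


est, se = mc_integrate(math.exp, 0.0, 1.0, n=100_000)
print(est, se, math.e - 1)  # exact value is e - 1
```

The standard error shrinks like 1/sqrt(n), so a 10-fold reduction in Monte Carlo error needs about 100 times as many draws.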
Monday, February 1, 2010
R Project for Statistical Computing
Wikipedia entry on R
Read "An Introduction to R"
New York Times articles regarding R: initial article and follow-up blog
R for MATLAB users
Rb: Script to help run R scripts
R example using Old Faithful data
Board game simulation
Probability distributions in R
Wednesday, February 3, 2010
Project Due
Welch's 1938 paper regarding two-sample tests for equality of means
Reminder of definition of power
Simulation studies comparing Welch's method to others in welch1938.R
Simple linear regression illustrated in faithful.R
Monday, February 8, 2010
Project 2 reprise: scripts are in the "students/dahl605" directory of the repository
OpenSSH public key authentication
Notes from Roger D. Peng on R functions
Reminder of definition of p-value
Polya urn simulation to compute p-values and power in urn.R
Wednesday, February 10, 2010
Project Due
Monte Carlo p-value for tack problem in 2005 examination
Permutation tests
Givens and Hoeting, Section 9.7
Fisher's exact test in dieting.R
Permutation test for correlation in gpa.R
Permutation test for earnings of brothers in 2005 examination
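The idea behind a permutation test for correlation can be sketched in Python (this is not the course's gpa.R). Under the null hypothesis of independence, every pairing of x with a shuffled y is equally likely, so the p-value is the proportion of shuffles whose |r| is at least the observed |r|. The toy data, the number of shuffles, and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test_corr(x, y, n_perm=5000, seed=2010):
    """Two-sided Monte Carlo permutation p-value for H0: x and y are independent.

    Shuffling y breaks any association while keeping both marginal distributions.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= observed:
            extreme += 1
    # Count the observed labelling as one of the permutations so p is never 0.
    return (extreme + 1) / (n_perm + 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical toy data
y = [1.2, 1.9, 3.1, 3.8, 5.3, 5.9, 7.2, 8.1]
p_value = permutation_test_corr(x, y)
print(p_value)
```

With only 8! = 40320 orderings, the exact test could also be enumerated. Random shuffles are what make the approach scale to larger samples.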
Monday, February 15, 2010
Examples of executable R scripts:
random-exponentials: An example of an executable R script
use-executable-script.R: Using an executable R script in R code
Permutation test for positional dependence among distributions of protein backbone torsion angles, in the "densities" directory
Running jobs in the background using "&" and "screen"
Monitor jobs using "top"
Wednesday, February 17, 2010
A collection of shell commands in an executable file is called a shell script
We are using the bash shell
Quick guide to bash shell scripts
Advanced Bash-Scripting Guide: When you are ready to dive in!
Dirsize: Shell script to recursively show directory sizes
Hunter: Shell script to check computer resources in the department
Shell script for least-squares clustering method
R uses a shell script to get itself started
Monday, February 22, 2010
Project Due
LaTeX
The Not So Short Introduction to LaTeX 2e
LaTeX Tutorials: A Primer
Kile
Spell check a LaTeX file "foo.tex" using "aspell -t -c foo.tex"
Including figures in LaTeX documents processed by "pdflatex"
Beamer: PowerPoint-like presentations using pdfLaTeX
Inverse CDF method
Givens and Hoeting, pg. 145
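The inverse CDF method applied to the exponential distribution, as a Python sketch: if U ~ Uniform(0, 1) and F(x) = 1 - exp(-rate x), then F^{-1}(U) = -log(1 - U)/rate has CDF F. The rate, sample size, and function name are illustrative assumptions.

```python
import math
import random


def rexp_inverse_cdf(n, rate, seed=0):
    """Draw Exponential(rate) variates via the inverse CDF method.

    F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate.
    """
    rng = random.Random(seed)
    # random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is always finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


draws = rexp_inverse_cdf(100_000, rate=2.0)
print(sum(draws) / len(draws))  # should be close to 1 / rate = 0.5
```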
Box-Muller Transformation
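The Box-Muller transformation, sketched in Python: with U1, U2 independent Uniform(0, 1), the pair sqrt(-2 log U1) cos(2 pi U2) and sqrt(-2 log U1) sin(2 pi U2) are independent N(0, 1) variables. The function name and seed are hypothetical.

```python
import math
import random


def box_muller(n_pairs, seed=0):
    """Generate 2 * n_pairs independent N(0, 1) draws with the Box-Muller transformation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_pairs):
        u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        out.append(radius * math.cos(angle))
        out.append(radius * math.sin(angle))
    return out


z = box_muller(50_000)
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(mean, var)  # should be near 0 and 1
```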
Sampling from familiar distributions
Givens and Hoeting, pg. 146
Rejection sampling
Givens and Hoeting, pgs. 147-150
Generic code for rejection sampling in rejection-sampler.R
Sampling from a beta distribution using rejection sampling in sample-via-rejection.R
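Rejection sampling for a beta distribution can be sketched in Python (this is not the course's rejection-sampler.R). The sketch assumes a >= 1 and b >= 1, so the unnormalized density q(x) = x^(a-1) (1-x)^(b-1) is bounded by its value at the mode. A Uniform(0, 1) candidate x is then accepted with probability q(x)/q(mode). The Beta(2, 3) target and function name are illustrative choices.

```python
import random


def sample_beta_rejection(n, a, b, seed=0):
    """Rejection sampling from Beta(a, b), a >= 1 and b >= 1, with a Uniform(0, 1) envelope.

    Returns the samples and the observed acceptance rate.
    """
    if a < 1 or b < 1:
        raise ValueError("this uniform-envelope sketch needs a >= 1 and b >= 1")

    def q(x):
        # Unnormalized Beta(a, b) density.
        return x ** (a - 1) * (1 - x) ** (b - 1)

    rng = random.Random(seed)
    mode = (a - 1) / (a + b - 2) if a + b > 2 else 0.5
    bound = q(mode)  # q(x) <= bound on [0, 1]
    draws, proposals = [], 0
    while len(draws) < n:
        x = rng.random()
        proposals += 1
        if rng.random() * bound <= q(x):
            draws.append(x)
    return draws, n / proposals


draws, acc_rate = sample_beta_rejection(50_000, a=2, b=3)
print(sum(draws) / len(draws), acc_rate)  # mean of Beta(2, 3) is 0.4
```

The acceptance rate equals the area under q divided by the envelope's area. For Beta(2, 3) that is B(2, 3)/q(1/3) = 27/48, so roughly 44% of the uniform candidates are wasted. A tighter envelope would waste fewer.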
Wednesday, February 24, 2010
Importance sampling, take I
Importance sampling, take II
Givens and Hoeting, pgs. 162-169
Demonstration of importance sampling in R
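A Python sketch of importance sampling, separate from the course's R demonstration, for a small tail probability. It estimates P(Z > c) for Z ~ N(0, 1) by drawing from N(c, 1), which puts about half its draws in the tail. Each draw is reweighted by phi(x)/phi(x - c) = exp(-c x + c^2/2). The value c = 3, the sample size, and the names are assumptions.

```python
import math
import random
from statistics import NormalDist


def tail_prob_importance(c, n, seed=0):
    """Estimate P(Z > c), Z ~ N(0, 1), by importance sampling from N(c, 1).

    Weight: w(x) = phi(x) / phi(x - c) = exp(-c * x + c**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(c, 1.0)
        if x > c:
            total += math.exp(-c * x + c * c / 2)
    return total / n


c, n = 3.0, 100_000
exact = 1 - NormalDist().cdf(c)
is_est = tail_prob_importance(c, n)
rng = random.Random(1)
naive = sum(rng.gauss(0.0, 1.0) > c for _ in range(n)) / n  # plain Monte Carlo
print(exact, is_est, naive)
```

With the same number of draws, the naive estimator sees only about 135 tail events. The importance-sampling estimate is typically far more precise relative to the exact value.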
svnadmin: To manage (i.e. create, backup, etc.) Subversion repositories
Version Control with Subversion: Comprehensive documentation for Subversion
Monday, March 1, 2010
Project Due
xtable: R contributed package to export tables to LaTeX or HTML
Writing R Extensions
CRAN Package Check Results
Running on the Brazos cluster
rsync: Utility that provides fast incremental file transfer
rsync example for update your webpage
unison: Bidirectional file synchronizer
Wednesday, March 3, 2010
Scripting Languages, Ruby, and R: Why?
Ruby homepage
Ruby documentation
Try Ruby! using Firefox (not Konqueror)
Ruby in Twenty Minutes
Ruby User's Guide
Ruby Core Reference
RCon example
Monday, March 8, 2010
Hotdog example
Environments for regexp
Regular expressions
Quick start on regexp
KDE's regular expression editor: kregexpeditor
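As a small regular-expression example, here is a Python pattern (Python's `re` rather than Ruby's regexps) that recognizes date headings in the style used on this calendar, such as "Wednesday, January 20, 2010", and pulls out their parts with named groups. The pattern and function name are illustrative assumptions.

```python
import re

# Matches headings such as "Wednesday, January 20, 2010".
DATE_LINE = re.compile(
    r"(?P<weekday>\w+day), (?P<month>[A-Z][a-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})"
)


def parse_date_line(line):
    """Return (weekday, month, day, year) for a calendar heading, or None."""
    m = DATE_LINE.fullmatch(line.strip())
    if m is None:
        return None
    return m["weekday"], m["month"], int(m["day"]), int(m["year"])


print(parse_date_line("Wednesday, January 20, 2010"))
print(parse_date_line("Project Due"))  # None: not a date heading
```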
Centroids example
Wednesday, March 10, 2010
RinRuby: Accessing the R Interpreter from Pure Ruby
RinRuby Paper
Milano example
Monday, March 22, 2010
Introduction to Markov chain Monte Carlo (MCMC) in simple-pmf.R
Markov chain Monte Carlo (MCMC), take I
MCMC for the beta distribution in R
Givens and Hoeting, Chapter 7 introduction, Section 7.1, and Section 7.3
Markov chain Monte Carlo (MCMC), take II
Article: Monte Carlo Sampling Methods Using Markov Chains and Their Applications
Article: Understanding the Metropolis-Hastings Algorithm
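A random-walk Metropolis sampler targeting a beta distribution, sketched in Python in the spirit of the beta example above rather than as a copy of it. The normal proposal is symmetric, so the Metropolis-Hastings acceptance ratio reduces to a ratio of target densities. Proposals outside (0, 1) are always rejected. The step size, burn-in, Beta(2, 3) target, and names are assumptions.

```python
import math
import random


def log_beta_kernel(x, a, b):
    """Log of the unnormalized Beta(a, b) density; -inf outside (0, 1)."""
    if not 0.0 < x < 1.0:
        return -math.inf
    return (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def metropolis_beta(n_iter, a, b, step=0.2, x0=0.5, seed=0):
    """Random-walk Metropolis sampler targeting Beta(a, b); returns chain and acceptance rate."""
    rng = random.Random(seed)
    x = x0
    log_p = log_beta_kernel(x, a, b)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_p_prop = log_beta_kernel(proposal, a, b)
        # Accept with probability min(1, p(proposal) / p(x)), compared on the log scale.
        if math.log(1.0 - rng.random()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
            accepted += 1
        chain.append(x)  # a rejected move repeats the current state
    return chain, accepted / n_iter


chain, acc_rate = metropolis_beta(60_000, a=2, b=3)
kept = chain[1_000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
print(post_mean, acc_rate)  # Beta(2, 3) has mean 0.4
```

Only the unnormalized density is needed, which is what makes the method useful for posteriors with unknown normalizing constants.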
Wednesday, March 24, 2010
Project Due
Bayesian logistic regression via mcmc-sampler.R
Gibbs sampler for MCMC
Givens and Hoeting, Section 7.2
Gibbs sampler for mean and precision in mean-precision.R
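A Gibbs sampler for a normal mean and precision can be sketched in Python. It assumes semi-conjugate priors mu ~ N(m0, 1/t0) and tau ~ Gamma(a0, rate = b0), which may differ from the priors in mean-precision.R. Under those assumptions both full conditionals have closed forms, and the sampler alternates draws from them. The prior settings, the simulated data, and the names are illustrative.

```python
import math
import random


def gibbs_mean_precision(y, n_iter, m0=0.0, t0=1e-3, a0=1e-2, b0=1e-2, seed=0):
    """Gibbs sampler for y_i ~ N(mu, 1/tau), mu ~ N(m0, 1/t0), tau ~ Gamma(a0, rate=b0).

    Full conditionals:
      mu  | tau, y ~ N((t0*m0 + tau*sum(y)) / (t0 + n*tau), 1 / (t0 + n*tau))
      tau | mu,  y ~ Gamma(a0 + n/2, rate = b0 + sum((y - mu)^2) / 2)
    """
    rng = random.Random(seed)
    n, sum_y = len(y), sum(y)
    mu, tau = sum_y / n, 1.0  # starting values
    draws = []
    for _ in range(n_iter):
        prec = t0 + n * tau
        mu = rng.gauss((t0 * m0 + tau * sum_y) / prec, 1.0 / math.sqrt(prec))
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        # random.gammavariate takes (shape, scale), and scale = 1 / rate.
        tau = rng.gammavariate(a0 + n / 2, 1.0 / rate)
        draws.append((mu, tau))
    return draws


data_rng = random.Random(1)
y = [data_rng.gauss(5.0, 2.0) for _ in range(200)]  # simulated: true mu = 5, tau = 0.25
draws = gibbs_mean_precision(y, n_iter=6_000)[1_000:]  # discard burn-in
post_mu = sum(m for m, _ in draws) / len(draws)
post_tau = sum(t for _, t in draws) / len(draws)
print(post_mu, post_tau)
```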
Practice Midterm
Monday, March 29, 2010
Computer exam: 1:40-4:40pm in 457 Blocker
Wednesday, March 31, 2010
Solutions to midterm
Monday, April 5, 2010
"There are 10 kinds of people in the world - those who understand binary and those who don't."
Binary integer arithmetic
Bits and bytes
Useful KDE programs: kcalc, khexedit/okteta
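Binary integer arithmetic with fixed-width two's complement, illustrated in Python. Python integers have unbounded width, so the 8-bit behavior is emulated with masks, and the helper names are hypothetical. The joke above works because the bit string "10" is the number 2.

```python
def to_twos_complement(value, bits=8):
    """Bit string of an integer in `bits`-bit two's complement."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(bit_string):
    """Integer value of a two's-complement bit string."""
    raw = int(bit_string, 2)
    return raw - (1 << len(bit_string)) if bit_string[0] == "1" else raw


def add_fixed_width(x, y, bits=8):
    """Add two integers the way `bits`-bit hardware would, wrapping on overflow."""
    raw = (x + y) & ((1 << bits) - 1)
    return from_twos_complement(format(raw, f"0{bits}b"))


print(int("10", 2))                # 2
print(to_twos_complement(-5))      # 11111011
print(add_fixed_width(127, 1))     # -128: 8-bit signed overflow wraps around
```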
Notes for Dean Joe Newton
Wikipedia information on IEEE floating point standard
Tutorial on IEEE floating point standard
Endianness
Underflow and overflow
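Python floats are IEEE 754 double-precision numbers on mainstream platforms, so the standard library can expose their layout. The sketch below splits a double into its sign bit, 11-bit biased exponent, and 52-bit fraction. It also shows rounding, overflow, underflow, and byte order. The helper name is hypothetical.

```python
import struct
import sys


def double_fields(x):
    """Split a float (IEEE 754 double) into sign, biased exponent, and fraction bits."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))  # big-endian 64-bit pattern
    sign = as_int >> 63
    exponent = (as_int >> 52) & 0x7FF    # 11 bits, bias 1023
    fraction = as_int & ((1 << 52) - 1)  # 52 bits; normal numbers have an implicit leading 1
    return sign, exponent, fraction


print(double_fields(1.0))                 # (0, 1023, 0)
print(double_fields(-2.5))                # -1.25 * 2**1 -> (1, 1024, 2**50)
print(sys.float_info.epsilon == 2**-52)   # True: spacing of doubles just above 1.0
print(0.1 + 0.2 == 0.3)                   # False: 0.1, 0.2, 0.3 are not exactly representable
print(sys.float_info.max * 2)             # inf (overflow)
print(5e-324 / 2)                         # 0.0 (underflow below the smallest subnormal)
print(struct.pack("<d", 1.0) == struct.pack(">d", 1.0)[::-1])  # True: little vs big endian
```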
Wednesday, April 7, 2010
Project Due
Maximum likelihood estimation
Newton-Raphson algorithm in newton-raphson.R
Givens and Hoeting, Chapter 2
MLE for binomial data via Newton-Raphson in binomial-newton.R
MLE for tied-up normal model via Newton-Raphson in tied-up-normal-mle.R
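A Python sketch of Newton-Raphson for a binomial MLE, independent of the course's binomial-newton.R. The sketch iterates theta <- theta - l'(theta)/l''(theta) on the log-likelihood l(p) = x log p + (n - x) log(1 - p). Because the closed-form MLE is x/n, the result is easy to check. The data x = 7, n = 20, the starting value, and the names are illustrative.

```python
def newton_raphson(score, hessian, theta0, tol=1e-10, max_iter=100):
    """Find a root of the score l'(theta) by Newton-Raphson: theta <- theta - l'/l''."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


x, n = 7, 20  # illustrative binomial data


def score(p):
    """First derivative of l(p) = x log p + (n - x) log(1 - p)."""
    return x / p - (n - x) / (1 - p)


def hessian(p):
    """Second derivative of the binomial log-likelihood."""
    return -x / p**2 - (n - x) / (1 - p) ** 2


p_hat = newton_raphson(score, hessian, theta0=0.1)
print(p_hat)  # agrees with the closed form x / n = 0.35
```

Newton-Raphson is not guaranteed to stay inside (0, 1) from every starting value. Practical code often reparameterizes, for example on the logit scale, or halves steps that leave the parameter space.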
Monday, April 12, 2010
Comments on low-level languages
Introduction to Java
Wednesday, April 14, 2010
Numerical integration (notes are in the "4.14" directory)
Numerical integration and maximization in R
Monte Carlo integration for generic functions
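As a deterministic counterpart to Monte Carlo integration, here is a Python sketch of the composite Simpson's rule. It weights the function values at an even number of equally spaced subintervals by 1, 4, 2, 4, ..., 4, 1 and multiplies the sum by h/3. The test integrand sin on [0, pi], whose exact value is 2, and the function name are assumptions.

```python
import math


def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)  # odd nodes weight 4, even nodes weight 2
    return total * h / 3


print(simpson(math.sin, 0.0, math.pi))  # close to the exact value 2
```

For smooth integrands in one dimension, Simpson's error shrinks like h^4, far faster than the 1/sqrt(n) Monte Carlo rate. Monte Carlo becomes competitive in high dimensions.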
Monday, April 19, 2010
Interfacing R with low-level programming languages
Wednesday, April 21, 2010
More Interfacing R with low-level programming languages
Potential topics for the final exam:
- Simulation studies
- Monte Carlo error
- Permutation tests
- Monte Carlo integration
- Inverse CDF method
- Rejection sampling
- Importance sampling
- Metropolis-Hastings algorithm
- Gibbs sampling
- Numerical stability
- Binary arithmetic
- Machine representation of floating point numbers
- Regular expressions
- Numerical integration
- Newton-Raphson algorithm
- Maximum likelihood and Bayesian estimation
- How to choose a language/environment for a task
- Concepts regarding interfacing R with low-level languages
- Concepts regarding version control
- Implementing algorithms in pseudo-code
Previous exams:
Accessing R from scripting languages:
Accessing low-level languages (e.g., C, C++, Fortran, Java) from R:
Some other interesting items:
Monday, April 26, 2010
Written exam
Wednesday, April 28, 2010
Reprise of written exam with solutions
Student presentations for final project
Presentation schedule
Monday, May 3, 2010
Project Due
Student presentations for final project
Presentation schedule
Tuesday, May 11, 2010
Project Due
Student presentations for final project during exam time (3:30-5:30pm in regular classroom)
Presentation schedule