STAT 604: Intro. to Statistical Computing
Prof. David B. Dahl
Languages and Environments for Statistics
There are many programming languages and analysis environments that may be of interest to statisticians.
Some languages and environments (with my personal impressions) are below.
Data analysis software:
- R
- An implementation of the S language
- Preferred by academic statisticians
- Provides both a command line interface and a batch-oriented mode
- Easy to extend, hence there is a large library of add-on packages
- Lacks the fancy graphical interface of S-Plus
- Distributed under the GPL and there are no licensing fees
- S-Plus
- An implementation of the S language
- Commercial product targeted at the business market
- Provides a menu-driven graphical interface
- Porting back and forth between R and S-Plus is possible, but it can be a hassle
- Stata
- Especially popular with economists and biostatisticians
- Commercial product
- Easy to extend with add-on packages, hence there is a large library of add-ons
- Programming language features not as rich as those in R
- Developed in College Station, TX
- SAS
- Has its roots among academic statisticians
- Now focuses on the business market
- Commercial product
- Very popular in some industries, e.g., the pharmaceutical industry
- Difficult to extend with add-on packages
- Programming language features can be very constraining
- SPSS
- Has its roots in the social sciences
- Now tries to have a broader appeal
- Commercial product
- Popular among non-statisticians needing to analyze data
- Provides a menu-driven graphical interface
- Most popular statistics package on campus for non-statisticians
- Arguably has more functionality than Minitab
- Minitab
- Has its roots among academic statisticians
- General purpose data analysis package for non-statisticians
- Provides a menu-driven graphical interface
- Commercial product
- NCSS
- Developed by Ph.D. statisticians who graduated from Texas A&M
- Similar to Minitab
- Provides a menu-driven graphical interface
- Commercial product
Low-Level Languages:
- Fortran
- Speed king
- A lot of highly-optimized, well-tested libraries are available
- C
- Arguably the most popular low-level language
- More feature-rich than Fortran, yet (almost) as fast
- Low-level language typically used when extending functionality in higher-level languages
- C++
- Cousin of C. "It is a statically typed, free-form, multi-paradigm, usually compiled language supporting procedural programming, data abstraction, object-oriented programming, and generic programming."
- Increased functionality and complexity relative to C, yet not significantly slower
- Java
- Object-oriented language whose code runs on a virtual machine
- Runs on Windows, Linux, and UNIX without having to recompile code
- Relatively few desktop (i.e., graphical) programs are developed with it
- Developed largely by Sun Microsystems, but is now open source
- C#
- Microsoft's answer to Sun Microsystems' Java
- Runs on Windows (and, to a lesser extent, Linux and UNIX using Mono)
Numerical Computations:
- Matlab
- High-level language, primarily intended for numerical computations
- Solves linear and nonlinear problems numerically
- Provides both a command line interface and a batch-oriented mode
- Extra toolboxes can be purchased for added functionality
- Easy to extend with add-on packages
- Very popular among engineers
- Used by some academic statisticians
- Octave
- Mostly compatible with Matlab
- Distributed under the GPL and there are no licensing fees
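To make "solves linear and nonlinear problems numerically" concrete, here is a minimal sketch in plain Python (not Matlab or Octave code). It solves a small linear system by Gaussian elimination and finds a root of a nonlinear equation by Newton's method. The function names `solve_linear` and `newton` and the example inputs are hypothetical and chosen only for illustration; Matlab and Octave provide built-in, far more robust routines for both tasks.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Find a root of f by Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


# Linear problem: 2x + y = 3 and x + 3y = 5.
linear_solution = solve_linear([[2, 1], [1, 3]], [3, 5])

# Nonlinear problem: x^2 - 2 = 0, i.e., the square root of 2.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)

print(linear_solution)
print(root)
```

Both answers are floating-point approximations, which is the defining feature of this category of software.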
Symbolic Calculations:
- Maple
- General purpose computer algebra system
- Mathematica
- General purpose computer algebra system
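As a rough illustration of what "symbolic" means here, the toy sketch below differentiates and integrates a polynomial exactly, manipulating its coefficients rather than evaluating it at numbers. It is written in plain Python, and the coefficient-list representation and the function names `differentiate` and `integrate` are assumptions made for this example. It is not how Maple or Mathematica work internally; those systems handle far more general expressions.

```python
from fractions import Fraction


def differentiate(coeffs):
    """Differentiate c0 + c1*x + c2*x**2 + ..., given as [c0, c1, c2, ...]."""
    # The term c_k * x**k becomes k * c_k * x**(k - 1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


def integrate(coeffs):
    """Antiderivative of the polynomial, with the constant of integration set to 0."""
    # The term c_k * x**k becomes c_k / (k + 1) * x**(k + 1), kept as an exact rational.
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


p = [Fraction(1), Fraction(0), Fraction(3)]  # represents 1 + 3x^2
dp = differentiate(p)  # coefficients of 6x
ip = integrate(p)      # coefficients of x + x^3

print(dp)
print(ip)
```

The results are exact rational coefficients with no round-off error.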