STAT 689, STATISTICAL DATA MINING, SPRING 2005
Instructor
Dr Marc G. Genton
E-mail: genton@stat.tamu.edu
Phone: 845-3152
Office: 405D Blocker
Office hours: 3:45-5:00PM Tuesday and Thursday
Textbook
Hastie, T., Tibshirani, R., Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Web page for the book: www-stat-class.Stanford.EDU/~tibs/ElemStatLearn/
Course Schedule
Lecture: T R: 9:35-10:50AM in Blocker 411
Prerequisites: STAT 610, 611
Course intro (.ps or .pdf)
Data Sets and Links
Splus tutorial (.ps or .pdf)
Topics
Jan. 18: Introduction; Data sets (1)
Jan. 20: Supervised learning (2.1, 2.2); Least squares (2.3); Nearest neighbors (2.3)
Jan. 25: NO CLASS, but: reading (2.1, 2.2, 2.3) AND (2.4, 2.5)
Jan. 27: Statistical decision theory (2.4); Curse of dimensionality (2.5)
Feb. 1: Function approximation (2.6); Structured regression and restricted estimators (2.7); Bias-variance tradeoff (2.8)
Feb. 3: Linear regression and least squares (3.1, 3.2); Subset selection (3.4); Coefficient shrinkage: ridge regression (3.4)
Feb. 8: Coefficient shrinkage: ridge regression (3.4)
Feb. 10: Coefficient shrinkage: the lasso (3.4) *** 2.4 due Feb. 10 ***
Feb. 15: Linear methods for classification (4.1, 4.2)
Feb. 17: Linear Discriminant Analysis (4.3); Quadratic Discriminant Analysis (4.3)
Feb. 22: Fisher's linear discriminant function (4.3); Logistic regression (4.4); Perceptron (4.5)
Feb. 24: Optimal separating hyperplanes (4.5); *** 4.2 due Feb. 24 ***
Mar. 1: Support vector machines (12.3)
Mar. 3: Support vector machines (12.3)
Mar. 8: Support vector machines (12.3); *** 1-page project proposal due Mar. 8 ***
Mar. 10: Generalized Additive Models (9.1); Trees: CART (9.2)
SPRING BREAK
Mar. 22: NO CLASS (ENAR meeting in Austin)
Mar. 24: Trees: CART (9.2); *** 12.1 due Mar. 24 ***
Mar. 29: Trees: CART (9.2); Bagging (8.7)
Mar. 31: PRIM (9.3); MARS (9.4)
Apr. 5: Boosting (10.1-10.10)
Apr. 7: Boosting (10.1-10.10)
Apr. 12: Neural networks (11)
Apr. 14: Association rules, Market basket analysis (14.1-14.2); *** 10.1 due Apr. 14 ***
Apr. 19: Cluster analysis (14.3); *** Project report due ***
Apr. 21: Project presentation: B. Li; L. Liu; Y. Liu; S. Lee
Apr. 26: Project presentation: J. Wagaman; P. Dwyer; C. Shih; R. Hardin
Apr. 28: Project presentation: D. Glab; Y. Ren; M. Chen; L. Qin
May 3: Project presentation: J. Dougherty; J. Cho; Y. Marchenko; W. Zhang
Homeworks
HW1: 2.2, 2.3, 2.4, 2.7, 3.5 *** 2.4 due Feb. 10 ***
HW2: 3.7, 3.14, 4.2, 4.6, 3.9 *** 4.2 due Feb. 24 ***
HW3: 4.4, 4.7, 12.1, 12.2, 12.3 *** 12.1 due Mar. 24 ***
HW4: 9.2, 9.6, 9.7, 10.1, 10.2 *** 10.1 due Apr. 14 ***
Links and Data Sets
www.support-vector.net
Journal of Machine Learning Research
Data Mining and Statistics: what's the connection by Jerome Friedman, Stanford
STATOO: Data Mining Links
STATOO: What is Data Mining?
STATOO: Newsletters
www.kernel-machines.org
Elastic net (H. Zou, Stanford)