STAT 689, STATISTICAL DATA MINING, SPRING 2005

Instructor

  • Dr Marc G. Genton
  • E-mail: genton@stat.tamu.edu
  • Phone: 845-3152
  • Office: 405D Blocker
  • Office hours: 3:45-5:00PM Tuesday and Thursday

Textbook

  • Hastie, T., Tibshirani, R., Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

    Web page for the book: www-stat-class.Stanford.EDU/~tibs/ElemStatLearn/

    Course Schedule

    Lecture: T R: 9:35-10:50AM in Blocker 411
    Prerequisites: STAT 610, 611

    Course intro ( .ps or .pdf )


    Data Sets and Links

    Splus tutorial ( .ps or .pdf )

    Topics

    Jan. 18: Introduction; Data sets (1)
    Jan. 20: Supervised learning (2.1, 2.2); Least squares (2.3); Nearest neighbors (2.3)
    Jan. 25: NO CLASS, but: reading (2.1, 2.2, 2.3) AND (2.4, 2.5)
    Jan. 27: Statistical decision theory (2.4); Curse of dimensionality (2.5)
    Feb. 1: Function approximation (2.6); Structured regression and restricted estimators (2.7); Bias-variance tradeoff (2.8)
    Feb. 3: Linear regression and least squares (3.1, 3.2); Subset selection (3.4); Coefficient shrinkage: ridge regression (3.4)
    Feb. 8: Coefficient shrinkage: ridge regression (3.4)
    Feb. 10: Coefficient shrinkage: the lasso (3.4) *** 2.4 due Feb. 10 ***
    Feb. 15: Linear methods for classification (4.1, 4.2)
    Feb. 17: Linear Discriminant Analysis (4.3); Quadratic Discriminant Analysis (4.3)
    Feb. 22: Fisher's linear discriminant function (4.3); Logistic regression (4.4); Perceptron (4.5)
    Feb. 24: Optimal separating hyperplanes (4.5); *** 4.2 due Feb. 24 ***
    Mar. 1: Support vector machines (12.3)
    Mar. 3: Support vector machines (12.3)
    Mar. 8: Support vector machines (12.3); *** 1-page project proposal due Mar. 8 ***
    Mar. 10: Generalized Additive Models (9.1); Trees: CART (9.2)
    SPRING BREAK
    Mar. 22: NO CLASS (ENAR meeting in Austin)
    Mar. 24: Trees: CART (9.2); *** 12.1 due Mar. 24 ***
    Mar. 29: Trees: CART (9.2); Bagging (8.7)
    Mar. 31: PRIM (9.3); MARS (9.4)
    Apr. 5: Boosting (10.1-10.10)
    Apr. 7: Boosting (10.1-10.10)
    Apr. 12: Neural networks (11)
    Apr. 14: Association rules, Market basket analysis (14.1-14.2); *** 10.1 due Apr. 14 ***
    Apr. 19: Cluster analysis (14.3); *** Project report due ***
    Apr. 21: Project presentation: B. Li; L. Liu; Y. Liu; S. Lee
    Apr. 26: Project presentation: J. Wagaman; P. Dwyer; C. Shih; R. Hardin
    Apr. 28: Project presentation: D. Glab; Y. Ren; M. Chen; L. Qin
    May 3: Project presentation: J. Dougherty; J. Cho; Y. Marchenko; W. Zhang


    Homeworks

    HW1: 2.2, 2.3, 2.4, 2.7, 3.5 *** 2.4 due Feb. 10 ***
    HW2: 3.7, 3.14, 4.2, 4.6, 3.9 *** 4.2 due Feb. 24 ***
    HW3: 4.4, 4.7, 12.1, 12.2, 12.3 *** 12.1 due Mar. 24 ***
    HW4: 9.2, 9.6, 9.7, 10.1, 10.2 *** 10.1 due Apr. 14 ***


    Links and Data Sets

  • www.support-vector.net
  • Journal of Machine Learning Research
  • Data Mining and Statistics: what's the connection by Jerome Friedman, Stanford
  • STATOO: Data Mining Links
  • STATOO: What is Data Mining?
  • STATOO: Newsletters
  • www.kernel-machines.org
  • Elastic net (H. Zou, Stanford)


