Statistical Machine Learning (W4400) • Spring 2014

This class provides an introduction to Machine Learning and its core algorithms.

Course Slides

Here is the complete set of course slides (version: 1 May) as a single file.

Teaching Assistants

Lu Meng
Office hours: Tue 5:30-7:30pm, 1025 SSW (tenth floor, Department of Statistics)
Jingjing Zou
If you have questions on how your homework was graded, please address them to Jingjing.


The course is not based on a specific textbook. The relevant course materials are the slides.

First half of the class

If you would like to complement lectures and slides by further reading, probably the best reference for the first half of the class (roughly up to the midterm) is:
  • The Elements of Statistical Learning
    T. Hastie, R. Tibshirani and J. Friedman.
    Second Edition, Springer, 2009.

    [Available online here]
Here are some pointers to specific chapters:
Topic                                   Chapter
Linear classifiers, Perceptron          4.1, 4.5
Maximum margin classifiers SVMs
Kernels                                 12.3
Model selection and cross validation    7, in particular 7.10
Trees                                   9.2
Boosting                                10.1, 10.8
Bagging                                 8.7
Random Forests                          15
Linear regression                       3.2
Shrinkage                               3.4

Second half of the class

There is unfortunately no single book that covers all topics in the second half of the class well, but some useful sources are:
  • Pattern Recognition and Machine Learning.
    Christopher M. Bishop.
    Springer, 2006.
  • Machine Learning: A Probabilistic Perspective.
    Kevin P. Murphy.
    MIT Press, 2012.
  • Bayesian Reasoning and Machine Learning.
    David Barber.
    Cambridge University Press, 2012.

    [Available online]

Other references

  • Information Theory, Inference, and Learning Algorithms.
    David J. C. MacKay.
    Cambridge University Press, 2003.

    [Available online]
  • Pattern Classification.
    Richard O. Duda, Peter E. Hart, David G. Stork.
    Wiley, 2001.
  • Convex Optimization.
    Stephen Boyd and Lieven Vandenberghe.
    Cambridge University Press, 2004.

    [Available online]


There will be five or six homework assignments; you will usually have two weeks to complete each homework. The final grade will be computed as
40% homework + 30% midterm + 30% final exam
The midterm will cover the material of the first half of the class. The final is not cumulative: it will cover only the material taught after the midterm.
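
If you would like to check how the weights in the grading formula combine, here is a minimal sketch. It assumes every component is scored on a 0-100 scale and that the homework component is the plain average of the individual assignments; neither assumption is stated above, and the function name `final_grade` and all scores are hypothetical.

```python
# Weights taken from the grading formula: 40% homework + 30% midterm + 30% final.
WEIGHTS = {"homework": 0.40, "midterm": 0.30, "final": 0.30}


def final_grade(homework, midterm, final):
    """Weighted sum of the three components (each assumed to lie in 0-100)."""
    scores = {"homework": homework, "midterm": midterm, "final": final}
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside the assumed 0-100 range")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for five assignments, averaged with equal weight
# (an assumption; the course page does not say how assignments are combined).
homework_scores = [90, 85, 100, 80, 95]
homework_avg = sum(homework_scores) / len(homework_scores)

grade = final_grade(homework_avg, midterm=80, final=70)
print(f"Final grade: {grade:.1f}")
```

Because the midterm and the final carry equal weight, a point gained on either exam moves the final grade by the same amount (0.3 points), while a point gained on the homework average moves it by 0.4.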

Preliminary list of topics

Week  Content
1     Introduction
      Review of basic concepts: Maximum likelihood, Gaussian distributions, etc.
2     Classification basics: Loss functions, naive Bayes, linear classifiers
3     Support vector machines, convex optimization
4     Kernels; model selection and cross validation
5     Ensemble methods: Boosting, bagging, random forests
6     Regression: Linear regression, regularization, ridge regression
7     Linear algebra review, high-dimensional and sparse regression
8     Dimension reduction, data visualization, principal component analysis
9     Clustering, mixture models and EM algorithms
10    Information theory; Text analysis
11    Markov models, PageRank
12    Hidden Markov models, speech recognition
13    Bayesian models
14    Sampling algorithms and MCMC