This class provides an introduction to Machine Learning and its core algorithms.
Teaching Assistants
Lu Meng (lumeng@stat.columbia.edu)
Office hours: Tue 5:30-7:30pm, 1025 SSW (tenth floor, Department of Statistics)
Jingjing Zou (jingjing@stat.columbia.edu)
If you have questions about how your homework was graded, please address them to Jingjing.
Homework
Number 1 (due: 13 Feb)
Number 2 (due: 4 Mar)
  Additional files: Digit data and fakedata.R
Number 3 (due: 3 Apr)
Number 4 (due: 17 Apr)
  Additional files: histograms.zip
Number 5 (due: 1 May)
Textbooks
The course is not based on a specific textbook. The relevant course materials are the slides.

First half of the class
If you would like to complement the lectures and slides with further reading, the best reference for the first half of the class (roughly up to the midterm) is probably:
The Elements of Statistical Learning
T. Hastie, R. Tibshirani and J. Friedman.
Second Edition, Springer, 2009.
[Available online here]
Topic | Chapter
Linear classifiers, Perceptron | 4.1, 4.5
Maximum margin classifiers, SVMs | 12.1, 12.2
Kernels | 12.3
Model selection and cross validation | 7, in particular 7.10
Trees | 9.2
Boosting | 10.1, 10.8
Bagging | 8.7
Random Forests | 15
Linear regression | 3.2
Shrinkage | 3.4
Second half of the class
There is unfortunately no single book that covers all topics in the second half of the class well, but some useful sources are:
Pattern Recognition and Machine Learning.
Christopher M. Bishop.
Springer, 2006.

Machine Learning: A Probabilistic Perspective.
Kevin P. Murphy.
MIT Press, 2012.

Bayesian Reasoning and Machine Learning.
David Barber.
Cambridge University Press, 2012.
[Available online]
Other references

Information Theory, Inference, and Learning Algorithms.
David J. C. MacKay.
Cambridge University Press, 2003.
[Available online]

Pattern Classification.
Richard O. Duda, Peter E. Hart, David G. Stork.
Wiley, 2001.

Convex Optimization.
Stephen Boyd and Lieven Vandenberghe.
Cambridge University Press, 2004.
[Available online]
Syllabus
There will be five or six homework assignments; you will usually have two weeks to complete each one. The final grade will be computed as
40% homework + 30% midterm + 30% final exam
The midterm will cover the material from the first half of the class. The final exam will cover only the material presented after the midterm; it is not cumulative.
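As a quick illustration of the weighting (the scores below are hypothetical and each component is assumed to be graded out of 100):

\[
\text{grade} = 0.4 \times 85 + 0.3 \times 90 + 0.3 \times 80 = 34 + 27 + 24 = 85
\]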
Preliminary list of topics
Week | Content
1 | Introduction; review of basic concepts: maximum likelihood, Gaussian distributions, etc.
2 | Classification basics: loss functions, naive Bayes, linear classifiers
3 | Support vector machines, convex optimization
4 | Kernels; model selection and cross validation
5 | Ensemble methods: boosting, bagging, random forests
6 | Regression: linear regression, regularization, ridge regression
7 | Linear algebra review; high-dimensional and sparse regression
8 | Dimension reduction, data visualization, principal component analysis
9 | Clustering, mixture models and EM algorithms
10 | Information theory; text analysis
11 | Markov models, PageRank
12 | Hidden Markov models, speech recognition
13 | Bayesian models
14 | Sampling algorithms and MCMC