Here you'll find additional material to help you understand the course. This page is adapted from an older version by Balaji Lakshminarayanan.
**Matrix Foundations**

You may find different derivative conventions in various references; this course uses the denominator convention.

- *The Matrix Cookbook by K. Petersen and M. Pedersen
- Tom Minka's notes on matrix algebra (in a convention different from this course's, but with more derivation examples)
- Matrix Identities by Sam Roweis
- *Ged Ridgway's derivations of the Matrix Inversion Lemma from first principles
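As a minimal illustration of the denominator convention (an invented example, not from the linked notes): for the quadratic form $y = x^\top A x$, the gradient $\partial y / \partial x = (A + A^\top)x$ is a column vector with the same shape as $x$. A quick finite-difference check:

```python
import numpy as np

# Denominator-layout gradient of y = x^T A x is (A + A^T) x,
# a vector with the same shape as x. Verify against central differences.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

grad_analytic = (A + A.T) @ x

eps = 1e-6
grad_fd = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # True
```

Under the numerator convention the same derivative would instead be the row vector $x^\top(A + A^\top)$.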
**Statistics Foundations**

- *Cribsheet by Iain Murray
- Bishop: Chapters 1 and 2, in particular Section 2.3 for important properties of the Gaussian distribution
- David MacKay's book: Chapters 1-3 and 22-23, Appendices B and C
- Nuances of Probability Theory by Tom Minka
- Probability Theory: The Logic of Science by E. T. Jaynes
- Probability and Statistics Online Reference
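Among the Gaussian properties in Bishop 2.3, the conditioning formula is the one used constantly later in the course: for a joint Gaussian over $(x_a, x_b)$, $p(x_a \mid x_b) = \mathcal{N}\!\left(\mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\; \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\right)$. A numeric sketch with invented numbers:

```python
import numpy as np

# Conditioning a bivariate Gaussian (the identity from Bishop Section 2.3).
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x_b = 2.0  # observed value of the second variable

cond_mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x_b - mu[1])
cond_var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

print(cond_mean, cond_var)  # 0.8 and approximately 1.36
```

Note the conditional variance never exceeds the marginal variance, and does not depend on the observed value — a Gaussian-specific fact.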
**Optimisation Foundations**

- *Bishop: Appendix E for a simple introduction to Lagrange multipliers
- A more in-depth introduction to Lagrange multipliers in Arthur Gretton's kernel lectures (Section 2 of the linked notes)
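For a concrete (invented) instance of the Lagrange-multiplier recipe: maximize $f(x, y) = xy$ subject to $x + y = 1$.

```latex
L(x, y, \lambda) = xy - \lambda (x + y - 1)
\\
\frac{\partial L}{\partial x} = y - \lambda = 0, \qquad
\frac{\partial L}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial L}{\partial \lambda} = -(x + y - 1) = 0
\\
\Rightarrow \; x = y = \lambda = \tfrac{1}{2}, \qquad
f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{4}.
```

The same pattern (stationarity of the Lagrangian in all variables) is what appears in the maximum-entropy and constrained-likelihood derivations later in the course.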
**Latent Variable Models**

- Bishop: Chapters 3 and 12
- David MacKay's book: Chapters 20-22 and 34
- Max Welling's class notes on PCA and factor analysis
- Andrew Ng's class notes on generalized linear models (Part III, p. 22 onwards)
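A minimal PCA sketch (invented data, not from the linked notes): the principal directions are the top eigenvectors of the sample covariance of the centred data.

```python
import numpy as np

# PCA by eigendecomposition of the sample covariance matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                          [0.0, 0.5]])  # anisotropic data

Xc = X - X.mean(axis=0)                      # centre the data
cov = Xc.T @ Xc / (len(Xc) - 1)              # sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
top_direction = eigvecs[:, -1]               # first principal component

scores = Xc @ top_direction                  # 1-D latent representation
```

The probabilistic view in Bishop Chapter 12 recovers the same subspace as the maximum-likelihood solution of a linear-Gaussian latent variable model.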
**EM**

- Bishop: Chapter 9
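A sketch of the EM iterations from Bishop Chapter 9 for a two-component 1-D Gaussian mixture (invented data; mixing weights fixed at 0.5 each to keep it short):

```python
import numpy as np

# EM for a two-component 1-D Gaussian mixture with equal mixing weights.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

mu = np.array([-1.0, 1.0])    # initial means
var = np.array([1.0, 1.0])    # initial variances

for _ in range(50):
    # E-step: responsibilities (equal weights cancel in the normalization)
    logp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means and variances
    n_k = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(np.sort(mu))  # roughly [-2, 3]
```

Each iteration is guaranteed not to decrease the log-likelihood, which is the key property proved in the chapter.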
**Latent state-space models**

- Bishop: Chapter 13
- Byron Yu's derivation of the Kalman filtering and smoothing equations from first principles
- Minka, T. (1999). From Hidden Markov Models to Linear Dynamical Systems
- Welling's notes on HMMs and the Kalman filter
- Rabiner's tutorial on HMMs
- Maneesh Sahani's paper on subspace identification (SSID) and an application to neural data (Section 2 for methods)
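The predict/update cycle of the Kalman filter is easiest to see in one dimension. A sketch with an invented toy model ($x_t = a\,x_{t-1} + \text{process noise}$, $y_t = x_t + \text{observation noise}$):

```python
import numpy as np

# 1-D Kalman filter: alternate prediction and measurement update.
rng = np.random.default_rng(3)
a, q, r = 0.95, 0.1, 0.5           # dynamics, process var, observation var

x, xs, ys = 0.0, [], []
for _ in range(200):               # simulate latent states and observations
    x = a * x + rng.normal(0, np.sqrt(q))
    xs.append(x)
    ys.append(x + rng.normal(0, np.sqrt(r)))

m, v, means = 0.0, 1.0, []         # filtering mean and variance
for y in ys:
    m_pred, v_pred = a * m, a * a * v + q        # predict step
    k = v_pred / (v_pred + r)                    # Kalman gain
    m = m_pred + k * (y - m_pred)                # measurement update
    v = (1 - k) * v_pred
    means.append(m)

err_filter = np.mean((np.array(means) - np.array(xs)) ** 2)
err_raw = np.mean((np.array(ys) - np.array(xs)) ** 2)
print(err_filter < err_raw)        # filtering beats raw observations
```

The matrix-valued version in Bishop Chapter 13 has exactly this structure, with the gain computed from covariance matrices instead of scalars.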
**Graphical models**

- Bishop: Chapter 8. In particular:
  - 8.2 directed graphs; 8.2.2 D-separation
  - 8.3 undirected graphs
  - 8.4.3 factor graphs; 8.4.4 the sum-product algorithm (on a factor graph)
- MacKay: Chapters 16 and 26
- David Heckerman's tutorial on graphical models
- Belief propagation
- Junction-tree-specific material
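On a tree, sum-product reduces marginalization to local message passing. A sketch on a three-variable binary chain $x_1 - x_2 - x_3$ with invented pairwise potentials, checked against brute-force enumeration:

```python
import numpy as np

# Sum-product on the chain x1 - x2 - x3: the marginal of x2 is the
# (normalized) product of the messages arriving from both sides.
psi12 = np.array([[1.0, 0.5], [0.5, 2.0]])   # factor on (x1, x2)
psi23 = np.array([[1.5, 1.0], [0.2, 1.0]])   # factor on (x2, x3)

m1_to_2 = psi12.sum(axis=0)          # sum out x1
m3_to_2 = psi23.sum(axis=1)          # sum out x3
marg_x2 = m1_to_2 * m3_to_2
marg_x2 /= marg_x2.sum()

# Brute force: enumerate all 8 joint configurations
p = np.zeros(2)
for x1 in range(2):
    for x2 in range(2):
        for x3 in range(2):
            p[x2] += psi12[x1, x2] * psi23[x2, x3]
p /= p.sum()

print(np.allclose(marg_x2, p))  # True
```

Here enumeration costs $2^3$ terms; message passing on a chain of length $n$ costs only $O(n)$ local sums, which is the point of the algorithm.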
**Bayesian model selection**

- Bishop: Sections 3.4-3.5 and 4.4
- MacKay: Chapters 27-28
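The quantity being compared in these chapters is the marginal likelihood (evidence). A toy example with invented data: a particular sequence of coin flips scored under a fair-coin model versus a model with a uniform prior on the bias.

```python
import math

# Evidence comparison for one observed sequence of 9 heads, 1 tail.
heads, tails = 9, 1
N = heads + tails

ev_fair = 0.5 ** N                       # p(D | M1): fair coin
# p(D | M2) = integral of theta^h (1-theta)^t over [0, 1] = h! t! / (N+1)!
ev_uniform = (math.factorial(heads) * math.factorial(tails)
              / math.factorial(N + 1))

bayes_factor = ev_uniform / ev_fair
print(bayes_factor)   # > 1: the lopsided data favour the flexible model
```

With balanced data (e.g. 5 heads, 5 tails) the Bayes factor flips below 1: the flexible model pays an automatic complexity penalty, which is the "Occam's razor" argument in MacKay Chapter 28.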
**Gaussian Processes**

- Bishop: Chapter 6
- MacKay: Chapter 45
- *Carl Rasmussen's tutorial slides on Gaussian processes
- *Richard Turner's lecture notes: good material for understanding sparse GPs, with pointers to methods other than pseudo-data
- Quinonero-Candela and Rasmussen: A Unifying View of Sparse Approximate Gaussian Process Regression
- Gaussian Processes for Machine Learning by Rasmussen and Williams: a great textbook, available online
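A sketch of GP regression with an RBF kernel (data and hyperparameters invented), computing the posterior mean and variance at test inputs via a Cholesky factorization, in the style of Rasmussen and Williams Chapter 2:

```python
import numpy as np

# GP posterior for noise-free regression with a squared-exponential kernel.
def rbf(a, b, ell=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

X = np.array([-2.0, 0.0, 1.5])        # training inputs
y = np.sin(X)                         # targets (noise-free, for illustration)
Xs = np.linspace(-3, 3, 7)            # test inputs
noise = 1e-6                          # jitter for numerical stability

K = rbf(X, X) + noise * np.eye(len(X))
Ks = rbf(X, Xs)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

mean = Ks.T @ alpha                                   # posterior mean
v = np.linalg.solve(L, Ks)
var = rbf(Xs, Xs).diagonal() - np.sum(v * v, axis=0)  # posterior variance
```

At a training input the posterior mean interpolates the target and the variance collapses to (almost) zero; far from the data the variance returns to the prior value of 1.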
**Expectation Propagation**

- Divergence Measures and Message Passing (Minka, 2005)
- Expectation Propagation for Exponential Families (Seeger, 2008)
- Graphical Models, Exponential Families and Variational Inference (Wainwright and Jordan, 2008)
- Notes on EP by Lloyd Elliott
- *Tom Minka's lecture on EP
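The projection step at the heart of EP: minimizing $\mathrm{KL}(p \,\|\, q)$ over Gaussians $q$ reduces to matching the mean and variance of $p$. An illustrative sketch (invented mixture, exact moments) rather than a full EP loop:

```python
import numpy as np

# Moment-match a single Gaussian to a two-component Gaussian mixture.
w = np.array([0.3, 0.7])          # mixture weights
mu = np.array([-1.0, 2.0])        # component means
var = np.array([0.5, 1.0])        # component variances

mean_p = np.sum(w * mu)                       # E[x]
second = np.sum(w * (var + mu ** 2))          # E[x^2]
var_p = second - mean_p ** 2                  # Var[x]

print(mean_p, var_p)   # parameters of the moment-matched Gaussian
```

This is the exclusive-KL direction (mass-covering); the variational methods in Wainwright and Jordan minimize the other direction, $\mathrm{KL}(q \,\|\, p)$, which tends to lock onto a single mode instead.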
**Sampling**

- Bishop: Chapter 11
- Handbook of Markov Chain Monte Carlo (website with sample chapters)
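A random-walk Metropolis sketch (toy example): sampling from a standard normal using only its unnormalized log-density, as described in Bishop Chapter 11.

```python
import numpy as np

# Random-walk Metropolis targeting N(0, 1) via its unnormalized log-density.
rng = np.random.default_rng(4)

def log_p(x):
    return -0.5 * x ** 2          # log N(0, 1) up to a constant

x, samples = 0.0, []
for _ in range(20000):
    prop = x + rng.normal(0, 1.0)                    # symmetric proposal
    if np.log(rng.uniform()) < log_p(prop) - log_p(x):
        x = prop                                     # accept; else keep x
    samples.append(x)

samples = np.array(samples[2000:])                   # discard burn-in
print(samples.mean(), samples.var())                 # roughly 0 and 1
```

Because the proposal is symmetric, the Hastings correction cancels and the acceptance ratio is just $p(x')/p(x)$; the normalizing constant of the target is never needed.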