Probabilistic and Unsupervised Learning - Additional material

Here you will find some additional material to help you understand the course. This page is adapted from an older version by Balaji Lakshminarayanan.

Please note that this is recommended reading, not a complete syllabus.
For example, you should not expect the assignments or exams to cover only the Bishop chapters listed here; the same holds for every other item on this list. You are expected to seek out additional material on the lectures yourself when needed.
* indicates material that is frequently asked about and is easier to understand than papers or book chapters

Topics and additional reading
Matrix Foundations
  You may find different derivative conventions in various references; this course uses the denominator-layout convention (a brief example follows this list).
  1. *The Matrix Cookbook by K. Petersen and M. Pedersen
  2. Tom Minka's notes on matrix algebra, which use a convention different from this course's but include more derivation examples
  3. Matrix Identities by Sam Roweis
  4. *Ged Ridgway's derivations of the Matrix Inversion Lemma from first principles
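A quick illustration of the denominator-layout convention mentioned above. These are standard identities (not taken from any one of the references); in this layout the gradient with respect to a column vector is a column vector, and the gradient with respect to a matrix has the same shape as the matrix.

```latex
% Standard denominator-layout identities.
\begin{align*}
\frac{\partial}{\partial x}\, a^\top x &= a, &
\frac{\partial}{\partial x}\, x^\top A x &= (A + A^\top)\, x, &
\frac{\partial}{\partial X}\, \operatorname{tr}(A X) &= A^\top .
\end{align*}
```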
Statistics Foundations
  1. *Cribsheet by Iain Murray
  2. Bishop: Chapters 1 and 2, in particular Section 2.3 for important properties of the Gaussian distribution (the key identities are reproduced after this list)
  3. David MacKay's book: Chapters 1-3 and 22-23. Appendices B, C
  4. Nuances of Probability Theory by Tom Minka.
  5. Probability Theory: The Logic of Science by ET Jaynes
  6. Probability and Statistics Online Reference
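As a quick reference for Bishop Section 2.3 above, the standard marginalisation and conditioning results for a partitioned joint Gaussian (stated here without derivation):

```latex
% Joint, marginal and conditional of a partitioned Gaussian.
\begin{align*}
\begin{pmatrix} x_a \\ x_b \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix},
    \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}
  \right), \\
x_a &\sim \mathcal{N}(\mu_a,\, \Sigma_{aa}), \\
x_a \mid x_b &\sim \mathcal{N}\!\big(
    \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\;
    \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\big).
\end{align*}
```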
Optimisation Foundations
  1. *Bishop: Appendix E for a simple introduction to Lagrange multipliers
  2. A more in-depth introduction to Lagrange multipliers in Arthur Gretton's kernel lectures (Section 2 of the linked notes); a minimal worked example follows this list
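A minimal worked example of the method (constructed for this page, not taken from either reference): maximise f(x, y) = x + y subject to the constraint x^2 + y^2 = 1.

```latex
% Lagrangian and stationarity conditions for the toy problem above.
\begin{align*}
L(x, y, \lambda) &= x + y + \lambda\,(x^2 + y^2 - 1), \\
\partial L/\partial x = 1 + 2\lambda x &= 0, \quad
\partial L/\partial y = 1 + 2\lambda y = 0, \quad
\partial L/\partial \lambda = x^2 + y^2 - 1 = 0, \\
\Rightarrow\quad x = y &= \pm\tfrac{1}{\sqrt{2}},
  \ \text{with the constrained maximum } f = \sqrt{2} \text{ at } x = y = \tfrac{1}{\sqrt{2}}.
\end{align*}
```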
Latent Variable Models
  1. Bishop: Chapters 3 and 12.
  2. David MacKay's book: Chapters 20-22, 34.
  3. Max Welling's Class Notes on PCA and FA (the FA generative model is written out after this list)
  4. Andrew Ng's Class Notes on Generalized Linear Models (Part III, p. 22 onwards)
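For orientation when reading the PCA/FA notes above, the factor analysis generative model in its standard form:

```latex
% Factor analysis: linear-Gaussian latent variable model with diagonal noise.
\begin{align*}
z &\sim \mathcal{N}(0, I), &
x \mid z &\sim \mathcal{N}(\Lambda z + \mu,\, \Psi)\ \ (\Psi \text{ diagonal}), &
x &\sim \mathcal{N}(\mu,\, \Lambda\Lambda^\top + \Psi).
\end{align*}
```

Probabilistic PCA is the special case Ψ = σ²I, and classical PCA is recovered in the limit σ² → 0.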
EM
  1. Bishop: Chapter 9
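As a pointer into Bishop Chapter 9: EM can be read as coordinate ascent on a free-energy lower bound on the log likelihood (a standard decomposition, reproduced here for convenience):

```latex
% Free-energy decomposition used by EM; q(z) is any distribution over latents.
\begin{align*}
\log p(x \mid \theta)
  &= \underbrace{\mathbb{E}_{q(z)}\!\big[\log p(x, z \mid \theta)\big] + H[q]}_{\mathcal{F}(q,\, \theta)}
   \;+\; \mathrm{KL}\!\big(q(z)\,\|\,p(z \mid x, \theta)\big).
\end{align*}
```

The E-step maximises F over q by setting q(z) = p(z | x, θ); the M-step maximises F over θ with q held fixed, so each iteration cannot decrease the log likelihood.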
Latent state-space models
  1. Bishop: Chapter 13
  2. Byron Yu's derivation of the Kalman filtering and smoothing equations from first principles (a minimal filtering sketch in code follows this list)
  3. Minka, T. (1999) From Hidden Markov Models to Linear Dynamical Systems
  4. Max Welling's notes on HMMs and the Kalman filter
  5. Rabiner's tutorial on HMMs
  6. Maneesh Sahani's paper on subspace identification (SSID) and an application to neural data (Section 2 for the methods)
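A minimal sketch of the Kalman filter recursion derived in the readings above, assuming the usual linear-Gaussian model x_t = A x_{t-1} + w_t, y_t = C x_t + v_t with w_t ~ N(0, Q) and v_t ~ N(0, R); the function and variable names here are ours, chosen for illustration.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, mu0, V0):
    """Return filtered means and covariances of p(x_t | y_1, ..., y_t).

    mu0, V0 parameterise the Gaussian prior on the state before any observations.
    """
    mu, V = mu0, V0
    means, covs = [], []
    for y in ys:
        # Predict: p(x_t | y_1, ..., y_{t-1})
        mu_pred = A @ mu
        V_pred = A @ V @ A.T + Q
        # Update: fold in y_t via the Kalman gain
        S = C @ V_pred @ C.T + R                 # innovation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
        mu = mu_pred + K @ (y - C @ mu_pred)
        V = V_pred - K @ C @ V_pred
        means.append(mu)
        covs.append(V)
    return means, covs
```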
Graphical models
  1. Bishop: Chapter 8. In particular:
    • 8.2 directed graphs, 8.2.2 d-separation
    • 8.3 undirected graphs
    • 8.4.3 factor graphs, 8.4.4 the sum-product algorithm (on a factor graph); the message updates are reproduced after this list
  2. MacKay: Chapters 16 and 26
  3. David Heckerman's Tutorial on Graphical Models
  4. Belief Propagation
  5. Material specific to the junction tree algorithm
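For quick reference, the sum-product message updates on a factor graph (Bishop Section 8.4.4), in standard notation where ne(·) denotes the neighbours of a node:

```latex
% Variable-to-factor and factor-to-variable messages.
\begin{align*}
\mu_{x \to f}(x) &= \prod_{g \in \mathrm{ne}(x) \setminus \{f\}} \mu_{g \to x}(x), \\
\mu_{f \to x}(x) &= \sum_{x_1, \ldots, x_M}
  f(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f) \setminus \{x\}} \mu_{x_m \to f}(x_m).
\end{align*}
```

The marginal at a variable node is then the product of all incoming factor-to-variable messages, up to normalisation.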
Bayesian model selection
  1. Bishop: Sections 3.4-3.5 and 4.4
  2. MacKay: Chapters 27-28
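The central quantity in these chapters is the marginal likelihood (evidence), which integrates over parameters rather than optimising them; model comparison then follows from the posterior odds:

```latex
% Evidence for model M_i and posterior odds between two models.
\begin{align*}
p(\mathcal{D} \mid \mathcal{M}_i)
  &= \int p(\mathcal{D} \mid \theta, \mathcal{M}_i)\, p(\theta \mid \mathcal{M}_i)\, d\theta, &
\frac{p(\mathcal{M}_1 \mid \mathcal{D})}{p(\mathcal{M}_2 \mid \mathcal{D})}
  &= \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_2)}
     \cdot \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)}.
\end{align*}
```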
Gaussian Processes
  1. Bishop: Chapter 6
  2. MacKay: Chapter 45
  3. *Carl Rasmussen's tutorial slides on Gaussian Processes
  4. *Richard Turner's lecture notes. Good material for understanding sparse GPs, with pointers to methods other than pseudo-data approaches
  5. Quinonero-Candela and Rasmussen: A Unifying View of Sparse Approximate Gaussian Process Regression
  6. A great textbook on Gaussian Processes, available online: Gaussian Processes for Machine Learning by Rasmussen and Williams
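A minimal sketch of GP regression following the standard predictive equations in Rasmussen and Williams; the squared-exponential kernel, hyperparameter values and function names below are illustrative choices, not prescribed by the readings.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between rows of X1 (n, d) and X2 (m, d)."""
    sqdist = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(X, y, Xstar, noise_var=0.1):
    """Posterior mean and covariance of f(Xstar) given noisy targets y at X."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    Kss = rbf_kernel(Xstar, Xstar)
    L = np.linalg.cholesky(K)                        # stable inverse via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    return mean, cov
```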
Expectation Propagation
  1. Divergence measures and message passing (Minka 2005)
  2. Expectation propagation for exponential families (Seeger 2008)
  3. Graphical models, exponential families and variational inference (Wainwright, Jordan 2008)
  4. Notes on EP by Lloyd Elliott.
  5. *Tom Minka's Lecture on EP
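In outline (following Minka's formulation), EP approximates p(θ) ∝ ∏_i f_i(θ) by q(θ) ∝ ∏_i of exponential-family site approximations (written \tilde f_i below), refining one site at a time:

```latex
% One EP pass over site i: cavity, moment-matching projection, site update.
\begin{align*}
q^{\setminus i}(\theta) &\propto \frac{q(\theta)}{\tilde f_i(\theta)}, &
q^{\mathrm{new}}(\theta) &= \arg\min_{q' \in \mathcal{Q}}
  \mathrm{KL}\!\Big(\tfrac{1}{Z_i}\, f_i(\theta)\, q^{\setminus i}(\theta)\;\Big\|\; q'(\theta)\Big), &
\tilde f_i(\theta) &\leftarrow Z_i\, \frac{q^{\mathrm{new}}(\theta)}{q^{\setminus i}(\theta)}.
\end{align*}
```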
Sampling
  1. Bishop: Chapter 11
  2. Handbook of Markov Chain Monte Carlo (website with sample chapters)
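A minimal sketch of random-walk Metropolis sampling, a special case of the Metropolis-Hastings algorithm covered in Bishop Chapter 11; the target density, step size and function names below are placeholder choices for illustration.

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis with symmetric Gaussian proposals."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal(x.shape)
        # Acceptance ratio needs only the unnormalised log target density.
        log_accept = log_target(proposal) - log_target(x)
        if np.log(rng.random()) < log_accept:
            x = proposal
        samples.append(x.copy())
    return np.array(samples)

# Example: draw correlated samples from a standard 2-D Gaussian.
samples = metropolis(lambda x: -0.5 * np.sum(x**2), x0=np.zeros(2), n_samples=5000)
```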

Maintained by Kevin Li: kevinli@gatsby.ucl.ac.uk