Probabilistic and Unsupervised Learning - Additional material

Here you will find some additional material to help you understand the course. This page is adapted from an older version by Balaji Lakshminarayanan.

Please note that this is recommended reading, not a complete syllabus.
For example, you should not expect the assignments or exams to cover only the Bishop chapters listed here; the same holds for every other item on this list. You are expected to seek out additional material on the lectures yourself when needed.
* indicates material that is frequently asked about and is easier to understand than papers or book chapters

Topics and additional reading
Matrix Foundations
  You may find different derivative conventions in various references; this course uses the denominator-layout convention (a brief example follows this list).
  1. *The Matrix Cookbook by K. Petersen and M. Pedersen
  2. Tom Minka's notes on matrix algebra, which use a convention different from this course's but include more derivation examples
  3. Matrix Identities by Sam Roweis
  4. *Ged Ridgway's derivations of the Matrix Inversion Lemma from first principles
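A quick illustration of the denominator-layout convention mentioned above. These are standard identities (not taken from any one of the references); in this layout the gradient with respect to a column vector is a column vector, and the gradient with respect to a matrix has the same shape as the matrix.

```latex
% Standard denominator-layout identities.
\begin{align*}
\frac{\partial}{\partial x}\, a^\top x &= a, &
\frac{\partial}{\partial x}\, x^\top A x &= (A + A^\top)\, x, &
\frac{\partial}{\partial X}\, \operatorname{tr}(A X) &= A^\top .
\end{align*}
```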
Statistics Foundations
  1. *Cribsheet by Iain Murray
  2. Bishop: Chapters 1 and 2, in particular Section 2.3 for important properties of the Gaussian distribution (the key identities are reproduced after this list)
  3. David MacKay's book: Chapters 1-3 and 22-23. Appendices B, C
  4. Nuances of Probability Theory by Tom Minka.
  5. Probability Theory: The Logic of Science by ET Jaynes
  6. Probability and Statistics Online Reference
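As a quick reference for Bishop Section 2.3 above, the standard marginalisation and conditioning results for a partitioned joint Gaussian (stated here without derivation):

```latex
% Joint, marginal and conditional of a partitioned Gaussian.
\begin{align*}
\begin{pmatrix} x_a \\ x_b \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix},
    \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}
  \right), \\
x_a &\sim \mathcal{N}(\mu_a,\, \Sigma_{aa}), \\
x_a \mid x_b &\sim \mathcal{N}\!\big(
    \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b),\;
    \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\big).
\end{align*}
```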
Optimisation Foundations
  1. *Bishop: Appendix E for a simple introduction to Lagrange multipliers
  2. A more in-depth introduction to Lagrange multipliers in Arthur Gretton's kernel lectures (Section 2 of the linked notes); a minimal worked example follows this list
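A minimal worked example of the method (constructed for this page, not taken from either reference): maximise f(x, y) = x + y subject to the constraint x^2 + y^2 = 1.

```latex
% Lagrangian and stationarity conditions for the toy problem above.
\begin{align*}
L(x, y, \lambda) &= x + y + \lambda\,(x^2 + y^2 - 1), \\
\partial L/\partial x = 1 + 2\lambda x &= 0, \quad
\partial L/\partial y = 1 + 2\lambda y = 0, \quad
\partial L/\partial \lambda = x^2 + y^2 - 1 = 0, \\
\Rightarrow\quad x = y &= \pm\tfrac{1}{\sqrt{2}},
  \ \text{with the constrained maximum } f = \sqrt{2} \text{ at } x = y = \tfrac{1}{\sqrt{2}}.
\end{align*}
```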
Latent Variable Models
  1. Bishop: Chapters 3 and 12.
  2. David MacKay's book: Chapters 20-22, 34.
  3. Max Welling's Class Notes on PCA and FA (the FA generative model is written out after this list)
  4. Andrew Ng's Class Notes on Generalized Linear Models (Part III, p. 22 onwards)
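For orientation when reading the PCA/FA notes above, the factor analysis generative model in its standard form:

```latex
% Factor analysis: linear-Gaussian latent variable model with diagonal noise.
\begin{align*}
z &\sim \mathcal{N}(0, I), &
x \mid z &\sim \mathcal{N}(\Lambda z + \mu,\, \Psi)\ \ (\Psi \text{ diagonal}), &
x &\sim \mathcal{N}(\mu,\, \Lambda\Lambda^\top + \Psi).
\end{align*}
```

Probabilistic PCA is the special case Ψ = σ²I, and classical PCA is recovered in the limit σ² → 0.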
EM
  1. Bishop: Chapter 9
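As a pointer into Bishop Chapter 9: EM can be read as coordinate ascent on a free-energy lower bound on the log likelihood (a standard decomposition, reproduced here for convenience):

```latex
% Free-energy decomposition used by EM; q(z) is any distribution over latents.
\begin{align*}
\log p(x \mid \theta)
  &= \underbrace{\mathbb{E}_{q(z)}\!\big[\log p(x, z \mid \theta)\big] + H[q]}_{\mathcal{F}(q,\, \theta)}
   \;+\; \mathrm{KL}\!\big(q(z)\,\|\,p(z \mid x, \theta)\big).
\end{align*}
```

The E-step maximises F over q by setting q(z) = p(z | x, θ); the M-step maximises F over θ with q held fixed, so each iteration cannot decrease the log likelihood.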
Latent state-space models
  1. Bishop: Chapter 13
  2. Byron Yu's derivation of the Kalman filtering and smoothing equations from first principles (a minimal filtering sketch in code follows this list)
  3. Minka, T. (1999) From Hidden Markov Models to Linear Dynamical Systems
  4. Max Welling's notes on HMMs and the Kalman filter
  5. Rabiner's tutorial on HMMs
  6. Maneesh Sahani's paper on subspace identification (SSID) and an application to neural data (Section 2 for the methods)
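A minimal sketch of the Kalman filter recursion derived in the readings above, assuming the usual linear-Gaussian model x_t = A x_{t-1} + w_t, y_t = C x_t + v_t with w_t ~ N(0, Q) and v_t ~ N(0, R); the function and variable names here are ours, chosen for illustration.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, mu0, V0):
    """Return filtered means and covariances of p(x_t | y_1, ..., y_t).

    mu0, V0 parameterise the Gaussian prior on the state before any observations.
    """
    mu, V = mu0, V0
    means, covs = [], []
    for y in ys:
        # Predict: p(x_t | y_1, ..., y_{t-1})
        mu_pred = A @ mu
        V_pred = A @ V @ A.T + Q
        # Update: fold in y_t via the Kalman gain
        S = C @ V_pred @ C.T + R                 # innovation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
        mu = mu_pred + K @ (y - C @ mu_pred)
        V = V_pred - K @ C @ V_pred
        means.append(mu)
        covs.append(V)
    return means, covs
```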
Graphical models
  1. Bishop: Chapter 8. In particular:
    • 8.2 directed graphs, 8.2.2 d-separation
    • 8.3 undirected graphs
    • 8.4.3 factor graphs, 8.4.4 the sum-product algorithm (on a factor graph); the message updates are reproduced after this list
  2. MacKay: Chapters 16 and 26
  3. David Heckerman's Tutorial on Graphical Models
  4. Belief Propagation
  5. Material specific to the junction tree algorithm
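For quick reference, the sum-product message updates on a factor graph (Bishop Section 8.4.4), in standard notation where ne(·) denotes the neighbours of a node:

```latex
% Variable-to-factor and factor-to-variable messages.
\begin{align*}
\mu_{x \to f}(x) &= \prod_{g \in \mathrm{ne}(x) \setminus \{f\}} \mu_{g \to x}(x), \\
\mu_{f \to x}(x) &= \sum_{x_1, \ldots, x_M}
  f(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f) \setminus \{x\}} \mu_{x_m \to f}(x_m).
\end{align*}
```

The marginal at a variable node is then the product of all incoming factor-to-variable messages, up to normalisation.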
Bayesian model selection
  1. Bishop: Sections 3.4-3.5 and 4.4
  2. MacKay: Chapters 27-28
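The central quantity in these chapters is the marginal likelihood (evidence), which integrates over parameters rather than optimising them; model comparison then follows from the posterior odds:

```latex
% Evidence for model M_i and posterior odds between two models.
\begin{align*}
p(\mathcal{D} \mid \mathcal{M}_i)
  &= \int p(\mathcal{D} \mid \theta, \mathcal{M}_i)\, p(\theta \mid \mathcal{M}_i)\, d\theta, &
\frac{p(\mathcal{M}_1 \mid \mathcal{D})}{p(\mathcal{M}_2 \mid \mathcal{D})}
  &= \frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_2)}
     \cdot \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)}.
\end{align*}
```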
Gaussian Processes
  1. Bishop: Chapter 6
  2. MacKay: Chapter 45
  3. *Carl Rasmussen's tutorial slides on Gaussian Processes
  4. *Richard Turner's lecture notes. Good material for understanding sparse GPs, with pointers to methods other than pseudo-data approaches
  5. Quinonero-Candela and Rasmussen: A Unifying View of Sparse Approximate Gaussian Process Regression
  6. A great textbook on Gaussian Processes, available online: Gaussian Processes for Machine Learning by Rasmussen and Williams
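A minimal sketch of GP regression following the standard predictive equations in Rasmussen and Williams; the squared-exponential kernel, hyperparameter values and function names below are illustrative choices, not prescribed by the readings.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between rows of X1 (n, d) and X2 (m, d)."""
    sqdist = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_posterior(X, y, Xstar, noise_var=0.1):
    """Posterior mean and covariance of f(Xstar) given noisy targets y at X."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(X, Xstar)
    Kss = rbf_kernel(Xstar, Xstar)
    L = np.linalg.cholesky(K)                        # stable inverse via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    return mean, cov
```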
Expectation Propagation
  1. Divergence measures and message passing (Minka 2005)
  2. Expectation propagation for exponential families (Seeger 2008)
  3. Graphical models, exponential families and variational inference (Wainwright, Jordan 2008)
  4. Notes on EP by Lloyd Elliott.
  5. *Tom Minka's Lecture on EP
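In outline (following Minka's formulation), EP approximates p(θ) ∝ ∏_i f_i(θ) by q(θ) ∝ ∏_i of exponential-family site approximations (written \tilde f_i below), refining one site at a time:

```latex
% One EP pass over site i: cavity, moment-matching projection, site update.
\begin{align*}
q^{\setminus i}(\theta) &\propto \frac{q(\theta)}{\tilde f_i(\theta)}, &
q^{\mathrm{new}}(\theta) &= \arg\min_{q' \in \mathcal{Q}}
  \mathrm{KL}\!\Big(\tfrac{1}{Z_i}\, f_i(\theta)\, q^{\setminus i}(\theta)\;\Big\|\; q'(\theta)\Big), &
\tilde f_i(\theta) &\leftarrow Z_i\, \frac{q^{\mathrm{new}}(\theta)}{q^{\setminus i}(\theta)}.
\end{align*}
```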
Sampling
  1. Bishop: Chapter 11
  2. Handbook of Markov Chain Monte Carlo (website with sample chapters)
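A minimal sketch of random-walk Metropolis sampling, a special case of the Metropolis-Hastings algorithm covered in Bishop Chapter 11; the target density, step size and function names below are placeholder choices for illustration.

```python
import numpy as np

def metropolis(log_target, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis with symmetric Gaussian proposals."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal(x.shape)
        # Acceptance ratio needs only the unnormalised log target density.
        log_accept = log_target(proposal) - log_target(x)
        if np.log(rng.random()) < log_accept:
            x = proposal
        samples.append(x.copy())
    return np.array(samples)

# Example: draw correlated samples from a standard 2-D Gaussian.
samples = metropolis(lambda x: -0.5 * np.sum(x**2), x0=np.zeros(2), n_samples=5000)
```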

Maintained by Kevin Li: kevinli@gatsby.ucl.ac.uk