Additional reading by topic
Matrix Foundations 
You may find different matrix-derivative conventions in different references. This course uses the denominator-layout convention.
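As a quick reminder of what the denominator-layout convention means (a minimal illustration, not drawn from any of the listed references): the derivative of a scalar with respect to a column vector is itself a column vector, so the standard identities read

```latex
% Denominator-layout convention: \partial(scalar)/\partial\mathbf{x} is a column vector.
\frac{\partial}{\partial \mathbf{x}} \left( \mathbf{a}^\top \mathbf{x} \right) = \mathbf{a},
\qquad
\frac{\partial}{\partial \mathbf{x}} \left( \mathbf{x}^\top \mathbf{A} \mathbf{x} \right)
    = \left( \mathbf{A} + \mathbf{A}^\top \right) \mathbf{x},
\qquad
\frac{\partial \left( \mathbf{A} \mathbf{x} \right)}{\partial \mathbf{x}} = \mathbf{A}^\top .
```

In numerator layout the transposes flip (e.g. the last identity becomes A rather than A-transpose), which is why references using different conventions can appear to disagree.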
 *The Matrix Cookbook by K. Petersen and M. Pedersen
 Tom Minka's notes on matrix algebra use a convention different from this course, but include more derivation examples
 Matrix Identities by Sam Roweis
 *Ged Ridgway's derivation of the Matrix Inversion Lemma from first principles

Statistics Foundations 
 *Cribsheet by Iain Murray
 Bishop: Chapters 1 and 2. In particular, Section 2.3 for important properties of the Gaussian distribution
 David MacKay's book: Chapters 1–3 and 22–23. Appendices B and C

 Nuances of Probability Theory by Tom Minka.
 Probability Theory: The Logic of Science by E. T. Jaynes
 Probability and Statistics Online Reference

Optimisation Foundations 
 *Bishop: Appendix E for a simple introduction to Lagrange multipliers
 A more in-depth introduction to Lagrange multipliers in Arthur Gretton's kernel lectures (Section 2 of the linked notes)

Latent Variable Models 
 Bishop: Chapters 3 and 12.
 David MacKay's book: Chapters 20–22 and 34.
 Max Welling's Class Notes on PCA and FA
 Andrew Ng's Class Notes on Generalized Linear Models (Part III, p. 22 onwards)

EM 
 Bishop: Chapter 9

Latent state-space models 
 Bishop: Chapter 13
 Byron Yu's derivation of Kalman filtering and smoothing equations from first principles
 Minka, T. (1999) From Hidden Markov Models to Linear Dynamical Systems
 Max Welling's notes on HMMs and the Kalman Filter
 Rabiner's tutorial on HMMs
 Maneesh Sahani's paper on subspace identification (SSID) and an application to neural data (Section 2 for methods)

Graphical models 
 Bishop: Chapter 8. In particular:
 8.2 directed graphs, 8.2.2 D-separation
 8.3 undirected graphs
 8.4.3 factor graphs, 8.4.4 the sum-product algorithm (on a factor graph)
 MacKay: Chapters 16 and 26
 David Heckerman's Tutorial on Graphical Models
 Belief Propagation
 Junction tree specific

Bayesian model selection 
 Bishop: Sections 4.4 and 3.4–3.5
 MacKay: Chapters 27–28

Gaussian Processes 
 Bishop: Chapter 6
 MacKay: Chapter 45
 *Carl Rasmussen's tutorial slides on Gaussian Processes
 *Richard Turner's lecture notes. Good material for understanding sparse GPs, with pointers to methods other than pseudo-data
 Quiñonero-Candela and Rasmussen: A Unifying View of Sparse Approximate Gaussian Process Regression
 A great textbook on Gaussian Processes, available online: Gaussian Processes for Machine Learning by Rasmussen and Williams

Expectation Propagation 
 Divergence measures and message passing (Minka 2005)
 Expectation propagation for exponential families (Seeger 2008)
 Graphical models, exponential families and variational inference (Wainwright and Jordan 2008)
 Notes on EP by Lloyd Elliott.
 *Tom Minka's Lecture on EP

Sampling 
 Bishop: Chapter 11
 Handbook of Markov Chain Monte Carlo (website with sample chapters)
