- Lecture notes
- Video tutorials: Tutorial talks available online as streaming videos.
- Further reading: References on various topics in Bayesian nonparametrics.
Lecture Notes
The first few chapters of these class notes provide a basic introduction to the Dirichlet process, the Gaussian process, and latent feature models. The remaining chapters cover more advanced material. The focus is on concepts; it is not a literature survey.
-
Lecture Notes on Bayesian Nonparametrics.
P Orbanz.
[PDF (draft)]
Video tutorials
NIPS Tutorial
-
Modern Bayesian nonparametrics.
P Orbanz and YW Teh.
NIPS Tutorial, 2011.
[Slides] [Video (YouTube)]
Machine Learning Summer School 2012
-
Bayesian nonparametrics.
P Orbanz.
Machine Learning Summer School, 2012.
[Videolectures]
[Slides] Lecture 1: Clustering, Dirichlet processes, IBPs
[Slides] Lecture 2: Gaussian processes, model construction, exchangeability, asymptotics
Machine Learning Summer School 2009
At MLSS 2009, I gave two talks on the basics of measure theory and stochastic process concepts involved in Bayesian nonparametrics. They complemented the talks by Yee Whye Teh at the same Summer School, which I highly recommend.
-
Nonparametric Bayesian Models.
YW Teh.
Machine Learning Summer School, 2009.
[Videolectures]
-
Theoretical Foundations of Nonparametric Bayesian Models.
P Orbanz.
Machine Learning Summer School, 2009.
[Videolectures]
-
Gaussian processes
CE Rasmussen.
Machine Learning Summer School, 2009.
[Videolectures]
Further Reading
Surveys
Yee Whye Teh and I have written a short introductory article:
-
Bayesian Nonparametric Models.
P Orbanz and YW Teh.
In Encyclopedia of Machine Learning (Springer), 2010.
[PDF]
-
Graphical Models for Visual Object Recognition and Tracking.
EB Sudderth.
PhD thesis, 2006.
[PDF]
-
A tutorial on Bayesian nonparametric models.
SJ Gershman and DM Blei.
Journal of Mathematical Psychology (56):1-12, 2012.
[PDF]
-
Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures.
P Orbanz and DM Roy.
IEEE Transactions on Pattern Analysis and Machine Intelligence (in press).
[arxiv 1312.7857]
-
Bayesian nonparametric inference for random distributions and related functions.
SG Walker, P Damien, PW Laud, and AFM Smith.
Journal of the Royal Statistical Society B, 61 (3):485-527, 1999.
[MathSciNet]
-
Bayesian Theory.
JM Bernardo and AFM Smith.
John Wiley & Sons, 1994.
[MathSciNet]
Random discrete measures
Random discrete measures include models such as the Dirichlet process and the Pitman-Yor process.
In applications, these models are typically used as priors on the mixing measure of a mixture model
(e.g. Dirichlet process mixtures).
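To make the idea of a random discrete measure more concrete, here is a minimal sketch of the stick-breaking construction of the weights, truncated to finitely many atoms. The function name, the truncation level and the default parameters are assumptions made for this illustration; with `discount=0` the weights are those of a Dirichlet process with concentration `alpha`, and with `0 < discount < 1` those of a Pitman-Yor process.

```python
import random


def stick_breaking_weights(alpha, discount=0.0, truncation=50, seed=0):
    """Draw the first `truncation` stick-breaking weights.

    The k-th stick proportion is Beta(1 - discount, alpha + k * discount).
    Returns the weights and the leftover stick mass not yet assigned.
    Requires 0 <= discount < 1 and alpha > -discount.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for k in range(1, truncation + 1):
        v = rng.betavariate(1.0 - discount, alpha + k * discount)
        weights.append(remaining * v)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    return weights, remaining


if __name__ == "__main__":
    w, rest = stick_breaking_weights(alpha=2.0)
    print("largest weights:", sorted(w, reverse=True)[:5])
    print("mass beyond truncation:", rest)
```

In a mixture model, each weight would be paired with an atom drawn independently from a base distribution; the truncation is only a computational convenience and is not part of the model itself.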
Dirichlet and Pitman-Yor processes
A concise introduction to the Dirichlet process is:
-
Dirichlet processes.
YW Teh.
In Encyclopedia of Machine Learning (Springer), 2010.
[PDF]
-
Markov chain sampling methods for Dirichlet process mixture models.
RM Neal.
Journal of Computational and Graphical Statistics, 9:249-265, 2000.
[MathSciNet]
-
Dirichlet process, related priors and posterior asymptotics.
S Ghosal.
In N. L. Hjort et al., editors, Bayesian Nonparametrics.
Cambridge University Press, 2010.
[MathSciNet]
-
Gibbs sampling methods for stick-breaking priors.
H Ishwaran and LF James.
Journal of the American Statistical Association, 96: 161-173, 2001.
[MathSciNet]
-
Size-biased sampling of Poisson point processes and excursions.
M Perman, J Pitman and M Yor.
Probability Theory and Related Fields, 92:21-39, 1992.
[MathSciNet]
-
A Hierarchical Bayesian Language Model based on Pitman-Yor Processes.
YW Teh.
Coling/ACL 2006.
[PDF]
Generalizations
Dirichlet processes and Pitman-Yor processes are two examples of random discrete probability measures. Any random discrete probability measure can in principle be used to replace the Dirichlet process in mixture models or in one of its other applications (infinite HMMs, etc.). Over the past few years, it has become much clearer which models exist, how they can be represented, and in which cases we can expect inference to be tractable. If you are interested in understanding how these models work and what the landscape of nonparametric Bayesian clustering models looks like, I recommend the following two articles:
-
Models beyond the Dirichlet process.
A Lijoi and I Prünster.
In N. L. Hjort et al., editors, Bayesian Nonparametrics.
Cambridge University Press, 2010.
[MathSciNet] [PDF]
-
Conditional formulae for Gibbs-type exchangeable random partitions.
S Favaro, A Lijoi, and I Prünster.
Annals of Applied Probability, to appear.
[PDF]
-
Two tales about Bayesian nonparametric modeling.
I Prünster.
[Videolectures]
-
Projective limit random probabilities on Polish spaces.
P Orbanz.
Electronic Journal of Statistics, 5:1354-1373, 2011.
[PDF] [MathSciNet]
Point processes
Random discrete measures have natural representations as point processes. Basic knowledge of
point processes makes it much easier to understand random measure models, and all more advanced
work on random discrete measures uses point process techniques. This is one of the topics on
which "the" book to read has been written; Kingman's book on the Poisson process is certainly
one of the best expository texts in probability.
-
Poisson Processes.
JFC Kingman.
Oxford University Press, 1993.
[MathSciNet]
-
An introduction to the theory of point processes.
D Daley and D Vere-Jones.
Springer, 2nd edition, 2008.
Volumes I [MathSciNet] and II [MathSciNet].
-
Bayesian Poisson process partition calculus with an application to Bayesian Lévy moving averages.
LF James.
Annals of Statistics, 33(4):1771-1799, 2005.
[MathSciNet]
-
Posterior analysis for normalized random measures with independent increments.
LF James, A Lijoi, and I Prünster.
Scandinavian Journal of Statistics, 36:76-97, 2009.
[MathSciNet]
Hierarchical and covariate-dependent models
One of the most popular models based on the Dirichlet process is the dependent Dirichlet process. Despite its great popularity, Steven MacEachern's original article on the model remains unpublished and is hard to find on the web. Steven has kindly given me permission to make it available here:
-
Dependent Dirichlet processes.
SN MacEachern.
Technical report, Ohio State University, 2000.
[PDF]
-
Hierarchical Bayesian nonparametric models with applications.
YW Teh and MI Jordan.
In N. L. Hjort et al., editors, Bayesian Nonparametrics.
Cambridge University Press, 2010.
[MathSciNet]
-
Hierarchical Dirichlet processes.
YW Teh, MI Jordan, MJ Beal, and DM Blei.
Journal of the American Statistical Association, (476):1566-1581, 2006.
[MathSciNet]
Random functions
Distributions on random functions can be used as prior distributions in regression and related problems.
The prototypical prior on smooth random functions is the Gaussian process. An excellent introduction to
Gaussian process models and many references can be found in the monograph by Rasmussen and Williams.
-
Gaussian Processes for Machine Learning.
CE Rasmussen and CKI Williams.
MIT Press, 2006.
[PDF]
-
Random Fields and Geometry.
RJ Adler and JE Taylor.
Springer, 2007.
[MathSciNet]
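As a rough illustration of how a Gaussian process serves as a prior on smooth functions, the sketch below draws one sample path from a zero-mean Gaussian process, evaluated at finitely many inputs. The squared-exponential kernel, the length scale, the jitter term and all function names are assumptions made for this example; they are not taken from the references above.

```python
import math
import random


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def sample_gp_prior(xs, length_scale=1.0, jitter=1e-6, seed=0):
    """Draw function values f(xs) from a zero-mean GP prior.

    If cov = L L^T and z is standard normal, then L z has covariance cov.
    The jitter on the diagonal keeps the factorization numerically stable.
    """
    rng = random.Random(seed)
    n = len(xs)
    cov = [
        [rbf_kernel(xs[i], xs[j], length_scale) + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    chol = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    grid = [0.5 * i for i in range(8)]
    for x, f in zip(grid, sample_gp_prior(grid)):
        print(f"{x:4.1f}  {f:+.3f}")
```

A larger `length_scale` yields smoother sample paths, which is one way the choice of covariance function encodes prior assumptions about the unknown function.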
Theory
The first two chapters of Schervish's Theory of Statistics are a very good reference
on abstract Bayesian methods, exchangeability, sufficiency, and parametric models
(including infinite-dimensional Bayesian models).
-
Theory of Statistics.
MJ Schervish.
Springer, 1995.
[MathSciNet]
Posterior convergence
A clear and readable introduction to the questions studied in this area, and to how they are addressed, is the survey chapter by Ghosal referenced above. The following monograph is a good reference that provides many more details. Be aware, though, that the most interesting work in this area has arguably been done in the past decade, and hence is not covered by the book.
-
Bayesian Nonparametrics.
JK Ghosh and RV Ramamoorthi.
Springer, 2002.
[MathSciNet]
-
Misspecification in infinite-dimensional Bayesian statistics.
BJK Kleijn and AW van der Vaart.
Annals of Statistics, 34(2):837-877, 2006.
[MathSciNet]
-
Posterior convergence rates of Dirichlet mixtures at smooth densities.
S Ghosal and AW van der Vaart.
Annals of Statistics, 35(2):697-723, 2007.
[MathSciNet]
-
Rates of contraction of posterior distributions based on Gaussian process priors.
AW van der Vaart and JH van Zanten.
Annals of Statistics, 36(3):1435-1463, 2008.
[MathSciNet]
-
A semiparametric Bernstein-von Mises theorem for Gaussian process priors.
I Castillo.
Probability Theory and Related Fields, 152:53-99, 2012.
[PDF]
Exchangeability
For a good introduction to exchangeability and its implications for Bayesian models, see Schervish's Theory of Statistics, which is referenced above. If you are interested in the bigger picture, and in how exchangeability generalizes to random structures other than exchangeable sequences, I highly recommend an article based on David Aldous' lecture at the International Congress of Mathematicians:
-
Exchangeability and continuum limits of discrete random structures.
DJ Aldous.
In Proceedings of the International Congress of Mathematicians, 2010.
[PDF]
-
Probabilistic Symmetries and Invariance Principles.
O Kallenberg.
Springer, 2005.
[MathSciNet]
-
Nonparametric priors on complete separable metric spaces.
P Orbanz.
Preprint.
[PDF]
Urns and power laws
When the Dirichlet process was first developed, Blackwell and MacQueen realized that a sample from a DP can be generated by a so-called Pólya urn with infinitely many colors. Roughly speaking, an urn model assumes that balls of different colors are contained in an urn, and are drawn uniformly at random; the proportions of balls per color determine the probability of each color to be drawn. A specific urn is defined by a rule for how the number of balls is changed when a color is drawn. In Pólya urns, the number of balls of a color is increased whenever that color is drawn; this process is called reinforcement, and corresponds to the rich-get-richer property of the Dirichlet process. There are many different versions of Pólya urns, defined by different reinforcement rules.
For Bayesian nonparametrics, urns provide a probabilistic tool to study the sizes of clusters in a clustering model, or more generally the weight distributions of random discrete measures. They also provide a link to population genetics, where urns model the distribution of species; you will sometimes encounter references to species sampling models. The relationship between the different terminologies is \[ \text{colors in urn} = \text{species} = \text{clusters} \] and \[ \#\text{balls} = \#\text{individuals} = \text{cluster size}. \] A key property of Pólya urns is that they can generate power law distributions, which occur in applications such as language models or social networks.
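The reinforcement rule described above can be sketched in a few lines. The following is a minimal simulation of a Blackwell-MacQueen style urn, assuming a concentration parameter `alpha > 0`: at step n, a new color is introduced with probability alpha / (alpha + n), and otherwise an existing color is drawn with probability proportional to its current number of balls. The function and variable names are hypothetical and chosen for this illustration only.

```python
import random


def blackwell_macqueen_urn(n_draws, alpha, seed=0):
    """Simulate color labels drawn from an urn with infinitely many colors.

    Returns the sequence of color labels and the number of balls per color,
    i.e. the cluster assignments and the cluster sizes.
    """
    rng = random.Random(seed)
    colors = []  # colors[n] = label of the color drawn at step n
    counts = []  # counts[k] = number of balls of color k (cluster size)
    for n in range(n_draws):
        if rng.random() < alpha / (alpha + n):
            # A color not seen before enters the urn.
            counts.append(1)
            colors.append(len(counts) - 1)
        else:
            # Reinforcement: colors with more balls are more likely to be drawn.
            k = rng.choices(range(len(counts)), weights=counts)[0]
            counts[k] += 1
            colors.append(k)
    return colors, counts


if __name__ == "__main__":
    labels, sizes = blackwell_macqueen_urn(n_draws=1000, alpha=2.0)
    print("number of clusters:", len(sizes))
    print("largest cluster sizes:", sorted(sizes, reverse=True)[:5])
```

Other urn schemes differ only in the rule that decides between a new and an existing color, so a different reinforcement rule amounts to changing the two probabilities inside the loop.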
If you are interested in urns and power laws, I recommend that you have a look at the following two survey articles (in this order):
Mathematical background
I am often asked for references on the mathematical foundations of Bayesian nonparametrics. There are a few specific reasons why Bayesian nonparametric models require more powerful mathematical tools than parametric ones; this is particularly true for theoretical problems.
One of the reasons is that Bayesian nonparametric models do not usually have a density representation, and hence require a certain amount of measure theory. Since the parameter space of a nonparametric model is infinite-dimensional, the prior and posterior distributions are probabilities on infinite-dimensional spaces, and hence stochastic processes. If you are interested in the theory of Bayesian nonparametrics and do not have a background in probability, you may have to familiarize yourself with some topics such as stochastic processes and regular conditional probabilities. These are covered in every textbook on probability theory. Billingsley's book is a popular choice.
-
Probability and Measure.
P Billingsley.
J. Wiley & Sons, 1995.
[MathSciNet]
-
Foundations of Modern Probability.
O Kallenberg.
Springer, 2nd edition, 2001.
[MathSciNet]
-
Infinite Dimensional Analysis.
CD Aliprantis and KC Border.
Springer, 3rd edition, 2006.
[MathSciNet]
This problem has motivated my own work on conjugate models (since conjugacy is the only reasonably general way we know to get from the prior and data to the posterior); see e.g.
-
Construction of Nonparametric Bayesian Models from Parametric Bayes Equations.
P Orbanz.
NIPS 2009.
[PDF]   [Supplements (Proofs)]
[Techreport Version] (Identical text; proofs included as appendix)
Historical references
The original DP paper is of course Ferguson's 1973 article. In his
acknowledgments, Ferguson attributes the idea to David Blackwell.
-
A Bayesian analysis of some nonparametric problems.
TS Ferguson.
Annals of Statistics, 1(2), 1973.
[MathSciNet]
-
Mixtures of Dirichlet processes with applications to Bayesian nonparametric estimation.
CE Antoniak.
Annals of Statistics, 2(6):1152-1174, 1974.
[MathSciNet]
-
On a class of Bayesian nonparametric estimates. I. Density estimates.
AY Lo.
Annals of Statistics, 12(1):351-357, 1984.
[MathSciNet]
-
Discreteness of Ferguson selections.
D Blackwell.
Annals of Statistics, 1(2):356-358, 1973.
[MathSciNet]
-
Ferguson distributions via Pólya urn schemes.
D Blackwell and JB MacQueen.
Annals of Statistics, 1(2):353-355, 1973.
[MathSciNet]
Consistency and posterior convergence
Until the 1980s, Bayesian statistics used a definition of consistency that is weaker than the modern definition. Roughly speaking, this definition states that the model has to behave well for all values of the parameter except for a set of zero probability under the prior. In parametric models, this set of exceptions does not usually cause problems, but in nonparametric models, it can make this notion of consistency almost meaningless. Work on stronger forms of consistency began after Diaconis and Freedman pointed out the problem by constructing a pathological counterexample to consistent behavior of the Dirichlet process.
-
On the consistency of Bayes estimates (with discussion).
P Diaconis and D Freedman.
Annals of Statistics, 14(1):1-67, 1986.
[MathSciNet]
-
Application of the theory of martingales.
JL Doob.
Coll. Int. du CNRS Paris. 1949.
[MathSciNet]
Exchangeability
Work on the equivalence of exchangeability and conditional independence dates back to several publications of de Finetti on sequences of binary random variables in the early 1930s, such as:
-
Funzione caratteristica di un fenomeno aleatorio.
B de Finetti.
Atti della R. Accademia Nazionale dei Lincei, 4:251-299, 1931.
-
Symmetric measures on Cartesian products.
E Hewitt and LJ Savage.
Transactions of the American Mathematical Society, 80(2):470-501, 1955.
[MathSciNet]