Yee Whye Teh : Research : Projects
Bayesian Methods for Machine Learning
Machine learning researchers often have to contend with model selection and model fitting in the context of large, complicated models and sparse data. The idea I am pushing in this project is that both can be handled nicely using Bayesian techniques.
Model selection is the problem of choosing, among a class of models each of which has finite capacity, the model of the right capacity. Nonparametric Bayesian modelling sidesteps model selection by simply using models of potentially unbounded (or infinite) capacity. Overfitting is avoided by the usual Bayesian approach of integrating out all parameters (perhaps using MCMC or variational methods).
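As a rough illustration of what "unbounded capacity" can mean in practice, the sketch below draws a partition from the Chinese restaurant process, the clustering prior induced by a Dirichlet process. It is not code from this project; the function name `sample_crp` and the parameter `alpha` are illustrative choices. The number of clusters is not fixed in advance and can keep growing as more data points arrive. The sketch only samples from the prior and does not perform any posterior inference.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw table assignments from a Chinese restaurant process prior.

    Illustrative sketch only: `alpha` is the concentration parameter.
    Returns (assignments, counts), where counts[k] is the size of table k.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of customers at table k so far
    assignments = []
    for i in range(n_customers):
        # Customer i joins existing table k with probability
        # counts[k] / (i + alpha), or a new table with alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        table = len(counts)  # default: open a new table
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                table = k
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments, counts


if __name__ == "__main__":
    for n in (10, 100, 1000):
        _, counts = sample_crp(n, alpha=1.0)
        print(n, "customers ->", len(counts), "tables")
```

Under this prior the number of occupied tables tends to grow slowly with the number of customers, which is the sense in which the model's capacity adapts to the amount of data.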
On the other hand, hierarchical Bayesian modelling is the idea of using more elaborate priors that introduce dependencies among model parameters. This is opposed to the traditional simplistic approach of using independent priors. Dependencies are important because of the "sharing of statistical strength" among different parts of the model: the idea is that what one part learns from data is shared among other parts of the model through the prior dependencies. This sharing of statistical strength is important when we have sparse data since each part of the model "sees" only a very small amount of data so will not learn well without sharing.
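A toy example of "sharing statistical strength", given as a sketch under strong simplifying assumptions rather than as any model from the papers below: each group has a mean drawn from a common normal prior with unknown centre, the within-group variance `sigma2` and between-group variance `tau2` are assumed known, and the prior on the centre is flat. The function name `shrunken_group_means` is hypothetical. Under these assumptions the posterior mean of each group is a weighted average of its own sample mean and the estimated shared centre, so groups with little data borrow more from the others.

```python
def shrunken_group_means(groups, sigma2=1.0, tau2=1.0):
    """Posterior means of group means in a normal-normal hierarchy.

    Assumptions (illustrative): y_ij ~ N(theta_j, sigma2),
    theta_j ~ N(mu, tau2), flat prior on mu, sigma2 and tau2 known.
    Returns (mu_hat, list of posterior means of theta_j).
    """
    stats = [(len(g), sum(g) / len(g)) for g in groups]
    # Marginally, each group sample mean is N(mu, sigma2 / n_j + tau2),
    # so the posterior mean of mu is a precision-weighted average.
    weights = [1.0 / (sigma2 / n + tau2) for n, _ in stats]
    mu_hat = sum(w * m for w, (_, m) in zip(weights, stats)) / sum(weights)
    posterior_means = []
    for n, m in stats:
        # Shrinkage factor: close to 1 for tiny groups, close to 0 for big ones.
        b = (sigma2 / n) / (sigma2 / n + tau2)
        posterior_means.append(b * mu_hat + (1.0 - b) * m)
    return mu_hat, posterior_means


if __name__ == "__main__":
    data = [[1.0, 1.2, 0.8, 1.0], [5.0]]  # second group has a single point
    mu_hat, means = shrunken_group_means(data)
    print("shared centre:", mu_hat)
    print("shrunken group means:", means)
```

With independent priors on each group there would be no shared centre to estimate, and the single observation in the second group would be all that group could learn from.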
Along with Mike Jordan, Matt Beal and other colleagues, I combine both ideas within a single framework, studying novel Bayesian models that are both nonparametric and hierarchical and hence inherit the advantages of both. We develop MCMC and variational inference schemes for such models and apply them to a variety of applications.
Dirichlet Processes---MLSS 2007.
Y.W. Teh. Machine Learning Summer School 2007 Tutorial and Practical Course.
[slides.pdf] [all.tgz] [all.zip] [MLSS 2007] [Video Lectures]
Dirichlet Processes.
Y.W. Teh. Encyclopedia of Machine Learning, under review.
[bibtex] [pdf] [djvu]
Hierarchical Bayesian Nonparametric Models with Applications.
Y.W. Teh and M.I. Jordan. Bayesian Nonparametrics, to appear. Cambridge University Press.
[bibtex] [pdf] [djvu] [Cambridge University Press]
Research Papers
A Stochastic Memoizer for Sequence Data.
F. Wood, C. Archambeau, J. Gasthaus, L. F. James and Y.W. Teh. ICML 2009.
[bibtex] [pdf] [ICML 2009]
Variational Inference for the Indian Buffet Process.
F. Doshi, K. T. Miller, J. Van Gael and Y.W. Teh. AISTATS 2009.
[bibtex] [pdf] [AISTATS 2009]
Infinite Hierarchical Hidden Markov Models.
K. Heller, Y.W. Teh and D. Gorur. AISTATS 2009.
[bibtex] [pdf] [AISTATS 2009]
A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation.
F. Wood and Y.W. Teh. AISTATS 2009.
[bibtex] [pdf] [AISTATS 2009]
The Mondrian Process.
D.M. Roy and Y.W. Teh. NIPS 2008.
[bibtex] [pdf] [djvu] [NIPS 2008]
An Efficient Sequential Monte-Carlo Algorithm for Coalescent Clustering.
D. Gorur and Y.W. Teh. NIPS 2008.
[bibtex] [pdf] [djvu] [NIPS 2008]
The Infinite Factorial Hidden Markov Model.
J. Van Gael, Y.W. Teh and Z. Ghahramani. NIPS 2008.
[bibtex] [pdf] [djvu] [NIPS 2008]
Dependent Dirichlet Process Spike Sorting.
J. Gasthaus, F. Wood, D. Gorur and Y.W. Teh. NIPS 2008.
[bibtex] [pdf] [djvu] [NIPS 2008]
Beam Sampling for the Infinite Hidden Markov Model.
J. Van Gael, Y. Saatci, Y.W. Teh and Z. Ghahramani. ICML 2008.
[bibtex] [pdf] [djvu] [ICML 2008] [code] [presentation]
Bayesian Agglomerative Clustering with Coalescents.
Y.W. Teh, H. Daume III and D.M. Roy. NIPS 2007.
[bibtex] [pdf] [djvu] [NIPS 2007]
Collapsed Variational Inference for HDP.
Y.W. Teh, K. Kurihara and M. Welling. NIPS 2007.
[bibtex] [pdf] [djvu] [NIPS 2007]
Stick-breaking Construction for the Indian Buffet Process.
Y.W. Teh, D. Gorur and Z. Ghahramani. AISTATS 2007.
[bibtex] [pdf] [ps.gz] [djvu] [AISTATS 2007]
Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture.
E.P. Xing, K.-A. Sohn, M.I. Jordan and Y.W. Teh. ICML 2006.
[bibtex] [pdf] [ps.gz] [djvu] [ICML 2006]
A Hierarchical Bayesian Language Model based on Pitman-Yor Processes.
Y.W. Teh. Coling/ACL 2006.
[bibtex] [pdf] [ps.gz] [djvu] [Coling/ACL 2006]
Long version: A Bayesian Interpretation of Interpolated Kneser-Ney.
Y.W. Teh. Technical Report TRA2/06, School of Computing, NUS, revised 2006.
[bibtex] [pdf] [ps.gz] [djvu] [School of Computing, NUS]
Semiparametric Latent Factor Models.
Y.W. Teh, M. Seeger and M.I. Jordan. AISTATS 2005.
[bibtex] [pdf] [ps.gz] [djvu] [AISTATS 2005]
Long version: Semiparametric Latent Factor Models.
M. Seeger, Y.W. Teh and M.I. Jordan. Technical Report, Computer Science, UC Berkeley, 2005.
[bibtex] [pdf] [ps.gz] [djvu] [Computer Science, UC Berkeley]
Hierarchical Dirichlet Processes.
Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei. JASA 101(476):1566-1581, 2006.
[bibtex] [pdf] [ps.gz] [djvu] [JASA]
Old version: Hierarchical Dirichlet Processes.
Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei. Technical Report 653, Statistics, UC Berkeley, 2004.
[bibtex] [pdf] [ps.gz] [djvu] [Statistics, UC Berkeley]
Short version: Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes.
Y.W. Teh, M.I. Jordan, M.J. Beal and D.M. Blei. NIPS 2004.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2004]
Software
Nonparametric Bayesian Mixture Models - release 2.1.
Y.W. Teh. 2004. MATLAB and C code. Implements HDPs with DPs arranged in any tree structure. Supports multinomials only, and runs only on Linux and Mac OS X.
[readme] [tgz]
Nonparametric Bayesian Mixture Models - release 1.
Y.W. Teh. 2004. MATLAB Code. Has DP mixture, HDP mixture, LDA for Gaussians and multinomials.
[readme] [tgz]
Miscellaneous
Workshop on Nonparametric Bayes.
Co-organized with Romain Thibaux, Athanasios Kottas, Zoubin Ghahramani, and Mike Jordan, ICML/UAI/COLT 2008.
Introductions and Reviews
Approximate Inference
Exact inference for many graphical models of interest is intractable. A variety of approximations are being actively studied, including MCMC samplers, variational methods and message-passing methods. Along with Max Welling and other colleagues, I study the properties of variational and message-passing methods both theoretically and experimentally, examine the relationships among them, and propose novel methods based on the insights gained.
On Smoothing and Inference for Topic Models.
A. Asuncion, M. Welling, P. Smyth and Y.W. Teh. UAI 2009.
[bibtex] [pdf] [UAI 2009]
Variational Inference for the Indian Buffet Process.
F. Doshi, K. T. Miller, J. Van Gael and Y.W. Teh. AISTATS 2009.
[bibtex] [AISTATS 2009]
Hybrid Variational/Gibbs Inference in Topic Models.
M. Welling, Y.W. Teh and B. Kappen. UAI 2008.
[bibtex] [pdf] [djvu] [UAI 2008]
Collapsed Variational Inference for HDP.
Y.W. Teh, K. Kurihara and M. Welling. NIPS 2007.
[bibtex] [pdf] [djvu] [NIPS 2007]
Cooled and Relaxed Survey Propagation for MRFs.
H.L. Chieu, W.S. Lee and Y.W. Teh. NIPS 2007.
[bibtex] [pdf] [djvu] [NIPS 2007] [proof.pdf] [proof.djvu]
Collapsed Variational Dirichlet Process Mixture Models.
K. Kurihara, M. Welling and Y.W. Teh. IJCAI 2007.
[bibtex] [pdf] [ps.gz] [djvu] [IJCAI 2007]
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation.
Y.W. Teh, D. Newman and M. Welling. NIPS 2006.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2006]
Structured Region Graphs: Morphing EP into GBP.
M. Welling, T. Minka and Y.W. Teh. UAI 2005. Extended version with proofs.
[bibtex] [pdf] [ps.gz] [djvu] [UAI 2005]
Approximate Inference by Markov Chains on Union Spaces.
M. Welling, M. Rosen-Zvi and Y.W. Teh. ICML 2004.
[bibtex] [pdf] [ps.gz] [djvu] [ICML 2004]
Linear Response Algorithms for Approximate Inference in Graphical Models.
M. Welling and Y.W. Teh. Neural Computation 16:197-221, 2004.
[bibtex] [pdf] [ps.gz] [djvu] [Neural Computation]
Short version: Linear Response Algorithms for Approximate Inference.
M. Welling and Y.W. Teh. NIPS 2003.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2003]
On Improving the Efficiency of the Iterative Proportional Fitting Procedure.
Y.W. Teh and M. Welling. AISTATS 2003.
[bibtex] [pdf] [ps.gz] [djvu] [AISTATS 2003]
The Unified Propagation and Scaling Algorithm.
Y.W. Teh and M. Welling. NIPS 2001.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2001]
Passing and Bouncing Messages for Generalized Inference.
Y.W. Teh and M. Welling. Technical Report 2001-001, Gatsby Unit, UCL.
[bibtex] [pdf] [ps.gz] [djvu] [Gatsby Unit]
Approximate Inference in Boltzmann Machines.
M. Welling and Y.W. Teh. Artificial Intelligence 143(1):19-50, 2003.
[bibtex] [pdf] [ps.gz] [djvu] [Artificial Intelligence]
Short version: Belief Optimization for Binary Networks: A Stable Alternative to Loopy Belief Propagation.
M. Welling and Y.W. Teh. UAI 2001.
[bibtex] [pdf] [ps.gz] [djvu] [UAI 2001]
Software
Easy BP - release 0.
Y.W. Teh. 2008. MATLAB and C code. Implements a MATLAB table class to make implementation of various message passing inference algorithms much simpler.
[readme] [tgz] [zip]
Research Papers
Unsupervised Learning with Energy-based Models and Deep Architectures
Along with Geoff Hinton, Max Welling and others, I study a variety of models for visual and sensory perception in an unsupervised framework. The linchpin of all these models is the use of contrastive divergence, an approximate learning algorithm that works extremely well for such models. We were able to train deep belief networks in an almost fully unsupervised fashion that gave the current best known results for handwritten digit classification, if invariant image transformations are not explicitly taken into account. In the past we have also looked into models that are related to ICA and models for face recognition.
A Fast Learning Algorithm For Deep Belief Networks.
G.E. Hinton, S. Osindero and Y.W. Teh. Neural Computation 18(7):1527-1554, 2006.
[bibtex] [pdf] [ps.gz] [djvu] [Neural Computation]
Unsupervised Discovery of Non-Linear Structure using Contrastive Backpropagation.
G.E. Hinton, S. Osindero, M. Welling and Y.W. Teh. Cognitive Science 30:4, 2006.
[bibtex] [pdf] [ps.gz] [djvu] [Cognitive Science]
Energy-Based Models for Sparse Overcomplete Representations.
Y.W. Teh, M. Welling, S. Osindero and G.E. Hinton. JMLR 4(Dec):1235-1260, 2003.
[bibtex] [pdf] [ps.gz] [djvu] [Journal of Machine Learning Research]
Short version: A New View of ICA.
G.E. Hinton, M. Welling, Y.W. Teh and S. Osindero. ICA 2001.
[bibtex] [pdf] [ps.gz] [djvu] [ICA 2001]
Discovering Multiple Constraints that are Frequently Approximately Satisfied.
G.E. Hinton and Y.W. Teh. UAI 2001.
[bibtex] [pdf] [ps.gz] [djvu] [UAI 2001]
Rate-coded Restricted Boltzmann Machines for Face Recognition.
Y.W. Teh and G.E. Hinton. NIPS 2000.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2000]
Computational Vision
While at Berkeley I was involved in a project with David Forsyth's group on vision problems that involve both images and textual data. In particular, we looked at extracting and accurately labelling faces in news images using caption information, and at processing images of ancient Latin text.
Names and Faces.
T.L. Berg, A.C. Berg, J. Edwards, M. Maire, R. White, Y.W. Teh, E. Learned-Miller and D.A. Forsyth. Submitted.
[bibtex] [pdf]
Making Latin Manuscripts Searchable using gHMM's.
J. Edwards, Y.W. Teh, D.A. Forsyth, M. Maire, R. Bock and G. Vesom. NIPS 2004.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2004]
Faces and Names in the News.
T. Miller, A.C. Berg, J. Edwards, M. Maire, R. White, Y.W. Teh, E. Learned-Miller and D.A. Forsyth. CVPR 2004.
[bibtex] [pdf] [djvu] [CVPR 2004]
Computational Linguistics
A Stochastic Memoizer for Sequence Data.
F. Wood, C. Archambeau, J. Gasthaus, L. F. James and Y.W. Teh. ICML 2009.
[bibtex] [pdf] [ICML 2009]
A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation.
F. Wood and Y.W. Teh. AISTATS 2009.
[bibtex] [pdf] [AISTATS 2009]
Hierarchical Dirichlet Trees for Information Retrieval.
G.R. Haffari and Y.W. Teh. NAACL-HLT 2009.
[bibtex] [pdf] [NAACL-HLT 2009]
A Hierarchical Bayesian Language Model based on Pitman-Yor Processes.
Y.W. Teh. Coling/ACL 2006.
[bibtex] [pdf] [ps.gz] [djvu] [Coling/ACL 2006]
Long version: A Bayesian Interpretation of Interpolated Kneser-Ney.
Y.W. Teh. Technical Report TRA2/06, School of Computing, NUS, revised 2006.
[bibtex] [pdf] [ps.gz] [djvu] [School of Computing, NUS]
Computational Biology
A Mixture Model for the Evolution of Gene Expression in Non-homogeneous Datasets.
G. Quon, Y.W. Teh, E. Chan, M. Brudno, T. Hughes and Q.D. Morris. NIPS 2008.
[bibtex] [pdf] [djvu] [NIPS 2008]
Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture.
E.P. Xing, K.-A. Sohn, M.I. Jordan and Y.W. Teh. ICML 2006.
[bibtex] [pdf] [ps.gz] [djvu] [ICML 2006]
Computational Neuroscience
Dependent Dirichlet Process Spike Sorting.
J. Gasthaus, F. Wood, D. Gorur and Y.W. Teh. NIPS 2008.
[bibtex] [pdf] [djvu] [NIPS 2008]
Miscellaneous
Semi-supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances.
W.S. Lee, X. Zhang and Y.W. Teh. Technical Report TRB3/06, School of Computing, NUS, 2006.
[bibtex] [pdf] [ps.gz] [djvu] [School of Computing, NUS]
Automatic Alignment of Local Representations.
Y.W. Teh and S. Roweis. NIPS 2002.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 2002]
Locally Linear Coordination - release 1.
Y.W. Teh and S. Roweis. 2002. MATLAB code. Includes MFA/MPPCA code.
[readme] [tgz]
An Alternate Objective Function for Markovian Fields.
S. Kakade, Y.W. Teh and S. Roweis. ICML 2002.
[bibtex] [pdf] [ps.gz] [djvu] [ICML 2002]
Making Forward Chaining Relevant.
F. Bacchus and Y.W. Teh. AIPS 1998.
[bibtex] [pdf] [ps.gz] [djvu] [AIPS 1998]
Theses
Bethe Free Energy and Contrastive Divergence Approximations for Undirected Graphical Models.
Y.W. Teh. Ph.D. Thesis, 2003. University of Toronto.
[bibtex] [pdf] [ps.gz] [djvu] [Computer Science, Toronto]
Learning to Parse Images.
Y.W. Teh. Master's Thesis, 2000. University of Toronto.
[bibtex] [pdf] [ps.gz] [djvu] [Computer Science, Toronto]
Short version: Learning to Parse Images.
G.E. Hinton, Z. Ghahramani and Y.W. Teh. NIPS 1999.
[bibtex] [pdf] [ps.gz] [djvu] [NIPS 1999]
Course Projects
Incremental conservative visibility with general occluders.
Y.W. Teh and H. Zhang. CSC2522F Project, 1999.
[pdf] [ps.gz] [djvu]
Wagner's conjecture.
Y.W. Teh. CSC2410S Project, 1999.
[pdf] [ps.gz] [djvu]
An attention model and steerable filters.
Y.W. Teh. CSC2523S Project, 1999.
[pdf] [ps.gz] [djvu]
Representing coastlines with linear transforms.
Y.W. Teh. CSC2508S Project, 2000.
[pdf] [ps.gz] [djvu]