Yee Whye Teh : Research : Projects

Bayesian Methods for Machine Learning

Machine learning researchers often have to contend with issues of model selection and model fitting in the context of large, complicated models and sparse data. The idea I am pursuing in this project is that these issues can be handled well using Bayesian techniques.

Model selection means choosing, from a class of models each of which has finite capacity, the model whose capacity is right for the data. Nonparametric Bayesian modelling sidesteps model selection by using models of potentially unbounded (or infinite) capacity. Overfitting is avoided by the usual Bayesian approach of integrating out all parameters (perhaps using MCMC or variational methods).
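As a small illustration of what "unbounded capacity" can mean, the sketch below draws a partition from a Chinese restaurant process, a standard nonparametric prior over clusterings. The choice of this prior, the function name crp_assignments, and the values of alpha and n_points are assumptions made for illustration, not taken from the project. The code only samples from the prior and does no posterior inference.

```python
import random

def crp_assignments(n_points, alpha, seed=0):
    """Sample a partition of n_points items from a Chinese restaurant process.

    alpha is the concentration parameter: larger values favour more clusters.
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []   # labels[i] = cluster index assigned to item i
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(n_points=200, alpha=1.0)
print("number of clusters:", len(counts))
print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance; it is a random quantity that can keep growing as more items are added, which is the sense in which no finite capacity has to be selected.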

Hierarchical Bayesian modelling, on the other hand, uses more elaborate priors that introduce dependencies among model parameters, in contrast to the traditional simplistic approach of using independent priors. Dependencies are important because they allow the "sharing of statistical strength" among different parts of the model: what one part learns from data is shared with other parts of the model through the prior dependencies. This sharing matters when we have sparse data, since each part of the model "sees" only a very small amount of data and so will not learn well without sharing.
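A minimal sketch of sharing statistical strength, assuming a simple normal-normal hierarchy: group parameters theta_j ~ N(mu, tau2) and observations y_ij ~ N(theta_j, sigma2). The model, the function name shrunken_means, the toy data and the fixed variances are all illustrative assumptions. For brevity the shared mean mu is replaced by a plug-in estimate rather than integrated out, so this is an empirical Bayes shortcut and not a full Bayesian treatment.

```python
def shrunken_means(groups, sigma2=1.0, tau2=1.0):
    """Posterior means of group parameters in a normal-normal hierarchy.

    Assumed model: theta_j ~ N(mu, tau2), y_ij ~ N(theta_j, sigma2),
    with sigma2 and tau2 treated as known and mu estimated from all groups.
    """
    sizes = [len(g) for g in groups]
    ybars = [sum(g) / len(g) for g in groups]
    # Marginally ybar_j ~ N(mu, tau2 + sigma2 / n_j), so mu is estimated
    # by a precision-weighted average of the group means.
    w = [1.0 / (tau2 + sigma2 / n) for n in sizes]
    mu = sum(wi * yb for wi, yb in zip(w, ybars)) / sum(w)
    post = []
    for n, yb in zip(sizes, ybars):
        data_prec = n / sigma2     # precision contributed by the group's own data
        prior_prec = 1.0 / tau2    # precision contributed by the shared prior
        post.append((data_prec * yb + prior_prec * mu) / (data_prec + prior_prec))
    return mu, ybars, post

# Toy data: one well-observed group and two sparsely observed groups.
groups = [[2.1, 1.9, 2.0, 2.2, 1.8, 2.0], [5.0], [0.5, 0.7]]
mu, ybars, post = shrunken_means(groups)
for n, yb, p in zip(map(len, groups), ybars, post):
    print(f"n={n}  raw mean={yb:.2f}  shrunken mean={p:.2f}")
```

Groups with few observations are pulled strongly towards the shared mean, which is itself informed by every group, while well-observed groups stay close to their own data.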

Along with Mike Jordan, Matt Beal and other colleagues, I combine both ideas within a single framework, studying novel Bayesian models that are both nonparametric and hierarchical and hence inherit the advantages of both. We develop MCMC and variational inference schemes for such models and apply them to a variety of applications.

Approximate Inference

Exact inference for many graphical models of interest is intractable. A variety of approximations are being actively studied, including MCMC samplers, variational methods and message-passing methods. Along with Max Welling and other colleagues, I study the properties of variational and message-passing methods both theoretically and experimentally, as well as the relationships among them, and propose novel methods based on the insights gained.

Unsupervised Learning with Energy-based Models and Deep Architectures

Along with Geoff Hinton, Max Welling and others, I study a variety of models for visual and sensory perception in an unsupervised framework. The linchpin of all these models is the use of contrastive divergence, an approximate learning algorithm that works extremely well for such models. We were able to train deep belief networks in an almost fully unsupervised fashion that gave the current best known results for handwritten digit classification, if invariant image transformations are not explicitly taken into account. In the past we have also looked into models that are related to ICA and models for face recognition.
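For readers unfamiliar with contrastive divergence, the following is a rough sketch of a single-step (CD-1) update for a small restricted Boltzmann machine with binary units. It is a generic textbook-style illustration, not the deep belief network training procedure used in this project; the function name cd1_update, the toy data, the layer sizes and the learning rate are all hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(W, b, c, v0, lr, rng):
    """Apply one CD-1 update in place for a binary RBM on one visible vector v0.

    W[i][j] couples visible unit i to hidden unit j; b and c are the
    visible and hidden biases. Returns the reconstruction probabilities.
    """
    n_v, n_h = len(b), len(c)
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: one Gibbs step, resampling the visibles and then
    # recomputing the hidden probabilities from the reconstruction.
    pv1 = [sigmoid(b[i] + sum(h0[j] * W[i][j] for j in range(n_h))) for i in range(n_v)]
    v1 = [1.0 if rng.random() < p else 0.0 for p in pv1]
    ph1 = [sigmoid(c[j] + sum(v1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    # Move parameters towards the data statistics and away from the
    # statistics of the one-step reconstruction.
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(n_h):
        c[j] += lr * (ph0[j] - ph1[j])
    return pv1

rng = random.Random(0)
data = [[1, 1, 0, 0], [0, 0, 1, 1]]   # hypothetical toy patterns
n_v, n_h = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_h)] for _ in range(n_v)]
b, c = [0.0] * n_v, [0.0] * n_h
for epoch in range(200):
    for v in data:
        cd1_update(W, b, c, v, lr=0.1, rng=rng)
print("weights after training:", W)
```

The approximation lies in the negative phase: instead of running the Gibbs chain to equilibrium, the chain is started at the data and stopped after one step.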

Computational Vision

I was involved in a project with David Forsyth's group while at Berkeley on vision problems that involve both images and textual data. In particular, we have looked at extracting and accurately labelling faces in news images by using caption information, and we have looked at the processing of images of ancient Latin text.

Computational Linguistics

Computational Biology

Computational Neuroscience

Miscellaneous

Theses

Course Projects