13th June 2005 — Yee-Whye Teh: Bayesian language modelling

Yee-Whye Teh will give a talk:

A Bayesian Approach to Statistical Language Modelling

Statistical language modelling is a difficult task because of the large number of parameters and the sparsity of natural language data. We propose a Bayesian approach to this problem. A higher-level prior is defined that introduces dependencies among the parameters, allowing statistical strength to be shared across them. In particular, each word is embedded in a low-dimensional continuous space, so that the parameters of words with similar embeddings are themselves more similar. A novel and efficient variational inference procedure is developed for the model. We report promising results on a small dataset.
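
For readers unfamiliar with the idea of sharing statistical strength through embeddings, the following toy sketch may help build intuition. It is not the model or the variational procedure from the talk: it swaps in plain kernel smoothing of bigram counts, and the vocabulary, the hand-picked 2-D embeddings, the counts, and the bandwidth are all assumptions made up for illustration.

```python
import math

# Hypothetical 2-D embeddings, chosen by hand for illustration only.
EMBEDDINGS = {
    "cat": (0.9, 0.1),
    "dog": (1.0, 0.2),
    "car": (-0.8, 0.9),
}

# Made-up sparse bigram counts: COUNTS[context][next_word].
# The context "dog" has almost no data of its own.
COUNTS = {
    "cat": {"sleeps": 8, "purrs": 4},
    "dog": {"sleeps": 1},
    "car": {"drives": 9, "stops": 3},
}


def similarity(u, v, bandwidth=0.5):
    """Gaussian kernel on the squared Euclidean distance between embeddings."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def smoothed_next_word_probs(context):
    """Pool counts from every context, weighted by embedding similarity.

    Contexts whose embeddings lie close to `context` contribute almost
    fully; distant contexts contribute almost nothing.
    """
    pooled = {}
    for other, counts in COUNTS.items():
        weight = similarity(EMBEDDINGS[context], EMBEDDINGS[other])
        for word, n in counts.items():
            pooled[word] = pooled.get(word, 0.0) + weight * n
    total = sum(pooled.values())
    return {word: value / total for word, value in pooled.items()}


probs = smoothed_next_word_probs("dog")
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.4f}")
```

In this sketch "dog" borrows most of its next-word estimate from "cat", whose embedding is nearby, and almost nothing from "car". A Bayesian treatment would instead place a prior over the parameters and infer a posterior, rather than reweighting counts directly.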