10th July 2007 — The Infinite PCFG using Hierarchical Dirichlet Processes

Yee Whye will discuss:

P. Liang, S. Petrov, M. Jordan and D. Klein, The Infinite PCFG using Hierarchical Dirichlet Processes, EMNLP 2007


We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data becomes available. In addition to presenting a fully Bayesian model for the PCFG, we also develop an efficient variational inference procedure. On synthetic data, we recover the correct grammar without having to specify its complexity in advance. We also show that our techniques can be applied to full-scale parsing applications by demonstrating their effectiveness in learning state-split grammars.
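
As background for the discussion, here is a minimal sketch of the truncated stick-breaking construction commonly used to represent Dirichlet process weights, which is one way to picture a prior over an unbounded set of grammar symbols. It is not taken from the paper: the function name `stick_breaking`, the concentration value `alpha=1.0`, and the truncation level of 10 are all illustrative assumptions.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Draw truncated stick-breaking weights (hypothetical helper, not from the paper).

    Each weight is a fraction v ~ Beta(1, alpha) of the stick that remains
    after the previous breaks; the final weight takes whatever is left so
    that the truncated weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # leftover mass assigned to the last symbol
    return weights


rng = random.Random(0)  # fixed seed so the draw is repeatable
beta = stick_breaking(alpha=1.0, truncation=10, rng=rng)
for symbol, weight in enumerate(beta):
    print(f"symbol {symbol}: {weight:.4f}")
```

Smaller values of `alpha` tend to concentrate the mass on the first few symbols, while larger values spread it over more of them, so the number of symbols in effective use is not fixed in advance.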