16th January 2007 — Estimation of Non-Normalized Statistical Models
Maneesh will discuss:
This paper introduces a new way to estimate parameters for an unnormalised density model (or, more precisely, a model with an intractable partition function). The idea is reminiscent of contrastive divergence (CD): the gradient of the model log-density (in data space) should match that of the data log-density. However, rather than attempting to approximate maximum likelihood learning as in CD, Aapo writes down an explicit cost function in these gradients, and shows that the argmin of this cost is a (locally) consistent estimator. A neat application of integration by parts eliminates the gradient of the unknown data density from the cost, leading to an expression in the gradients and Hessians of the model log-density, evaluated at the observed data points.
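To make the final cost concrete, here is a minimal sketch for a 1-D Gaussian model, assuming the standard score-matching form of the cost after integration by parts: the sample mean of 0.5 * psi(x)^2 + psi'(x), where psi is the data-space derivative of the model log-density. The function names and the toy data are illustrative choices, not taken from the paper.

```python
import numpy as np


def score_matching_cost(x, mu, lam):
    """Empirical score-matching cost for a 1-D Gaussian model.

    Unnormalised model: log q(x) = -0.5 * lam * (x - mu)**2 + const.
    psi(x)  = d/dx log q(x)     = -lam * (x - mu)
    psi'(x) = d^2/dx^2 log q(x) = -lam
    Cost    = mean(0.5 * psi(x)**2 + psi'(x))
    The normalising constant never appears.
    """
    psi = -lam * (x - mu)
    dpsi = -lam
    return np.mean(0.5 * psi**2 + dpsi)


def fit_gaussian_score_matching(x):
    """Closed-form argmin of the cost above for this particular model.

    Setting the derivatives with respect to mu and lam to zero gives
    mu = sample mean and lam = 1 / (biased sample variance).
    """
    mu = np.mean(x)
    lam = 1.0 / np.mean((x - mu) ** 2)
    return mu, lam


# Toy data (an assumption for illustration): mean 2.0, std 0.5, so precision 4.0.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=20000)

mu_hat, lam_hat = fit_gaussian_score_matching(x)
print("estimated mean:", mu_hat)
print("estimated precision:", lam_hat)
print("cost at estimate:", score_matching_cost(x, mu_hat, lam_hat))
```

For the Gaussian the estimate coincides with the maximum likelihood one; for models without a closed-form minimiser, the same cost would be minimised numerically.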
Rich has volunteered to be on hand to review CD if anyone needs/wants it ...