The Helmholtz Machine Through Time
Geoff Hinton, Peter Dayan, Ava To, Radford Neal
In F. Fogelman-Soulié & R. Gallinari, editors, ICANN-95, pages 483-490.
Abstract
We describe the "wake-sleep" algorithm that allows a multilayer,
unsupervised, stochastic neural network to build a hierarchical, top-down
generative model of an ensemble of data vectors. Because the generative model
uses distributed representations that are a non-linear function of the input,
it is intractable to compute the posterior probability distribution over
hidden representations given the generative model and the current data vector.
It is therefore intractable to fit the generative model to data using standard
techniques such as gradient descent or EM. Instead of computing the posterior
distribution exactly, a "Helmholtz Machine" uses a separate set of bottom-up
"recognition" connections to produce a compact approximation to the
posterior distribution. The wake-sleep algorithm uses the top-down generative
connections to provide training data for the bottom-up recognition connections,
and vice versa. In this paper, we show that the wake-sleep algorithm
can be generalized to model the temporal structure in sequences of data
vectors. This gives a very simple online algorithm that fits generative
models whose distributed hidden representations can be
exponentially more powerful than those of conventional Hidden Markov Models.
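As an illustration of the training scheme the abstract describes, a minimal one-hidden-layer Helmholtz machine with binary stochastic units can be sketched as below. In the wake phase, the recognition connections produce hidden states for a real data vector, and those states serve as targets for the generative weights; in the sleep phase, the generative model "fantasizes" a hidden/visible pair, which serves as a target for the recognition weights. All names, layer sizes, and the learning rate here are illustrative choices, and this sketch omits the paper's temporal generalization (recurrent connections across time steps):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    # Draw binary unit states from the given Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

class HelmholtzMachine:
    """Minimal one-hidden-layer Helmholtz machine trained by wake-sleep.
    A sketch for illustration, not the paper's exact parameterization."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.lr = lr
        # Bottom-up recognition weights and biases.
        self.R = np.zeros((n_visible, n_hidden))
        self.r_bias = np.zeros(n_hidden)
        # Top-down generative weights and biases.
        self.G = np.zeros((n_hidden, n_visible))
        self.g_bias_h = np.zeros(n_hidden)   # factorial prior over hidden units
        self.g_bias_v = np.zeros(n_visible)

    def wake(self, d):
        # Recognition pass on a real data vector fixes the hidden states...
        h = sample(sigmoid(d @ self.R + self.r_bias), self.rng)
        # ...which act as targets for the generative model (simple delta rules).
        self.g_bias_h += self.lr * (h - sigmoid(self.g_bias_h))
        p_v = sigmoid(h @ self.G + self.g_bias_v)
        self.G += self.lr * np.outer(h, d - p_v)
        self.g_bias_v += self.lr * (d - p_v)

    def sleep(self):
        # Fantasize a (hidden, visible) pair from the generative model...
        h = sample(sigmoid(self.g_bias_h), self.rng)
        d = sample(sigmoid(h @ self.G + self.g_bias_v), self.rng)
        # ...which acts as a target for the recognition connections.
        p_h = sigmoid(d @ self.R + self.r_bias)
        self.R += self.lr * np.outer(d, h - p_h)
        self.r_bias += self.lr * (h - p_h)
```

Training simply alternates the two phases over the data, e.g. `hm.wake(d); hm.sleep()` for each vector `d`; each phase is a purely local delta rule, which is what makes the algorithm so simple to run online.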