Helmholtz Machines and Wake-Sleep Learning

Peter Dayan
Submitted to M Arbib, editor, Handbook of Brain Theory and Neural Networks, 2. Cambridge, MA: MIT Press.


Introduction

Unsupervised learning is largely concerned with finding structure among sets of input patterns such as visual scenes. One important example of structure arises when the input patterns are generated or caused in a systematic way, for instance when objects with different shapes, surface properties and positions are lit by lights of different characters and viewed by an observer with a digital camera at a particular relative location. Here, the inputs can be seen as living on a manifold that has many fewer dimensions than the space of all possible activation patterns over the pixels of the camera; were this not so, random visual noise in the camera would look like a normal visual scene. The manifold is most naturally parameterized by the generators themselves (ie the objects, the lights, etc) (see Hinton and Ghahramani, 1997).

The Helmholtz machine (Dayan et al, 1995) is an example of an approach to unsupervised learning called analysis by synthesis (eg Neisser, 1967). Imagine that we have a perfect computer graphics model, which indicates how objects appear to observers. We can use this model to synthesize input patterns that look just like the input patterns the observer would normally receive, with the crucial difference that, since we synthesized them, we know in detail how the images were generated. We can then use these paired images and generators to train a model that analyses new images to find out how they were generated too, ie that represents them according to which particular generators underlie them. Conversely, if we have a perfect analysis model, which indicates the generators underlying any image, then it is straightforward to use the paired images and generators to train a graphics model. In the Helmholtz machine, we attempt to have an imperfect graphics or generative model train a better analysis or recognition model, and an imperfect recognition model train a better generative model.

There are three key issues for an analysis by synthesis model. First is the nature of the synthetic or generative model -- for the Helmholtz machine, this is a structured belief network (Jordan, 1998) that is a model for the hierarchical top-down connections in the cortex. This model has an overall structure (the layers, the units within a layer, etc), and a set of generative parameters, which determine the probability distribution it expresses. The units in the lowest layer of the network are observable, in the sense that it is on them that the inputs are presented; units higher up in the network are latent, since their states are not directly observable from inputs.

The second issue for an analysis by synthesis model is how new inputs are analyzed or recognized in the light of this generative model, ie how the states of the latent units are determined so that the input is represented in terms of the way that it would be generated by the generative model. For the Helmholtz machine, this is done in an approximate fashion using a second structured belief network (called the recognition model) over the latent units, whose parameters are also learned. The recognition network is a model for the standard, bottom-up, connections in the cortex.

The third issue is the way that the generative and recognition models are learned from data. For the wake-sleep learning algorithm for the stochastic Helmholtz machine (Hinton et al, 1995), this happens in two phases. In the wake phase, the recognition model is used to estimate the underlying generators (ie the states of the latent units) for a particular input pattern, and then the generative model is altered so that those generators are more likely to have produced the input that was actually observed. In the sleep phase, the generative model fantasizes inputs by choosing particular generators stochastically, and then the recognition model is altered so that it is more likely to report those particular generators if the fantasized input were actually to be observed.

