It is possible to combine multiple probabilistic models of the same
  data by multiplying the probabilities together and then renormalizing. This is a very
  efficient way to model high-dimensional data which simultaneously satisfies many different
  low-dimensional constraints. Each individual expert model can focus on giving high
  probability to data vectors that satisfy just one of the constraints. Data vectors that
  satisfy this one constraint but violate other constraints will be ruled out by their low
  probability under the other expert models. Training a product of models appears difficult
  because, in addition to maximizing the probabilities that the individual models assign to
  the observed data, it is necessary to make the models disagree on unobserved regions of
  the data space: It is fine for one model to assign a high probability to an unobserved
  region as long as some other model assigns it a very low probability. Fortunately, if the
  individual models are tractable there is a fairly efficient way to train a product of
  models. This training algorithm suggests a biologically plausible way of learning neural
  population codes.
  
  Download:  ps or pdf