 
Using Free Energies to Represent Q-values in a Multiagent
Reinforcement Learning Task
Brian Sallans
Department of Computer Science
University of Toronto
Toronto M5S 2Z9 Canada
Geoffrey Hinton
Gatsby Computational Neuroscience Unit
University College London
17 Queen Square, London WC1N 3AR, UK
Abstract
The problem of reinforcement learning in large factored Markov
decision processes is explored. The Q-value of a state-action pair is approximated
by the free energy of a product of experts network. Network parameters are learned
online using a modified SARSA algorithm which minimizes the inconsistency of the Q-values
of consecutive state-action pairs. Actions are chosen based on the current value
estimates by fixing the current state and sampling actions from the network using Gibbs
sampling. The algorithm is tested on a cooperative multiagent task. The
product of experts model is found to perform comparably to table-based Q-learning for
small instances of the task, and continues to perform well when the problem becomes too
large for a table-based representation.
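The approach in the abstract can be sketched in code. Below is a minimal, illustrative NumPy implementation assuming a restricted Boltzmann machine as the product of experts, with binary state and action units as visibles: Q(s, a) is the negative free energy of the clamped visibles, parameters follow a SARSA-style temporal-difference gradient step, and actions are drawn by Gibbs sampling with the state units clamped. All class and variable names, sizes, and hyperparameters are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FreeEnergyQ:
    """Sketch: Q(s, a) approximated by the negative free energy of an
    RBM (a product of experts) over concatenated binary state/action units."""

    def __init__(self, n_state, n_action, n_hidden, lr=0.1, gamma=0.95):
        n_vis = n_state + n_action
        self.W = rng.normal(0.0, 0.1, (n_hidden, n_vis))  # expert weights
        self.b = np.zeros(n_vis)     # visible biases
        self.c = np.zeros(n_hidden)  # hidden (expert) biases
        self.n_state, self.n_action = n_state, n_action
        self.lr, self.gamma = lr, gamma

    def q(self, s, a):
        # Negative free energy: b.v + sum_j softplus(c_j + W_j.v)
        v = np.concatenate([s, a])
        return self.b @ v + np.sum(np.logaddexp(0.0, self.c + self.W @ v))

    def sarsa_update(self, s, a, r, s2, a2, done=False):
        # TD error between consecutive state-action pairs
        target = r if done else r + self.gamma * self.q(s2, a2)
        delta = target - self.q(s, a)
        v = np.concatenate([s, a])
        p = sigmoid(self.c + self.W @ v)  # expected hidden activations
        # Gradient of Q w.r.t. parameters, scaled by the TD error
        self.b += self.lr * delta * v
        self.c += self.lr * delta * p
        self.W += self.lr * delta * np.outer(p, v)

    def sample_action(self, s, n_steps=20):
        # Gibbs sampling with the state units clamped to s:
        # alternate hidden-given-visible and action-given-hidden updates.
        a = rng.integers(0, 2, self.n_action).astype(float)
        for _ in range(n_steps):
            v = np.concatenate([s, a])
            h = (rng.random(self.c.shape)
                 < sigmoid(self.c + self.W @ v)).astype(float)
            act_in = self.b[self.n_state:] + h @ self.W[:, self.n_state:]
            a = (rng.random(self.n_action) < sigmoid(act_in)).astype(float)
        return a
```

For a factored action space, the key property is that Gibbs sampling only ever conditions each unit on the others, so action selection never enumerates the exponentially many joint actions, which is what lets the method scale past a table-based representation.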
