| |
Learning Factored Representations for
Partially Observable Markov Decision Processes
Brian Sallans
Gatsby Computational Neuroscience Unit
University College London
and
Department of Computer Science
University of Toronto
In Advances in Neural Information Processing Systems 12, MIT Press, Cambridge, MA
Abstract
The problem of reinforcement learning in a non-Markov environment is
explored using a dynamic Bayesian network, where conditional independence assumptions
between random variables are compactly represented by network parameters. The
parameters are learned on-line, and approximations are used to perform inference and to
compute the optimal value function. The relative effects of inference and value
function approximations on the quality of the final policy are investigated, by learning
to solve a moderately difficult driving task. The two value function approximations,
linear and quadratic, were found to perform similarly, but the quadratic model was more
sensitive to initialization. Both performed below the level of human performance on
the task. The dynamic Bayesian network performed comparably to a model using a
localist hidden state representation, while requiring exponentially fewer parameters.
Download [ps] [pdf]
|