Exploration Bonuses and Dual Control
Peter Dayan   Terry Sejnowski
Machine Learning, 25, 5-22 (1996).
Abstract
Finding the Bayesian balance between exploration and
exploitation in adaptive optimal control is in general
intractable. This paper shows how to compute suboptimal estimates
based on a certainty equivalence approximation (Cozzolino,
Gonzalez-Zubieta & Miller, 1965) arising from a form of dual
control. This systematizes and extends existing uses of exploration
bonuses in reinforcement learning (Sutton, 1990). The approach has
two components: a statistical model of uncertainty in the world and a
way of turning this into exploratory behavior. This general approach
is applied to two-dimensional mazes with moveable barriers and its
performance is compared with Sutton's DYNA system.
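
As a rough illustration of what an exploration bonus is, the sketch below adds a recency-based bonus to action values, in the spirit of the bonuses associated with Sutton (1990). It is not the method of this paper, which derives its bonuses from a statistical model of uncertainty. The function names, the square-root form of the bonus, and the parameter `kappa` are all assumptions made for the example.

```python
import math


def bonus_reward(reward, steps_since_tried, kappa=0.1):
    """Return the reward plus a bonus of kappa * sqrt(steps_since_tried).

    Hypothetical helper: the square-root form and kappa are illustrative
    assumptions, not values taken from the paper.
    """
    return reward + kappa * math.sqrt(steps_since_tried)


def pick_action(q_values, last_tried, now, kappa=0.1):
    """Pick the action with the highest bonus-augmented value.

    q_values:   dict mapping action -> current value estimate
    last_tried: dict mapping action -> time step at which it was last tried
    now:        current time step
    """
    scores = {
        action: bonus_reward(q, now - last_tried[action], kappa)
        for action, q in q_values.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    q_values = {"left": 1.0, "right": 0.9}
    last_tried = {"left": 99, "right": 0}
    # "right" has a slightly lower estimate but has not been tried for a
    # long time, so its bonus outweighs the difference.
    print(pick_action(q_values, last_tried, now=100))
```

With `kappa=0` the choice is purely greedy. Larger values of `kappa` push the agent toward actions whose consequences may have changed since they were last tried, which is the situation the moveable-barrier mazes create.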