Reinforcement Comparison
Peter Dayan
In DS Touretzky, JL Elman, TJ Sejnowski & GE Hinton, editors,
Proceedings of the 1990 Connectionist Models Summer School. San
Mateo, CA: Morgan Kaufmann, 45-51.
Abstract
Sutton [in his PhD thesis] introduced a reinforcement comparison term
into the equations governing certain stochastic learning automata,
arguing that it should speed up learning, particularly for unbalanced
reinforcement tasks. Williams's subsequent extensions [REINFORCE] to
the class of algorithms demonstrated that they were all performing
approximate stochastic gradient ascent, but that, in terms of
expectations, the comparison term has no first order effect.
This paper analyses the second order contribution, and uses the
criterion that its modulus should be minimised to determine an optimal
value for the comparison term. This value turns out to differ from
the one Sutton used, and simulations hint at its efficacy.
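
The first-order claim above can be checked on a toy case. The sketch below is an illustration under assumed settings, not the derivation or the simulations from the paper: it takes a single Bernoulli-logistic unit with weight w, two hypothetical rewards r0 and r1 (one per action), and an update proportional to (r - b)(y - p), where b is the comparison term. The function name update_moments and all numerical values are made up for the example. It computes the exact mean and second moment of the update by summing over both actions.

```python
import math


def update_moments(w, r0, r1, b):
    """Exact mean and second moment of the update (r - b) * (y - p)
    for one Bernoulli-logistic unit (toy setting, assumed for illustration)."""
    p = 1.0 / (1.0 + math.exp(-w))  # probability of emitting y = 1
    # Each outcome is (probability, action y, reward for that action).
    outcomes = [(1.0 - p, 0.0, r0), (p, 1.0, r1)]
    mean = sum(q * (r - b) * (y - p) for q, y, r in outcomes)
    second = sum(q * ((r - b) * (y - p)) ** 2 for q, y, r in outcomes)
    return mean, second


if __name__ == "__main__":
    # Hypothetical weight and rewards; only b varies between rows.
    for b in (0.0, 0.5, 1.0):
        mean, second = update_moments(w=0.3, r0=0.2, r1=0.9, b=b)
        print(f"b={b:.1f}  mean={mean:.6f}  second moment={second:.6f}")
```

In this toy setting the mean update works out to p(1 - p)(r1 - r0) for every b, while the second moment changes with b, which is the kind of quantity a criterion for choosing the comparison term can act on.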