Reinforcement Comparison 
 
Peter Dayan 
  
In DS Touretzky, JL Elman, TJ Sejnowski & GE Hinton, editors, 
Proceedings of the 1990 Connectionist Models Summer School. San
Mateo, CA: Morgan Kaufmann, 45-51.
 Abstract 
Sutton [in his PhD thesis] introduced a reinforcement comparison term
into the equations governing certain stochastic learning automata,
arguing that it should speed up learning, particularly for unbalanced
reinforcement tasks. Williams's subsequent extensions [REINFORCE] to
the class of algorithms demonstrated that they were all performing
approximate stochastic gradient ascent, but that, in terms of
expectations, the comparison term has no first-order effect.
This paper analyses the second-order contribution, and uses the
criterion that its modulus should be minimised to determine an optimal
value for the comparison term. This value turns out to be different
from the one Sutton used, and simulations hint at its efficacy.
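
As a rough illustration of the claim that the comparison term leaves the expected update unchanged while altering its spread, the sketch below works through a toy case that is not taken from the paper: a single two-action unit with P(a=1) = p, deterministic rewards r0 and r1, and a REINFORCE-style update g = (r_a - b)(a - p), where b is the comparison term. The function name, the reward values, and the use of variance as the spread measure are all assumptions made for this example, and the paper's own second-order criterion may be defined differently.

```python
def grad_moments(p, r0, r1, b):
    """Exact mean and variance of g = (r_a - b) * (a - p) for a in {0, 1}.

    Hypothetical toy setting: P(a=1) = p, reward r1 for a=1 and r0 for a=0,
    both deterministic. (a - p) is the derivative of log P(a) with respect
    to the logit of p; b is the comparison (baseline) term.
    """
    outcomes = [
        (1.0 - p, (r0 - b) * (0.0 - p)),  # action 0 chosen
        (p, (r1 - b) * (1.0 - p)),        # action 1 chosen
    ]
    mean = sum(prob * g for prob, g in outcomes)
    second_moment = sum(prob * g * g for prob, g in outcomes)
    return mean, second_moment - mean * mean


# Illustrative values only (an unbalanced task: both rewards are positive).
p, r0, r1 = 0.8, 0.2, 1.0

baselines = {
    "no comparison term": 0.0,
    "expected reward": p * r1 + (1.0 - p) * r0,
    "swapped-weight average": (1.0 - p) * r1 + p * r0,
}

moments = {name: grad_moments(p, r0, r1, b) for name, b in baselines.items()}

for name, (mean, var) in moments.items():
    print(f"{name:>24}: b={baselines[name]:.3f}  mean={mean:.4f}  var={var:.6f}")
```

In this toy case the mean equals p(1 - p)(r1 - r0) whatever b is, and setting the derivative of the second moment with respect to b to zero gives b = (1 - p) r1 + p r0, which is not the expected reward p r1 + (1 - p) r0. With deterministic rewards the variance vanishes at that value; with noisy rewards it would not.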