Dopamine Bonuses
Sham Kakade & Peter Dayan
NIPS 2000, 131-137.
Abstract
Substantial data support a temporal difference (TD) model of
dopamine (DA) neuron activity in which the cells provide a global
error signal for reinforcement learning. However, in certain
circumstances, DA activity seems anomalous under the TD model,
responding to non-rewarding stimuli. We address these anomalies by
suggesting that DA cells multiplex information about reward bonuses,
including Sutton's exploration bonuses and Ng et al.'s
non-distorting shaping bonuses. We interpret this additional role
for DA in terms of the unconditional attentional and psychomotor
effects of dopamine, having the computational role of guiding
exploration.
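
As a rough illustration of how a bonus can enter a TD error signal, the sketch below adds a potential-based shaping bonus of the form gamma * phi(s') - phi(s), the standard form of Ng et al.'s non-distorting shaping, to the usual TD error. The function names, the potential values, and the discount factor are assumptions chosen for illustration; they are not taken from the paper.

```python
def td_error(reward, v_current, v_next, gamma=0.9):
    """Standard TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current


def shaping_bonus(phi_current, phi_next, gamma=0.9):
    """Potential-based shaping bonus: gamma * phi(s') - phi(s)."""
    return gamma * phi_next - phi_current


def td_error_with_bonus(reward, v_current, v_next,
                        phi_current, phi_next, gamma=0.9):
    """TD error computed on the reward plus the shaping bonus."""
    bonus = shaping_bonus(phi_current, phi_next, gamma)
    return td_error(reward + bonus, v_current, v_next, gamma)


def discounted_bonus_sum(phis, gamma=0.9):
    """Discounted sum of shaping bonuses along a path of potentials.

    The sum telescopes to gamma**T * phis[-1] - phis[0], so it depends
    only on the endpoints of the path, not on the route taken.
    """
    return sum(
        gamma ** t * shaping_bonus(phis[t], phis[t + 1], gamma)
        for t in range(len(phis) - 1)
    )


# A non-rewarding stimulus (reward 0, values 0) that raises the
# hypothetical potential from 0 to 1 still yields a positive error.
delta = td_error_with_bonus(0.0, 0.0, 0.0, phi_current=0.0, phi_next=1.0)
```

In this toy setting the error is positive even though no reward is delivered, which is the kind of response to a non-rewarding stimulus the abstract describes. Because the discounted bonuses telescope, they shift the return from a given start state by a fixed amount and so leave the ranking of policies unchanged.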