Dopamine Bonuses
Sham Kakade & Peter Dayan
Submitted to Neural Networks
Abstract
In the temporal difference model of primate dopamine neurons,
the phasic activity of these neurons reports a prediction error for
future reward. This model is supported by a wealth of experimental data.
However, in certain circumstances, the activity of the dopamine
cells seems anomalous under the model, as they respond in particular
ways to stimuli that are not obviously related to predictions of
reward. In this paper, we address two important sets of anomalies,
those having to do with generalization and novelty. Generalization
responses are treated as the natural consequence of partial
information; novelty responses are treated by the suggestion that
dopamine cells multiplex information about reward bonuses,
including exploration bonuses and shaping bonuses. We interpret this
additional role for dopamine in terms of its mechanistic attentional
and psychomotor effects, which serve the computational role of
guiding exploration.
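
As a rough illustration of the idea, the sketch below computes a standard temporal difference prediction error and adds a bonus to the reward term. It is not taken from the paper: the function names, the discount factor `gamma`, and the count-based decay of the novelty bonus are all assumptions chosen for illustration only.

```python
def td_error(reward, value_now, value_next, gamma=0.9, bonus=0.0):
    """Temporal difference prediction error with an optional additive bonus.

    delta = (reward + bonus) + gamma * V(next) - V(now)
    The additive form of the bonus is an illustrative assumption.
    """
    return (reward + bonus) + gamma * value_next - value_now


def novelty_bonus(visit_count, scale=1.0):
    """Hypothetical bonus that shrinks as a stimulus becomes familiar."""
    return scale / (1.0 + visit_count)


if __name__ == "__main__":
    # An unrewarded stimulus with no learned value still produces a positive
    # error signal while it is novel, and the signal fades with repeated exposure.
    for visits in range(4):
        delta = td_error(reward=0.0, value_now=0.0, value_next=0.0,
                         bonus=novelty_bonus(visits))
        print(visits, delta)
```

Under these assumptions, a novel stimulus that predicts no reward still yields a positive error, which is the kind of response the plain prediction-error account does not explain.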