Temporal difference models and reward-related learning in the human brain
John O'Doherty et al.
Neuron 38, 329-337.
Abstract
Temporal difference learning has been proposed as a model for Pavlovian
conditioning, in which an animal learns to predict delivery of
reward following presentation of a conditioned stimulus (CS). A
key component of this model is a prediction error signal, which,
before learning, responds at the time of presentation of reward but, after
learning, shifts its response to the time of onset of the CS. In
order to test for regions manifesting this signal profile, subjects
were scanned using event-related fMRI while undergoing appetitive
conditioning with a pleasant taste reward. Regression analyses
revealed that responses in ventral striatum and orbitofrontal
cortex were significantly correlated with this error
signal, suggesting that, during appetitive conditioning, computations described
by temporal difference learning are expressed in the human
brain.
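
As an illustration of the signal profile described in the abstract, the following is a minimal sketch of temporal difference learning for a single CS-reward pairing. It is not the model or analysis code used in the study. The function name simulate_td, the one-value-per-time-step stimulus representation, and all parameter values (trial length, CS and reward times, learning rate alpha, discount factor gamma) are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_td(n_trials=200, n_steps=20, cs_time=5, reward_time=15,
                alpha=0.3, gamma=1.0):
    """Return the prediction error at every time step of every trial.

    Hypothetical setup: one learned value per time step from CS onset
    onward; the value before CS onset is fixed at zero because the CS
    itself is assumed to arrive unpredictably.
    """
    w = np.zeros(n_steps)              # learned values, indexed by time step
    reward = np.zeros(n_steps)
    reward[reward_time] = 1.0          # a single reward per trial
    deltas = np.zeros((n_trials, n_steps))

    for trial in range(n_trials):
        for t in range(1, n_steps):
            # Value of the previous and current time step (zero before the CS)
            v_prev = w[t - 1] if t - 1 >= cs_time else 0.0
            v_curr = w[t] if t >= cs_time else 0.0

            # Prediction error: reward plus discounted current value,
            # minus the value predicted at the previous step
            delta = reward[t] + gamma * v_curr - v_prev

            # Only time steps from CS onset onward carry a learnable value
            if t - 1 >= cs_time:
                w[t - 1] += alpha * delta

            deltas[trial, t] = delta

    return deltas


deltas = simulate_td()
print("peak error on first trial at step", int(np.argmax(deltas[0])))
print("peak error on last trial at step", int(np.argmax(deltas[-1])))
```

Under these assumptions, the largest prediction error on the first trial falls at the time of reward, and by the last trial it has moved to the time of CS onset while the error at reward delivery has shrunk toward zero.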