13. How to maximize your reward rate

Mandana Ahmadi mandana@gatsby.ucl.ac.uk Peter E. Latham pel@gatsby.ucl.ac.uk

Gatsby Computational Neuroscience Unit, UCL, London, UK

When making decisions, there is always a tradeoff between speed and accuracy. Fast decisions have the potential for a high reward rate (that is, more reward per unit time), but generally come at the cost of accuracy. Slow decisions raise accuracy, but lower the reward rate. How do we find a happy medium? In general, this is a hard question, especially in an uncertain world in which data are noisy, partially observed, and unreliable.
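To make the tradeoff concrete, here is a minimal sketch in Python. The saturating accuracy curve, reward size, and inter-trial interval are all illustrative assumptions, not values from this work; the point is only that reward per unit time is maximized at an intermediate viewing time.

```python
import numpy as np

# Hypothetical accuracy curve: P(correct) saturates with viewing time T.
# tau, R, and T_iti are illustrative parameters, not values from the abstract.
tau = 0.5          # time constant of evidence accumulation (s)
R = 1.0            # reward for a correct choice
T_iti = 2.0        # inter-trial interval (s)

def accuracy(T):
    return 1.0 - 0.5 * np.exp(-T / tau)   # chance (0.5) -> perfect (1.0)

def reward_rate(T):
    # expected reward per unit time: accuracy * reward / total trial time
    return accuracy(T) * R / (T + T_iti)

T = np.linspace(0.01, 5.0, 1000)
T_opt = T[np.argmax(reward_rate(T))]
print(f"optimal viewing time ~ {T_opt:.2f} s")
```

Deciding too fast or too slow both lower the reward rate; the optimum sits in between, which is exactly the happy medium in question.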

Here we address this question in a simplified task in which subjects have to decide whether a set of dots is moving to the right or left. Although simple, this task contains the key elements of almost all decision-making: the longer one stares at the dots, the more likely one is to be correct, but if one stares too long the reward rate becomes unacceptably low.

The standard framework for analyzing this task is the diffusion-to-bound model (Palmer et al., 2005). Here we present a more rigorous framework based on dynamic programming. This allows us to do several things that the diffusion-to-bound model can't: determine optimal policies in the presence of nuisance parameters, compute full posterior probabilities over rewards, and explore how the prior probability that the dots are moving to the right affects behavior.
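As an illustration of the dynamic-programming approach (not the authors' actual formulation), the sketch below treats the task as an optimal-stopping problem over the posterior belief that the dots move rightward. A fixed per-sample cost stands in for the full reward-rate objective, and the binary observation model, cost, and reward are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch of the dynamic-programming idea: optimal stopping over the
# posterior belief b = P(rightward | data so far). Parameters are illustrative.
q = 0.6            # P(noisy binary sample says "right" | dots move right)
c = 0.01           # cost per additional sample (stands in for lost time)
R = 1.0            # reward for a correct choice

b = np.linspace(0.001, 0.999, 999)       # belief grid
V = np.maximum(b, 1 - b) * R             # initialize with value of stopping

def bayes_update(b, x):
    # posterior after observing one sample x in {0, 1}
    like_r = q if x == 1 else 1 - q
    like_l = (1 - q) if x == 1 else q
    return b * like_r / (b * like_r + (1 - b) * like_l)

for _ in range(500):                     # value iteration
    p1 = b * q + (1 - b) * (1 - q)       # marginal P(next sample = 1)
    V1 = np.interp(bayes_update(b, 1), b, V)
    V0 = np.interp(bayes_update(b, 0), b, V)
    V_continue = -c + p1 * V1 + (1 - p1) * V0
    V_stop = np.maximum(b, 1 - b) * R
    V_new = np.maximum(V_stop, V_continue)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

# Decision bounds: the beliefs at which stopping first beats continuing.
stop = V_stop >= V_continue
print("stop when belief leaves [%.3f, %.3f]" % (b[~stop].min(), b[~stop].max()))
```

The continuation region that emerges plays the role of the diffusion-to-bound model's decision bounds, but here it is derived from the objective rather than assumed, and the same machinery extends to nuisance parameters and non-uniform priors.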

To apply this framework to the moving-dot task, we generate spike trains that mimic those produced by the motion-sensitive brain area MT in response to moving dots, and calculate the optimal strategy of an ideal observer looking at those spike trains. Given the optimal strategy, we then ask two questions: do animals follow it in behavioral tasks, and does the next area after MT, LIP, properly integrate the MT spikes?
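For illustration, here is a minimal version of such an ideal observer. Two Poisson units stand in for MT (their firing rates and the bin size are assumptions made for this sketch, not values from the abstract), and the observer accumulates the log posterior odds of rightward motion, which is also where a prior over direction enters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for MT: two Poisson units whose firing rates depend
# on the true motion direction. Rates and bin size are assumed for the sketch.
dt = 0.01                                      # bin width (s)
lam = np.array([[40.0, 10.0],                  # rates if dots move right (Hz)
                [10.0, 40.0]])                 # rates if dots move left (Hz)

def ideal_observer(direction=0, prior_right=0.5, n_bins=200):
    """Accumulate log posterior odds of 'right' (direction 0) from spikes."""
    counts = rng.poisson(lam[direction] * dt, size=(n_bins, 2))
    # Poisson log-likelihood-ratio terms, per bin:
    w = np.log(lam[0] / lam[1])                # weight on each spike count
    b = -(lam[0] - lam[1]) * dt                # rate-difference baseline
    log_odds = np.log(prior_right / (1 - prior_right)) \
               + np.cumsum(counts @ w + b.sum())
    return log_odds                            # posterior: 1/(1+exp(-log_odds))

log_odds = ideal_observer(direction=0)
print("P(right | spikes) after 2 s: %.3f" % (1 / (1 + np.exp(-log_odds[-1]))))
```

The running log odds is exactly the quantity a downstream integrator such as LIP would need to compute, so comparing its trajectory to neural and behavioral data is one way to pose the two questions above.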