### 13. How to maximize your reward rate

Mandana Ahmadi mandana@gatsby.ucl.ac.uk
Peter E. Latham pel@gatsby.ucl.ac.uk
Gatsby Computational Neuroscience Unit, UCL, London, UK

When making decisions, there is always a tradeoff between speed and accuracy. Fast decisions have the potential
for high reward rates (that is, more reward per unit time), but generally lead to lower accuracy. Slow decisions
raise accuracy, but decrease reward rates. How do we find a happy medium? In general this is
a hard question, especially in an uncertain world in which data is noisy, partially observed, and
unreliable.

Here we address this question in a simplified task in which subjects have to decide whether a set of dots is
moving to the right or left. Although simple, this task contains the key elements of almost all decision-making:
the longer one stares at the dots, the more likely one is to be correct, but if one stares too long the reward rate
becomes unacceptably low.
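
The tradeoff can be made concrete with a toy calculation. The sketch below is illustrative only and is not the model used in this work: it assumes a fixed viewing time `t`, an accuracy that grows as Φ(`sensitivity`·√t), and a fixed per-trial `overhead` (for example an inter-trial interval). All three names and their values are hypothetical. Under these assumptions the reward rate, accuracy divided by total trial time, peaks at an intermediate viewing time.

```python
import math

def accuracy(t, sensitivity=1.0):
    """Assumed probability correct after viewing for t seconds: Phi(sensitivity * sqrt(t))."""
    return 0.5 * (1.0 + math.erf(sensitivity * math.sqrt(t) / math.sqrt(2.0)))

def reward_rate(t, overhead=2.0, sensitivity=1.0):
    """Expected reward per unit time, with one unit of reward per correct choice."""
    return accuracy(t, sensitivity) / (t + overhead)

# Scan viewing times from 0.05 s to 10 s and pick the one with the highest reward rate.
times = [0.05 * k for k in range(1, 201)]
best_t = max(times, key=reward_rate)
print(f"best viewing time ~ {best_t:.2f} s, reward rate ~ {reward_rate(best_t):.3f}")
```

Viewing times shorter than `best_t` give up too much accuracy, while longer ones spend more time than the extra accuracy is worth.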

The standard framework for analyzing this task is the diffusion-to-bound model (Palmer et al., 2005). Here
we present a more rigorous framework based on dynamic programming. This allows us to do several things that the
diffusion-to-bound model cannot: determine optimal policies in the presence of nuisance parameters, compute full
posterior probabilities over rewards, and explore how the prior probability that the dots are moving to the right
affects behavior.
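
As a rough illustration of what a dynamic programming treatment looks like, the sketch below solves a much simpler stand-in problem by backward induction. It is not the formulation used in this work. The assumptions are ours: each time step yields one binary sample that favours the true direction with probability `q`, a correct choice earns one unit of reward, and a fixed `cost` per sample stands in for the time cost that a reward-rate criterion would impose. The state `n` is the number of samples favouring right minus the number favouring left, and `prior_right` is the prior probability of rightward motion.

```python
import numpy as np

def optimal_stopping(q=0.6, cost=0.01, horizon=200, prior_right=0.5):
    """Backward induction for a toy two-choice sequential sampling problem.

    Returns the grid of states n and a boolean mask that is True where
    taking another sample beats deciding now (with `horizon` steps left).
    """
    n = np.arange(-horizon, horizon + 1)
    # Posterior probability of "right" given net evidence n (Bayes' rule in log odds).
    log_odds = np.log(prior_right / (1 - prior_right)) + n * np.log(q / (1 - q))
    p_right = 1.0 / (1.0 + np.exp(-log_odds))
    # Value of deciding now: choose the more probable direction.
    stop_value = np.maximum(p_right, 1.0 - p_right)
    # Predictive probability that the next sample favours right.
    p_up = p_right * q + (1.0 - p_right) * (1.0 - q)

    value = stop_value.copy()  # at the deadline the observer must decide
    continue_value = np.full(n.shape, -np.inf)
    for _ in range(horizon):
        continue_value = np.full(n.shape, -np.inf)  # grid edges: forced to stop
        continue_value[1:-1] = (
            -cost + p_up[1:-1] * value[2:] + (1.0 - p_up[1:-1]) * value[:-2]
        )
        value = np.maximum(stop_value, continue_value)
    return n, continue_value > stop_value

n, keep_sampling = optimal_stopping()
print("keep sampling for n in", n[keep_sampling].min(), "to", n[keep_sampling].max())
```

The edges of the region where `keep_sampling` is True play the role of decision bounds. Changing `prior_right` shifts that region, which gives a flavour of how a prior can be folded into the policy.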

To apply this framework to the moving dot task, we generate spike trains that mimic those produced by the
motion-sensitive brain area MT in response to moving dots, and calculate the optimal strategy of an ideal observer
looking at those spike trains. Given the optimal strategy, we then ask two questions: do animals follow it
in behavioral tasks, and does the next area after MT, LIP, properly integrate the MT spikes?
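
A minimal sketch of the kind of computation an ideal observer performs on such spike trains is given below. It is a simplification under stated assumptions, not the simulation used in this work: two pooled MT-like populations, one preferring rightward and one preferring leftward motion, fire as independent Poisson processes at `rate_pref` when the dots move in their preferred direction and `rate_null` otherwise. The rates, bin width `dt`, and function names are hypothetical. Because the summed rate is the same under both hypotheses, the log-likelihood ratio for right versus left reduces to the spike-count difference scaled by log(`rate_pref`/`rate_null`).

```python
import numpy as np

def simulate_llr(direction_right=True, rate_pref=40.0, rate_null=20.0,
                 dt=0.01, n_bins=200, seed=0):
    """Simulate two Poisson populations and accumulate the log-likelihood ratio.

    Returns the running log-likelihood ratio log P(spikes | right) / P(spikes | left).
    """
    rng = np.random.default_rng(seed)
    if direction_right:
        rate_r, rate_l = rate_pref, rate_null
    else:
        rate_r, rate_l = rate_null, rate_pref
    # Spike counts per time bin for the right- and left-preferring populations.
    counts_r = rng.poisson(rate_r * dt, size=n_bins)
    counts_l = rng.poisson(rate_l * dt, size=n_bins)
    # Each extra right-preferring spike adds log(rate_pref / rate_null) of evidence.
    return np.cumsum((counts_r - counts_l) * np.log(rate_pref / rate_null))

def posterior_right(llr, prior_right=0.5):
    """Posterior probability of rightward motion, computed stably in log space."""
    log_odds = llr + np.log(prior_right / (1.0 - prior_right))
    return np.exp(-np.logaddexp(0.0, -log_odds))

llr = simulate_llr(direction_right=True)
print("final posterior P(right):", posterior_right(llr[-1]))
```

An integrator that tracks this running sum is one candidate for what a downstream area would need to compute in order to use the MT spikes optimally under these assumptions.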