Gatsby Computational Neuroscience Unit, UCL, London, UK

As William James famously said, “everyone knows what attention is”, and indeed we all have introspective access to the effects (and frustrations) of attentional selection. But attempts to delineate exactly why the brain needs to selectively filter incoming information, and what the mechanisms and effects of selection are, have floundered in a sea of heterogeneous effects. We previously proposed a probabilistic computational framework that unifies a number of attentional effects under a single normative description of which resource is limited, why it is limited, and how attention helps [1].

Our framework is grounded in the Helmholtzian notion that perception requires an inverse inference from neural firing to the features in the world that caused it. Multiple sources of noise and ill-posedness leave this inference poorly constrained, and the optimal approach is to compute posterior belief distributions over features according to Bayes’ rule. There is much evidence for Bayesian optimality in tasks involving a single object, but in cluttered natural scenes the resources required to represent the full posterior grow exponentially with the number of correlated features. We therefore suggest that a fundamental computational resource limitation is the ability to represent joint distributions over large numbers of features, and that the brain makes approximations that neglect some of the correlations. Attention then consists of a ‘hypothesis’ that takes the mathematical form of an extra prior, but can be thought of as a proposal about which location, or feature value, is of interest. The brain approximates the product of the true posterior and this attentional hypothesis, such that the approximation is more accurate in the proposed region. The attentional hypothesis can be driven by top-down cues or bottom-up salience computations, but it also evolves dynamically towards a better match with the true posterior, as measured by an approximated partition function. This dynamic evolution towards ‘true’ proposals allows attention to reveal correlations not explicitly represented in the approximation, as it settles on co-occurring feature values that are strongly correlated in the true posterior.
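To make the partition-function match concrete, a minimal numerical sketch (the variable names, the one-dimensional ‘location’ axis, and all parameter values are our own illustrative choices, not taken from the model): a Gaussian attentional hypothesis acts as an extra prior on a bimodal posterior, and hill-climbing on the normalizer of their product lets the hypothesis settle where the posterior has the most mass.

```python
import numpy as np

# A discrete "location" axis, a true posterior with two modes, and an
# attentional hypothesis that acts as an extra prior (toy illustration).
x = np.linspace(-5, 5, 201)

def gaussian(x, mu, sigma):
    g = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return g / g.sum()

# True posterior: a strong mode at +2 and a weaker one at -2.
posterior = 0.7 * gaussian(x, 2.0, 0.6) + 0.3 * gaussian(x, -2.0, 0.6)

def partition(mu):
    """Normalizer of posterior * hypothesis -- the match score."""
    return np.sum(posterior * gaussian(x, mu, 1.0))

# Let the hypothesis evolve towards a better match by hill-climbing
# on the (approximated) partition function.
mu = 0.0  # start between the two modes
for _ in range(100):
    mu = max([mu - 0.1, mu, mu + 0.1], key=partition)

print(mu)  # settles near the strong mode at +2
```

The product posterior × hypothesis is then most accurate in the settled region, which is the sense in which the proposal concentrates representational resources.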

Here we illustrate this framework by implementing analogues of three top-down attentional paradigms in a simple model, which consists of an array of feature maps connected to an output layer via a local weight matrix. Noisy observations are drawn from this generative model, and we compute the posterior belief both with and without the appropriate attentional hypothesis. Performance on pre-cueing, task-driven bias, and binding tasks is simulated by mapping the posterior to a perceptual decision, and reveals the predicted attentional benefits. For pre-cueing, this approach has much in common with previous treatments of attention as a prior over locations of interest [2]. However, it also encompasses paradigms that are semantically or technically unsuited to such a treatment, and explicitly considers the case of multiple objects and thus the binding problem. We also consider possible anatomical implications of the framework.
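As a hedged sketch of what the pre-cueing analogue might look like (the generative model, noise level, trial counts, and prior values below are our own simplifications, not the paper’s implementation): one of several locations contains a target, observations are corrupted by Gaussian noise, and the attentional hypothesis enters as an extra prior favouring the cued location before the posterior is mapped to a decision.

```python
import numpy as np

rng = np.random.default_rng(0)

# One of n_loc locations contains a target; each location's observation is
# the target indicator plus Gaussian noise; the decision is the posterior mode.
n_loc, sigma, n_trials = 4, 1.5, 5000

def run(prior):
    correct = 0
    for _ in range(n_trials):
        true_loc = 0  # the cue is valid: the target sits at the cued location
        obs = rng.normal((np.arange(n_loc) == true_loc).astype(float), sigma)
        # log-likelihood of "target at location j", up to terms constant in j
        loglik = obs / sigma**2
        log_post = np.log(prior) + loglik
        correct += (np.argmax(log_post) == true_loc)
    return correct / n_trials

flat = np.full(n_loc, 1 / n_loc)
cued = np.array([0.7, 0.1, 0.1, 0.1])  # attentional hypothesis as extra prior

acc_no_attention = run(flat)
acc_attention = run(cued)
print(acc_attention > acc_no_attention)  # attending the cued location helps
```

On valid-cue trials the extra prior raises decision accuracy well above the uncued baseline, which is the qualitative pattern the pre-cueing simulation is meant to reproduce; invalid-cue costs follow by placing the target away from the cued location.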

References

[1] M. Sahani and L. Whiteley. A unifying probabilistic computational framework for attention. Cosyne, Abstract III-3, 2007.

[2] P. Dayan and R. Zemel. Statistical models and sensory attention. ICANN, pp. 1017–1022, 1999.