Data limitations on information estimates in large neuronal populations
Bruno B. Averbeck
Sobell Institute of Motor Neuroscience and Movement Disorders, Institute of Neurology, University College London

Understanding the role of correlations among neurons in information coding is a complex problem composed of several more specific questions. We are, however, beginning to obtain answers to some of these questions. For example, we now understand how several different measures that have been developed to assess the effects of correlations are related. Further, analysis of experimental data is beginning to demonstrate rather clearly that correlations play a limited role in information coding at the level of pairs of neurons. In an effort to move beyond pairs, theoretical studies have addressed the question of whether correlations have an effect at the population level. These studies have generally proceeded by assuming an empirically informed structure for correlations, and then using this structure to estimate information as a function of population size. They have shown two related facts about the effects of correlations on populations of neurons. First, the effects of correlations tend to get larger as the size of the population grows. Second, small effects of correlations in pairs of neurons do not imply small effects at the population level. This implies that the empirical results obtained in pairs cannot be directly extrapolated to populations. Thus, the effects of correlations will have to be assessed directly at the level of the population.

Assessing information in large populations of neurons, however, raises an additional problem that is less important in pairs of neurons: obtaining accurate information estimates. The question I will address is: how much data is necessary to show that there is additional information in signal-dependent correlations in a large population?

We have derived analytical results that answer this question for several estimators, including a naive estimator, a regularized Bayesian estimator, and an estimator based on boosting. For the specific problem of assessing the effect of signal-dependent correlations, the answer depends on several factors: the number of terms in a model that can extract information from signal-dependent correlations (the quadratic model), the number of terms in the corresponding model that can extract only linear information (the linear model), the variance of the quadratic model, and the variance of the linear model, which is always greater than or equal to the variance of the quadratic model. We have found that the number of trials necessary to show, using cross-validation, that there is information in signal-dependent correlations is quadratic in the number of predictor variables. This result can be combined with a model of the effect of signal-dependent correlations developed by Shamir and Sompolinsky [1], which provides estimates of how the variance of the quadratic model is related to the variance of the linear model as a function of the number of neurons in the population. Using this model, we find that approximately 2000 trials would be needed to show an effect with 30 neurons recorded simultaneously. Thus, the data requirements for answering this question are large.
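
To give a feel for why the two models differ so much in their data demands, the sketch below counts the terms each one would have to fit. It rests on assumptions that the abstract does not state: the linear model has one weight per neuron, the quadratic model has those weights plus one weight for every unique product of two responses (including squares), and neither has an intercept. The function names are hypothetical. The sketch only illustrates how the term counts grow; it does not reproduce the analytical trial estimates quoted above.

```python
def linear_terms(n_neurons: int) -> int:
    """Terms in the assumed linear model: one weight per neuron."""
    return n_neurons


def quadratic_terms(n_neurons: int) -> int:
    """Terms in the assumed quadratic model.

    First-order weights plus one weight for every unique second-order
    product r_i * r_j with i <= j (assumption: squares are included).
    """
    second_order = n_neurons * (n_neurons + 1) // 2
    return n_neurons + second_order


if __name__ == "__main__":
    # Compare how the two term counts scale with population size.
    for n in (2, 10, 30, 100):
        print(f"neurons={n:4d}  linear={linear_terms(n):4d}  "
              f"quadratic={quadratic_terms(n):5d}")
```

Under these assumptions, a population of 30 neurons gives 30 linear terms but 495 quadratic terms, so the model that can exploit signal-dependent correlations has far more parameters to estimate from the same trials.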

1. M. Shamir and H. Sompolinsky, Neural Comput. 16: 1105-1136 (2004).