
Beyond Simple Cells:
Probabilistic models for visual cortical processing

Workshop held at NIPS 2007, Whistler, December 7


Talk abstracts

Building and testing multi-stage models for cortical processing
E. Simoncelli

I'll describe our recent coordinated efforts to build and test models for cortical visual processing. Specifically, I hope to cover 1) efficient coding (as opposed to sparsity) as a basis for cortical processing; 2) simple parametric models for cortical regions as computational building blocks; and 3) experimental model characterization and validation with customized visual stimuli.


Natural image statistics and contextual visual processing
O. Schwartz

Contextual stimuli exert a dramatic influence on neural processing and perception. We focus on contextual surround and orientation as a paradigmatic example. We discuss contextual effects in terms of a generative probabilistic model of natural image statistics known as the Gaussian Scale Mixture (GSM). In the GSM model, filter responses are generated by Gaussian variables (modeling local filter structure), multiplied by mixer variables (capturing coordination between filter responses). For this class of model, one can learn patterns of dependence between oriented filter responses for a given image ensemble, and thereby induce a hierarchical representation of the inputs.
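
To make the generative step concrete, here is a minimal numerical sketch of GSM filter responses (the two-filter setup, parameter values, and variable names are illustrative assumptions, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_filters = 10000, 2

# Local Gaussian variables: model the structure seen through each linear filter.
g = rng.multivariate_normal(mean=np.zeros(n_filters),
                            cov=[[1.0, 0.3], [0.3, 1.0]],
                            size=n_samples)

# A shared positive mixer variable couples the filters multiplicatively.
v = rng.lognormal(mean=0.0, sigma=0.5, size=(n_samples, 1))

# GSM filter responses: Gaussian structure times a common mixer.
x = v * g

# The multiplicative mixer induces a classic signature of natural-image
# statistics: the responses are correlated in magnitude even though the
# underlying Gaussian variables are only weakly correlated.
print("corr(|x1|, |x2|):", np.corrcoef(np.abs(x[:, 0]), np.abs(x[:, 1]))[0, 1])
print("corr(|g1|, |g2|):", np.corrcoef(np.abs(g[:, 0]), np.abs(g[:, 1]))[0, 1])
```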

We consider the recognition process of extracting local structure, which is closely related to cortical divisive gain control models. Filters associated with surround stimuli participate in normalizing the activities of the filters representing a target stimulus, thereby changing the tuning curves. Through standard population decoding, we show that these changes lead to the forms of repulsion and attraction observed in the tilt illusion. Finally, we discuss preliminary directions for testing classes of generative models in cortical experiments, by adapting neurons to stimuli that are synthetically generated according to the statistics.
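
The surround effect described above can be caricatured in a few lines: surround-tuned filters enter each unit's normalization pool, and population-vector decoding of the normalized responses then shows repulsion of the decoded tilt. The tuning widths, pool weights, and gains below are invented for illustration:

```python
import numpy as np

prefs = np.linspace(-90, 90, 181)  # preferred orientations (deg)

def tuning(stim, pref, width=20.0):
    """Gaussian orientation tuning."""
    return np.exp(-0.5 * ((stim - pref) / width) ** 2)

def responses(center, surround=None, sigma=0.2, k=1.5):
    """Divisively normalized responses: filters tuned near the surround
    orientation contribute most to each unit's normalization pool, so a
    surround suppresses one flank of the population hill."""
    drive = tuning(center, prefs)
    pool = sigma + drive.mean()
    if surround is not None:
        pool = pool + k * tuning(surround, prefs)  # unit-specific suppression
    return drive / pool

def decode(r):
    """Standard population-vector decoding in doubled-angle space."""
    ang = np.deg2rad(2 * prefs)
    return np.rad2deg(np.arctan2((r * np.sin(ang)).sum(),
                                 (r * np.cos(ang)).sum())) / 2

print("decoded, no surround:      ", round(decode(responses(10.0)), 2))
print("decoded, vertical surround:", round(decode(responses(10.0, surround=0.0)), 2))
```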


Learning to generalize over regions of natural images
M. Lewicki

Our visual system encodes complex natural edges, contours, and textures, whose retinal images are inherently highly variable. Essential to accurate perception is the formation of abstract representations that remain invariant across individual fixations. A common view is that this is achieved by neurons that signal conjunctions of image features, but how such conjunctions subserve invariant representations is poorly understood. In this talk I will discuss an approach based on learning statistical distributions of local regions in a visual scene. The central hypothesis is that learning these local distributions allows the visual system to generalize across similar images.

I will present a model in which the joint activity of neurons encodes the probability distribution over their inputs and forms stable representations across complex patterns of variation. Trained on natural images, the model learns a compact set of functions that act as dictionary elements for image distributions typically encountered in natural scenes. Neurons in the model exhibit a wide range of properties observed in cortical neurons. These results provide a novel functional explanation for non-linear effects in complex cells in the primary visual cortex (V1) and make predictions about coding in higher visual areas, such as V2 and V4.

This is joint work with Yan Karklin.


Can Markov random fields tell us anything about visual receptive fields?
M. Black

We recently proposed a high-order Markov random field (MRF) model of natural images called a "Field of Experts" (FoE). The FoE defines local clique potentials as a product of experts where each expert is a non-linear function of a linear filter response. We learned the parameters of the model, including the linear filters, using a database of natural images. We found something surprising. In this high-order MRF, the optimal linear filters have an unusual high-frequency structure. In particular, the filters do not look at all like the kinds of Gabor-like filters found with other formulations. If the brain somehow has a statistical model of natural scenes, why have experimentalists never observed linear receptive fields that look like the filters we find? I will propose an answer that will be at the same time trivial and controversial. In particular, I'll argue that current experimental evidence cannot rule out the existence of FoE-like receptive fields in V1 (at this point in the talk most people will either be mad at me or think I'm crazy).
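
For readers unfamiliar with the FoE form, the following sketch evaluates an unnormalized FoE energy with Student-t experts, as in Roth and Black's formulation; the tiny random filters are placeholders for the learned high-frequency filters discussed in the talk:

```python
import numpy as np

def foe_energy(image, filters, alphas):
    """Unnormalized negative log probability under a Field of Experts:
    a sum over all cliques (overlapping patches) of Student-t expert
    energies applied to linear filter responses. filters: (K, h, w)."""
    K, h, w = filters.shape
    H, W = image.shape
    energy = 0.0
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = image[i:i + h, j:j + w]
            for k in range(K):
                r = float(np.sum(filters[k] * patch))        # filter response
                energy += alphas[k] * np.log1p(0.5 * r * r)  # t-expert energy
    return energy

rng = np.random.default_rng(0)
filters = rng.standard_normal((3, 2, 2))  # placeholder filters, not the learned ones
alphas = np.ones(3)
img = rng.standard_normal((8, 8))
print("FoE energy:", foe_energy(img, filters, alphas))
```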


Self-taught learning via unsupervised discovery of structure
A. Ng

In this talk, I will describe "self-taught learning," a new machine learning framework for using unlabeled data in classification tasks. Our approach is based on applying neuroscience-informed algorithms to discover structure from unlabeled data.

In self-taught learning, we are given labeled and unlabeled data, but do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text) downloaded randomly from the Internet to improve performance on a given image (or audio, or text) classification task. In our approach, we apply algorithms such as sparse coding (Olshausen and Field, 1996)--which was originally proposed as a model for some of the computations in the cortex--or deep belief networks (Hinton et al., 2006) to automatically learn "succinct" representations of the inputs from the unlabeled data. By then applying a second layer of supervised learning to these representations, addressing the specific task of interest, I show that we obtain significantly improved performance on a wide range of tasks.
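
As an illustration of the sparse-coding step, here is a minimal ISTA solver for the usual L1-penalized reconstruction objective; the random dictionary and data stand in for features learned from unlabeled images, and none of this is the authors' actual pipeline:

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=100):
    """Infer sparse codes A minimizing 0.5*||X - D @ A||_F^2 + lam*||A||_1
    with ISTA (iterative shrinkage-thresholding). D: (d, k), X: (d, n)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)           # gradient of the quadratic term
        Z = A - grad / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))   # stand-in dictionary learned from unlabeled data
D /= np.linalg.norm(D, axis=0)       # unit-norm atoms
X = rng.standard_normal((64, 10))    # patches to encode
A = sparse_codes(X, D)
print("fraction of nonzero coefficients:", np.mean(np.abs(A) > 1e-8))
# The sparse codes A would then feed a second, supervised learning stage.
```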

Unsupervised learning algorithms for finding succinct representations lie at the heart of our approach to self-taught learning, and biology offers a natural guide for developing better unsupervised learning algorithms. Indeed, motivated in part by the hierarchical organization of cortex, a number of algorithms have recently been proposed that try to learn hierarchical structure from unlabeled data, and many authors have compared such algorithms to computations performed in visual area V1 (and the cochlea). In this talk, I will also describe a variant of an unsupervised learning algorithm for learning succinct representations, derived as a sparse version of deep belief networks, together with a quantitative comparison of this model's properties to some of the properties of visual area V2, as well as the application of these ideas to self-taught learning.

This talk represents joint work with Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, and Ben Packer.


What the other 85% of V1 is doing
B. Olshausen
(download slides.pdf)

Despite the popularity of simple and complex cell models of V1, many mysteries remain. Here I shall speculate on what the unexplained part of V1 is doing, based on functional considerations. In particular, I shall argue that figure-ground assignment, and the extraction of invariances in a manner that preserves relative spatial relationships, are important problems of vision that are likely dealt with in V1.


Poster abstracts

A Bayesian approach to inferring functional connectivity and structure from spikes
I. Stevenson, J. Rebesco, N. Hatsopoulos, L. Miller, K. Kording

Current multi-electrode techniques enable the simultaneous recording of spikes from hundreds of neurons. To study neural processing, plasticity, and hierarchical structure, such as in the visuomotor system, it is desirable to infer the underlying functional connectivity between the recorded neurons. One challenge in inferring these connections is that a large number of parameters, which characterize how each neuron influences the other neurons, must often be estimated from relatively little data. Such estimates can be improved by using Bayesian methods that combine information from the recorded spikes (likelihood) with prior beliefs about functional connectivity (prior). [...]
Extended pdf abstract
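
A toy illustration of the regularizing effect of a prior: the sketch below uses a linear-Gaussian stand-in for the spiking model (the actual work presumably uses a point-process likelihood), in which the MAP coupling estimate under a zero-mean Gaussian prior reduces to ridge regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, T = 20, 30                  # many parameters, relatively little data

# Sparse ground-truth coupling matrix.
mask = rng.random((n_neurons, n_neurons)) < 0.1
W_true = rng.standard_normal((n_neurons, n_neurons)) * mask
X = rng.standard_normal((T, n_neurons))                       # past activity
Y = X @ W_true.T + 0.5 * rng.standard_normal((T, n_neurons))  # next-step activity

def map_connectivity(X, Y, prior_precision):
    """MAP coupling weights under a Gaussian likelihood and a zero-mean
    Gaussian prior on the weights: W = ((X'X + lam I)^{-1} X'Y)'."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + prior_precision * np.eye(d), X.T @ Y).T

for lam in (1e-6, 10.0):               # nearly no prior vs. informative prior
    W_hat = map_connectivity(X, Y, lam)
    err = np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)
    print(f"prior precision {lam}: relative error {err:.3f}")
```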


Classification of the Local Symmetry of an image using a Gaussian Derivative Filter Bank
L. Griffin

Measurement of an image by linear filters is known to be a useful first step for machine vision, and is a good model of simple cells in V1. Which family of filters is optimal has been much studied. However, whatever family is used, the measurements fail fully to determine the image that they measure. In other words, spatial vision, like colour vision, suffers inevitably from metamerism. There is thus a puzzle as to what aspects of the image are actually determined by filter measurement. As a possible answer, we have considered whether and when filters are able to detect if an image has a particular symmetry. [...]
Extended pdf abstract
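
A minimal sketch of a Gaussian derivative filter bank up to second order, producing the local "jet" of measurements such filters supply; the sizes and scales are arbitrary choices for illustration:

```python
import numpy as np

def gaussian_derivative_filters(sigma=2.0, size=9):
    """Build 2D Gaussian derivative filters up to second order,
    {G, Gx, Gy, Gxx, Gxy, Gyy}, sampled on a size x size grid."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    g /= g.sum()
    s2 = sigma**2
    return {
        "G":   g,
        "Gx":  -x / s2 * g,
        "Gy":  -y / s2 * g,
        "Gxx": (x**2 - s2) / s2**2 * g,
        "Gxy": x * y / s2**2 * g,
        "Gyy": (y**2 - s2) / s2**2 * g,
    }

bank = gaussian_derivative_filters()
rng = np.random.default_rng(0)
patch = rng.standard_normal((9, 9))
jet = {name: float(np.sum(f * patch)) for name, f in bank.items()}
print(jet)  # a 6-dimensional local "jet" measurement of the patch
```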


Building a hierarchical causal model of natural movies with spiking neurons
S. Deneve, T. Lochmann, U. Ernst

[...] In this work, we attempt to [frame] the problem of dynamic vision in time with a simplistic, tractable generative model, where approximate inference and learning can be performed by neurons integrating evidence over time and signalling with spikes.

[...] This model provides a functional interpretation for the context sensitivity of the classical receptive field of V1 simple cells. Furthermore, it provides a link to empirical data by suggesting an explicit biophysical implementation in terms of shunting inhibition. Since this provides a tractable, non-linear, dynamic generative model of the visual input and incorporates "slow features" in its Markovian statistics, it may help to discover essential higher-order statistics in natural movies.
Extended pdf abstract
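
One way to caricature "neurons integrating evidence over time and signalling with spikes" is a unit that accumulates the log-odds of its preferred hidden cause and spikes only when the accumulated evidence exceeds what it has already signalled. The sketch below is my own simplification under those assumptions, not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
state = (rng.random(T) < 0.7).astype(float)   # hidden binary cause, mostly "on"
obs = state + 0.8 * rng.standard_normal(T)    # noisy observations of the cause

log_odds, sent, spikes = 0.0, 0.0, []
leak, threshold = 0.05, 1.0
for t in range(T):
    # Log-likelihood ratio of this observation: cause on (mean 1) vs. off
    # (mean 0), both with standard deviation 0.8.
    llr = (obs[t] - 0.5) / 0.8**2
    log_odds = (1 - leak) * log_odds + llr    # leaky evidence integration
    if log_odds - sent > threshold:           # spike only when new evidence
        spikes.append(t)                      # exceeds what was already signalled
        sent += threshold
print(f"{len(spikes)} spikes over {T} steps; cause on {state.mean():.0%} of the time")
```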


Topological Structure of Population Activity in Primary Visual Cortex
G. Singh, F. Memoli, T. Ishkhanov, G. Carlsson, G. Sapiro, and D. Ringach

Information in the cortex is widely believed to be represented by the joint activity of neuronal populations. Developing insights into the nature of these representations is a necessary first step in our quest to understand cortical computation. Here, we show that fundamental questions about neural representation can be cast in terms of the topological structure of population activity. A new method, based on the concept of persistent homology, is introduced and validated on artificial datasets. The technique is then applied to study the topological structure of neural activity in cell populations of primary visual cortex that were either spontaneously active or driven by natural image sequences. Our analyses confirm that spontaneous activity is highly structured and statistically different from noise. Furthermore, the topological objects derived from spontaneous and driven activity have similar distributions, which are dominated by the topology of a circle and the two-sphere. This latter structure, we postulate, corresponds to the representation of orientation and spatial frequency on a spherical surface. Our findings shed new light on the relationship between ongoing and driven activity in primary visual cortex and demonstrate, for the first time, that computational topology offers novel tools to tackle fundamental questions about the representation of information in the nervous system.
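
For intuition, the following sketch applies persistent homology to synthetic "population activity" sampled near a circle (assuming the ripser Python package; the data and parameters are illustrative only):

```python
import numpy as np
from ripser import ripser  # pip install ripser

rng = np.random.default_rng(0)
# Synthetic points near a circle, standing in for population activity with
# the circular topology the authors report for orientation coding.
theta = rng.uniform(0, 2 * np.pi, 300)
points = np.column_stack([np.cos(theta), np.sin(theta)])
points += 0.05 * rng.standard_normal(points.shape)

dgms = ripser(points, maxdim=1)["dgms"]
# A single long-lived bar in H1 is the persistent-homology signature of a
# circle; short-lived bars are noise.
lifetimes = np.sort(dgms[1][:, 1] - dgms[1][:, 0])[::-1]
print("longest H1 lifetimes:", np.round(lifetimes[:3], 3))
```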


Modeling Natural Sounds with Modulation Cascade Processes
R. Turner, M. Sahani

Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: sentences (1 s); phonemes (10^-1 s); glottal pulses (10^-2 s); and formants (10^-3 s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis [1]. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscience-inspired algorithms which solve similar tasks and to compare the properties of these algorithms with properties of auditory processing. There is, however, a discord: current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds, and the longer structures are ignored. The reason for this is two-fold. Firstly, it is a difficult technical problem to construct an algorithm that utilises both sorts of information. Secondly, it is computationally demanding to process data simultaneously at high resolution (to extract short temporal information) and over long durations (to extract long temporal information). The contribution of this work is to develop a new statistical model for natural sounds that captures structure across a wide range of time-scales, and to provide efficient learning and inference algorithms. We demonstrate the success of this approach on a missing-data task.
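
A hedged sketch of the modelling idea: a sound is generated as a fast carrier multiplied by positive envelopes at nested time-scales. The specific envelopes, rates, and carrier below are invented for illustration and are not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 8000, 2.0                    # sample rate (Hz), duration (s)
t = np.arange(int(fs * dur)) / fs

def slow_envelope(timescale_s):
    """A smooth positive modulator varying on the given time-scale,
    made by low-pass filtering noise and rectifying."""
    noise = rng.standard_normal(t.size)
    kernel = np.hanning(int(fs * timescale_s))
    kernel /= kernel.sum()
    return np.maximum(np.convolve(noise, kernel, mode="same"), 0.0)

# Cascade: a fast carrier modulated by envelopes at nested time-scales,
# roughly phoneme-like (0.1 s) and sentence-like (1 s).
carrier = np.sin(2 * np.pi * 200 * t)        # fast structure
sound = slow_envelope(1.0) * slow_envelope(0.1) * carrier
print("signal length:", sound.size, "samples; rms:", np.sqrt(np.mean(sound**2)))
```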


Structured representations in the visual cortex
P. Berkes, R. Turner, M. Sahani

Many computational models have offered functional accounts of the organization of the sensory cortex. However, most have lacked the structure needed to extract the high-order causes of the sensory input. Here we present a generative model of visual input based on the duality between the identity of image features and their attributes. The presence of a feature is encoded by a binary identity variable, while its appearance is modeled by a multidimensional manifold, parametrized by a set of attribute variables. When applied to natural image sequences, the model finds attribute manifolds spanned by localized Gabor wavelets with similar positions, orientations, and frequencies, but different phases. Thus the inferred activity of attribute variables after learning resembles that of simple cells in the primary visual cortex. Identity variables indicate the presence of a feature irrespective of its position on the underlying manifold, making them phase-insensitive, like complex cells. The dimensionality of the learnt manifolds and the relationships between the wavelets correspond closely to anatomical and functional observations regarding simple and complex cells. Thus, this generative model makes explicit an interpretation of complex and simple cells as elements in the segmentation of a visual scene into independent features, with a parametrization of their episodic appearance. It also suggests a possible role for them in a hierarchical system that extracts progressively higher-level entities, starting from simpler, low-level features.
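
The identity/attribute duality can be sketched in a few lines: a binary identity variable gates a Gabor feature whose phase is the attribute, and a quadrature-energy readout is phase-insensitive (complex-cell-like) while a single projection is phase-sensitive (simple-cell-like). All names and parameters below are illustrative assumptions:

```python
import numpy as np

def gabor(phase, size=16, freq=0.2, sigma=3.0):
    """A Gabor wavelet whose phase is the attribute variable."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return env * np.cos(2 * np.pi * freq * x + phase)

rng = np.random.default_rng(0)
identity = rng.random() < 0.5             # binary: is the feature present?
phase = rng.uniform(0, 2 * np.pi)         # attribute: position on the manifold
patch = float(identity) * gabor(phase)

# Quadrature energy detects the feature regardless of its phase, so it
# tracks the identity variable; each single projection tracks the attribute.
energy = np.sum(gabor(0.0) * patch)**2 + np.sum(gabor(np.pi / 2) * patch)**2
print(f"feature present: {identity}; quadrature energy: {energy:.2f}")
```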


On sparsity and overcompleteness in image models
P. Berkes, R. Turner, M. Sahani

Computational models of visual cortex, and in particular those based on sparse coding, have enjoyed much recent attention. Despite this currency, the question of how sparse, or how over-complete, a sparse representation should be has gone without a principled answer. Here, we use Bayesian model-selection methods to address these questions for a sparse-coding model based on a Student-t prior. Having validated our methods on toy data, we find that natural images are indeed best modelled by extremely sparse distributions; although for the Student-t prior, the associated optimal basis size is only modestly overcomplete.
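
For reference, a small sketch of the Student-t log-prior that drives sparsity in such models; the parametrization and values are illustrative, not those selected in the paper:

```python
import numpy as np

def student_t_log_prior(a, alpha=1.5, beta=1.0):
    """Log density, up to an additive constant, of a Student-t prior on a
    coefficient a, in the form common in sparse-coding energy functions:
    log p(a) = -alpha * log(1 + (a / beta)^2) + const."""
    return -alpha * np.log1p((a / beta) ** 2)

# Smaller alpha means heavier tails, i.e. a sparser prior: large
# coefficients are penalized less relative to small ones.
for alpha in (0.6, 1.5, 5.0):
    gap = student_t_log_prior(0.0, alpha) - student_t_log_prior(4.0, alpha)
    print(f"alpha={alpha}: log-prior penalty for a=4 vs a=0 is {gap:.2f}")
```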