Machine Vision Research

A long-standing problem in computer vision is to develop representations which are invariant to transformations of the input. For example, consider a sequence of images showing a tree swaying in the breeze on a bright summer's day, whilst a cloud moves over the sun and alters the intensity and spectrum of the illuminating light.

These transformations cause the image pixels to flicker and change in a complicated way, which makes it difficult for a computer to recognise the objects present in the scene. Nevertheless, upon viewing the image sequence, the percept of the tree remains stable, implying that an invariant neural representation has been formed. How does the brain solve this problem, and can this solution be harnessed in a computer vision system?

My work addresses this question by seeking to understand the computational principles that enable such representations to be learned from sensory input, and by casting those principles in a form that can be harnessed by computer vision systems.


Slowness is believed to be an important statistical signature of meaningful variables: quantities such as the identity of the tree persist over time, whereas the raw pixel values fluctuate rapidly. By adapting representations of images to extract features which are slow in this statistical sense, we can discover invariant representations.

The Slow Feature Analysis (SFA) algorithm is built upon the slowness principle. SFA takes the pixels of an input video and, by a linear transformation, extracts sets of variables which vary as slowly as possible whilst being decorrelated from one another. This simple algorithm can extract meaningful high-level variables, such as object identity, from video. Our work has shown that a simple yet powerful probabilistic model which embodies the slowness principle is equivalent to the popular SFA algorithm, and that this perspective leads to several generalisations of the method: to non-linear extraction methods, hierarchical architectures, and settings with missing or noisy data.
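To make the algorithm concrete, here is a minimal NumPy sketch of linear SFA in its standard formulation: whiten the input so that the decorrelation constraint becomes an orthogonality constraint, then keep the directions along which the temporal derivative has the smallest variance. The function and variable names are illustrative, not taken from any particular library.

    import numpy as np

    def slow_feature_analysis(X, n_features=2):
        """Linear SFA sketch.

        X : array of shape (T, D) -- one D-dimensional frame per time step.
        Returns the n_features slowest output signals, shape (T, n_features),
        ordered from slowest to fastest.
        """
        # Centre the data.
        X = X - X.mean(axis=0)

        # PCA-whiten so that decorrelated outputs correspond to
        # orthogonal directions in the whitened space.
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        whitener = eigvecs / np.sqrt(eigvals)      # (D, D)
        Z = X @ whitener                           # whitened signal

        # Approximate temporal derivatives by finite differences.
        dZ = np.diff(Z, axis=0)

        # Directions with the smallest derivative variance are the
        # slowest features (eigh returns ascending eigenvalues).
        dcov = np.cov(dZ, rowvar=False)
        _, slow_vecs = np.linalg.eigh(dcov)
        W = slow_vecs[:, :n_features]

        return Z @ W

Non-linear variants typically apply the same procedure to a fixed non-linear expansion of the input, and hierarchical variants stack such modules.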


Real-world datasets contain rich and complex sets of transformations, like those caused by the swaying tree. Historically, one of the barriers to capturing these transformations in computer vision has been that the statistical signatures arising from the content of the image (e.g. objects) are entangled with those arising from the transformations (e.g. lighting conditions).

In order to resolve this issue, Prof. Eero Simoncelli and I have collected a large database of controlled naturalistic images, covering a variety of surface properties, in which the position, direction, intensity, and spectral properties of the light sources were systematically varied whilst the image content remained fixed. By separating variations due to changing content from variations due to changing transformations, we hope to simplify the learning problem.


An example of a simple transformation, in which the camera shakes:

An example of a complex transformation, in which one of the light sources moves:

The goal is to build models of both the content of the scene and the transformations which that content can undergo. With such models, we can impose transformations on a novel object, or remove them to produce an invariant representation for recognition.
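One simple way to make this separation concrete is a bilinear ("style and content") factorisation, in which each image is modelled as a transformation-specific matrix applied to a content vector. The sketch below is illustrative only, not the model used in this work; the data layout, function names, and the single-SVD fitting procedure are assumptions.

    import numpy as np

    def fit_bilinear(Y, n_factors):
        """Fit a simple bilinear content/transformation model.

        Y : array of shape (S, C, D) -- one image vector for each
            (transformation s, content c) pair.
        Returns per-transformation matrices A of shape (S, D, J) and
        per-content vectors B of shape (J, C) such that
        Y[s, c] is approximately A[s] @ B[:, c].
        """
        S, C, D = Y.shape
        # Stack transformations along the rows, contents along the columns.
        stacked = Y.transpose(0, 2, 1).reshape(S * D, C)
        U, sv, Vt = np.linalg.svd(stacked, full_matrices=False)
        A = (U[:, :n_factors] * sv[:n_factors]).reshape(S, D, n_factors)
        B = Vt[:n_factors]
        return A, B

    # Impose transformation s on a novel content code b_new:
    #     y_new = A[s] @ b_new
    # Remove a known transformation s from an observed image y,
    # recovering a transformation-invariant content code:
    #     b_hat = np.linalg.pinv(A[s]) @ y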


Related papers

Related talks


