GATSBY COMPUTATIONAL NEUROSCIENCE UNIT

Structured Prediction and Inference for Scene Analysis

Matthew Blaschko

Department of Engineering Science

University of Oxford

Location: Seminar Room B10 (Gatsby Basement), Alexandra House, 17 Queen Square,
London WC1N 3AR

Wednesday 22 June 2011, 4 - 5pm

Learning methods have been widely applied in computer vision to solve tasks such as image classification, regression, dimensionality reduction, and clustering. These tasks are a considerable simplification of the original goal of enabling machines to process visual data in unconstrained environments with a sophistication similar to that of humans, and this simplification is largely the result of applying black box learning algorithms that have no specific knowledge of the problem structure. Much research in computer vision over the past two decades has been devoted to fitting part of a computer vision problem into one of these existing paradigms rather than directly predicting the desired output. Structured output learning promises to provide a more domain-aware learning paradigm that can help overcome these shortcomings. The problem of predicting structured data is central to vision problems, in which the outputs to be predicted are not simply binary labels or scalar values, as in classification and regression, respectively, but encode the rich structure of scene understanding.

In this talk, I will discuss the application of the structured output learning paradigm to object detection, a core component of scene understanding. To apply this strategy feasibly, we must solve a number of challenges, in particular those relating to efficient inference for object detection. Object detection is in general a highly non-convex problem with many local optima. I show how a branch-and-bound strategy can be developed for efficient and optimal inference both at test time and within a cutting plane optimization loop for structured output support vector machines. I further develop an extension of the structured output SVM objective to ranking with weak supervision. This enables the structured output learning framework to incorporate highly imbalanced data, for which the majority of training samples have no correct structured output prediction, as well as training data with heterogeneous levels of supervision, e.g. a mixture of binary labels and correct object detections. These examples indicate that structured output learning is an effective strategy for efficient and accurate object detection, as well as a flexible framework that is readily extensible to many useful output spaces and to heterogeneous sources of training data that may be assembled at reduced cost to human labelers.

Joint work with Christoph Lampert, Thomas Hofmann, Andrea Vedaldi, and Andrew Zisserman.