A self-calibrating algorithm for speaker tracking based on audio-visual statistical models
Matthew J. Beal, Nebojsa Jojic and Hagai Attias
Abstract: We present a self-calibrating algorithm for audio-visual tracking using two microphones and a camera. The algorithm uses a parametrized statistical model which combines simple models of video and audio. Using unobserved variables, the model describes the process that generates the observed data. Hence, it is able to capture and exploit the statistical structure of the audio and video data, as well as their mutual dependencies. The model parameters are estimated by the EM algorithm; object templates are learned and automatic calibration is performed as part of this procedure. Tracking is done by Bayesian inference of the object location using the model. Successful performance is demonstrated on real multimedia clips.
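The following is a toy illustration of the kind of audio-visual fusion the abstract describes. It is a minimal sketch, not the authors' model. It assumes a discrete grid of horizontal positions and a per-position video log-likelihood supplied by some template matcher. It also assumes a hypothetical linear mapping from position to the inter-microphone time delay, whose parameters `alpha` and `beta` stand in for the calibration parameters that EM would estimate. All function and parameter names are hypothetical. Because audio and video are treated as conditionally independent given the location, their log-likelihoods simply add to the log-prior before normalization.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (x - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)


def location_posterior(positions, video_loglik, observed_delay,
                       alpha, beta, delay_var, prior=None):
    """Posterior p(l | audio, video) over a discrete grid of positions l.

    Hypothetical assumptions (not taken from the paper):
      * video_loglik[i] is log p(video frame | l = positions[i]),
        e.g. from matching a learned template at that position;
      * the inter-microphone delay is Gaussian around alpha * l + beta,
        where alpha and beta play the role of calibration parameters.
    """
    n = len(positions)
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior over positions (must be > 0)
    log_post = []
    for l, v_ll, p in zip(positions, video_loglik, prior):
        a_ll = gaussian_logpdf(observed_delay, alpha * l + beta, delay_var)
        # Audio and video are assumed conditionally independent given l,
        # so their log-likelihoods add to the log-prior.
        log_post.append(math.log(p) + v_ll + a_ll)
    # Normalize in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    positions = [0, 1, 2, 3, 4]
    video_loglik = [0.0] * 5  # uninformative video evidence
    post = location_posterior(positions, video_loglik, observed_delay=2.0,
                              alpha=1.0, beta=0.0, delay_var=0.5)
    print([round(p, 3) for p in post])
```

With uninformative video, the posterior peaks at the position whose predicted delay matches the observed one. Strong video evidence at a neighboring position can shift the estimate, which is the kind of mutual reinforcement the combined model is meant to exploit.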
Click the links below to download the examples, or download the entire set here (40 MB) for offline viewing: unzip all files into a temporary directory and open icassp02.html.
As the videos are quite large, we recommend downloading each one before playing it. For example, in Internet Explorer, right-click the link and select "Save Target As...".