Primitive probabilistic auditory scene analysis
How can the auditory system make sense of the maelstrom of voices and sounds at the proverbial cocktail party? One set of clues comes from studies of auditory perception and scene analysis in more constrained settings. In particular, Gestalt psychology proposes a set of `laws' that qualitatively describe how auditory features are bound into auditory objects. Here we derive a simple computational model of auditory scene analysis. Where previous efforts have concentrated on tuning the parameters of largely deterministic models to fit human behaviour, we instead assume that these perceptual laws reflect statistical regularities in natural sounds, and we learn the parameters of a probabilistic model from acoustic recordings.
The statistical model comprises a set of narrow-band Gaussian carriers that are modulated by a set of slowly varying positive envelopes. The parameters of this model (the powers, centre frequencies, and bandwidths of the carriers; the time-scale and depth of the modulation; and the patterns of comodulation) were learned from training sounds using approximate maximum-likelihood methods. We show that the model captures many of the important statistics of auditory textures, and can thus be used to synthesise realistic versions of running water, wind, rain, and fire. The same generative model also captures many of the basic principles by which listeners appear to understand simple stimuli.
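The generative model described here can be sketched as follows. This is an illustrative implementation only: the sample rate, band centres, bandwidths, modulation time-scale, and modulation depth are assumed values chosen for demonstration, whereas in the model proper these parameters are learned from acoustic recordings; the smoothing-plus-exponential construction of the envelope is likewise one simple way to obtain a slowly varying positive process, not the paper's exact parameterisation.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000                 # sample rate in Hz (illustrative choice)
n = fs                     # one second of audio

def narrowband_carrier(centre_hz, bandwidth_hz):
    """Narrow-band Gaussian carrier: white Gaussian noise band-limited
    in the frequency domain to a band around centre_hz."""
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spec[np.abs(freqs - centre_hz) >= bandwidth_hz / 2] = 0.0
    x = np.fft.irfft(spec, n)
    return x / (np.std(x) + 1e-12)   # normalise to unit power

def slow_envelope(timescale_s, depth):
    """Slowly varying positive envelope: Gaussian noise smoothed on the
    given time-scale, then exponentiated to enforce positivity; `depth`
    controls the depth of modulation."""
    kernel = np.hanning(int(timescale_s * fs))
    kernel /= kernel.sum()
    smooth = np.convolve(rng.standard_normal(n), kernel, mode="same")
    return np.exp(depth * smooth / (np.std(smooth) + 1e-12))

# Hypothetical (centre frequency, bandwidth) pairs in Hz; in the model
# proper these, and the comodulation across bands, are learned.
bands = [(500, 100), (1500, 300), (4000, 800)]
sound = sum(slow_envelope(0.05, 0.8) * narrowband_carrier(c, b)
            for c, b in bands)
```

Sharing one envelope across several carriers, rather than drawing an independent envelope per band as above, is how patterns of comodulation of the kind referred to in the text would be expressed in this sketch.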
We show that the inferred values of the latent carrier and envelope processes correspond to the perceptual principles of grouping by proximity, good continuation, and common fate, as well as to the continuity illusion, comodulation masking release, and the old-plus-new heuristic (Bregman, 1990).