
Primitive probabilistic auditory scene analysis


How can the auditory system make sense of the maelstrom of voices and sounds at the proverbial cocktail party? One set of clues to the answer comes from studies of auditory perception and scene analysis in more constrained settings. In particular, Gestalt psychology proposes a set of `laws' which qualitatively describe how auditory features are bound to auditory objects. Here we derive a simple computational model for auditory scene analysis. Where previous such efforts have concentrated on tuning parameters in largely deterministic models to fit human behaviour, we instead assume that these perceptual laws reflect statistical regularities in natural sounds, and we learn the parameters of a probabilistic model from acoustic recordings.
The statistical model comprises a set of narrowband Gaussian carriers that are modulated by a set of slowly varying positive envelopes. The parameters of this model (the power, centre frequencies and bandwidths of the carriers; the timescale and depth of the modulation; and the patterns of comodulation) were learned from training sounds using approximate maximum-likelihood methods. We show that the model is able to capture many of the important statistics of auditory textures, and can thus be used to synthesise realistic versions of running water, wind, rain and fire. The same generative model was also used to capture many of the basic principles by which listeners appear to understand simple stimuli.
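The generative structure described above can be sketched in code. The following is a minimal illustrative sampler, not the authors' implementation: each channel is a narrowband Gaussian carrier (a complex AR(1) process heterodyned to a centre frequency) multiplied by a slowly varying positive envelope (an exponentiated slow AR(1) process). All numerical parameters here (centre frequencies, bandwidth, envelope timescale) are assumed placeholder values standing in for the learned ones.

```python
import numpy as np

def sample_texture(n_samples=8000, fs=16000.0, n_channels=3, seed=0):
    """Draw one sound from a toy version of the model: a sum of
    narrowband Gaussian carriers modulated by slow positive envelopes.
    All parameter values are illustrative, not learned from data."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    centres = np.linspace(500.0, 4000.0, n_channels)  # centre freqs (Hz), assumed
    bandwidth = 100.0   # carrier bandwidth (Hz), assumed
    tau = 0.05          # envelope timescale (s), assumed
    sound = np.zeros(n_samples)
    for fc in centres:
        # Narrowband Gaussian carrier: low-pass complex AR(1) noise
        # shifted up to the centre frequency fc.
        a = np.exp(-2.0 * np.pi * bandwidth / fs)
        z = rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)
        lp = np.zeros(n_samples, dtype=complex)
        for i in range(1, n_samples):
            lp[i] = a * lp[i - 1] + np.sqrt(1.0 - a**2) * z[i]
        carrier = np.real(lp * np.exp(2j * np.pi * fc * t))
        # Slowly varying positive envelope: exponentiated slow AR(1)
        # process; exponentiation enforces positivity.
        ae = np.exp(-1.0 / (tau * fs))
        x = np.zeros(n_samples)
        for i in range(1, n_samples):
            x[i] = ae * x[i - 1] + np.sqrt(1.0 - ae**2) * rng.standard_normal()
        envelope = np.exp(x)
        sound += envelope * carrier
    return sound / np.max(np.abs(sound))
```

Comodulation (shared envelope structure across channels) would be introduced by making the envelope processes correlated across channels; it is omitted here for brevity.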
We show that the inferred values of the latent carrier and envelope processes correspond to the perceptual principles of grouping by proximity, good continuation and common fate, as well as to the continuity illusion, comodulation masking release, and the old-plus-new heuristic (Bregman 1990).