Modeling Natural Sounds with Gaussian Modulation Cascade Processes

R. E. Turner and M. Sahani

Published in: Advances in Models for Acoustic Processing Workshop at the Advances in Neural Information Processing Systems Conference

The computational principles underpinning auditory processing are not well understood. This fact stands in stark contrast to early visual processing, for which computational theories, and especially those built on statistical models, have recently enjoyed great success. We believe one of the reasons for this disparity is the paucity of rich, learnable generative models for natural scenes with an explicit temporal dimension. To that end we introduce a new generative model for the dynamic Fourier components of sounds. This comprises a cascade of modulatory processes which evolve over a wide range of time-scales. We show the model is capable of capturing both the sparse marginal distribution and the prevalence of amplitude modulation in natural sounds, to which the auditory system appears to listen so attentively. Moreover, we demonstrate that learning and inference in the Gaussian Modulation Cascade Process are relatively easy, due to the structure of its non-linearity. We hope that this provides a first step toward furthering our understanding of auditory computations.

Related publications: