GATSBY COMPUTATIONAL NEUROSCIENCE UNIT

Josh McDermott

Auditory Perception and Cognition Lab, University of Minnesota, USA

 

Friday 18 January 2008, 14.00

 

Seminar Room B10 (Basement)

Alexandra House, 17 Queen Square, London, WC1N 3AR

 

 

Sound Texture Perception Via Texture Synthesis

 

Many natural sounds, such as those produced by rainstorms, fires, insects at night, or birds in a forest, are the result of large numbers of acoustic events occurring rapidly and randomly. Sound textures such as these are often temporally homogeneous, and in many cases do not depend much on the precise arrangement of the component events, suggesting that they might be represented statistically. To test this idea and explore what statistics might characterize natural sound textures, we designed an algorithm to synthesize sound textures from statistics extracted from real textures. The algorithm is inspired by those used to synthesize visual textures, in which statistical constraints are sequentially imposed on a sample of noise. This process is iterated, and converges over time, with the noise sample altered so as to obey the chosen statistical constraints. If the statistics capture the perceptually important properties of the texture in question, the synthesized result ought to sound like a sample of the texture. We tested whether rudimentary statistics computed from the responses of a bank of bandpass filters could produce compelling synthetic textures. Simply matching the marginal statistics (variance, kurtosis) within individual filters was generally insufficient to yield good results, but imposing various joint envelope statistics (cross-band correlations, autocorrelations within each band, and cross-band correlations across time) greatly improved the results, frequently producing synthetic textures that sounded natural and recognizable. The results suggest that rather simple statistics may be used by the auditory system to represent and recognize sound textures. Synthesizing some classes of textures may necessitate higher-order "feature detectors", but in many cases, textures with recognizable features (raindrops, crackles, insect/bird calls) emerge from the imposition of much simpler statistical constraints.
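
The idea of imposing statistics measured from a real texture onto a noise sample can be illustrated with a minimal sketch. It is not the algorithm presented in the talk: it assumes non-overlapping brick-wall FFT bands in place of a proper bandpass filter bank, and it imposes only the per-band variance, the simplest of the marginal statistics mentioned above (which the abstract reports is generally insufficient on its own). The function names `band_split`, `marginal_stats` and `impose_variance` are hypothetical, and the "target" here is just smoothed noise standing in for a recorded texture.

```python
import numpy as np


def band_split(x, n_bands):
    """Split x into subbands using non-overlapping brick-wall FFT masks.

    Assumption: a stand-in for a real bandpass filter bank. Because the
    masks partition the spectrum, the subbands sum back to x.
    """
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(x)))
    return np.array(bands)


def marginal_stats(bands):
    """Return per-band variance and (non-excess) kurtosis."""
    centered = bands - bands.mean(axis=1, keepdims=True)
    var = (centered ** 2).mean(axis=1)
    kurt = (centered ** 4).mean(axis=1) / np.maximum(var, 1e-12) ** 2
    return var, kurt


def impose_variance(noise, target_var, n_bands, n_iter=5):
    """Repeatedly rescale each subband of a noise sample to the target variance.

    With brick-wall bands a single pass already matches the target; the loop
    is kept only to mirror the iterative structure that becomes necessary
    once several interacting constraints are imposed in sequence.
    """
    y = noise.copy()
    for _ in range(n_iter):
        bands = band_split(y, n_bands)
        var, _ = marginal_stats(bands)
        gains = np.sqrt(target_var / np.maximum(var, 1e-12))
        y = (bands * gains[:, None]).sum(axis=0)
    return y


# Usage: measure statistics from a stand-in "texture", then impose them on noise.
rng = np.random.default_rng(0)
target = np.convolve(rng.standard_normal(4096), np.ones(8) / 8, mode="same")
target_var, target_kurt = marginal_stats(band_split(target, 8))
synth = impose_variance(rng.standard_normal(4096), target_var, 8)
synth_var, synth_kurt = marginal_stats(band_split(synth, 8))
```

After this step `synth_var` agrees with `target_var`, while the kurtosis and any envelope correlations are left unconstrained. Extending the loop to impose those statistics in turn, re-measuring after each step, is where repeated iteration would actually be needed.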