
Sound texture perception via filter statistics


Josh McDermott

The Center for Neural Science, NYU, USA


Many natural sounds, such as those produced by rainstorms, fires, galloping horses, and swarms of insects, result from large numbers of rapidly occurring acoustic events. These sounds are auditory textures, analogous to the image textures whose representation in the visual system has been studied for decades. A defining characteristic of sound textures is temporal homogeneity, suggesting that their properties might be captured by statistics – specifically, time-averages of local sound characteristics. We hypothesized that the auditory system might encode and recognize sound textures using temporal expectations of the simple acoustic measurements featured in early auditory representations. We simulated such representations via filters tuned in acoustic and modulation frequency, and tested the perceptual importance of their statistics with a synthesis algorithm that imposed the statistics of a real sound on a sample of noise.
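The time-averaged statistics described above can be illustrated with a minimal sketch. The actual model uses a gammatone-style filter bank with envelope compression and modulation filters; here, purely for illustration, a crude FFT-domain bandpass stands in for a cochlear filter, and the Hilbert transform supplies each subband's amplitude envelope. The function names and parameters are hypothetical, not those of the published model.

```python
import numpy as np
from scipy.signal import hilbert

def bandpass(x, sr, lo, hi):
    # crude FFT-domain bandpass (illustrative stand-in for a
    # cochlear/gammatone filter in the actual model)
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope_stats(band):
    # time-averaged marginal moments of one subband envelope:
    # mean, variance, skewness, kurtosis
    env = np.abs(hilbert(band))       # amplitude envelope
    mu = env.mean()
    c = env - mu
    var = c.var()
    skew = (c**3).mean() / var**1.5
    kurt = (c**4).mean() / var**2
    return mu, var, skew, kurt

# pairwise statistics across subbands: correlation of envelopes
sr = 44100
rng = np.random.default_rng(0)
noise = rng.standard_normal(sr)
b1 = bandpass(noise, sr, 200, 400)
b2 = bandpass(noise, sr, 400, 800)
corr = np.corrcoef(np.abs(hilbert(b1)), np.abs(hilbert(b2)))[0, 1]
```

Each statistic is a single time-average per filter (or filter pair), so the whole representation of a texture is a compact vector of numbers rather than the waveform itself.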

Despite the absence of mechanisms explicitly tuned to any particular real-world acoustic structure, imposing the same set of rudimentary filter statistics (mean, variance, skew, and kurtosis of the output of individual filters, along with pairwise filter correlations) was sufficient to produce many different synthetic textures that sounded natural and that listeners could reliably recognize. The results suggest that the auditory system could use fairly simple statistics to recognize many natural sound textures, and they illustrate how synthesis can be used as a tool to probe auditory representation.
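The core synthesis idea of imposing a target sound's statistics on noise can be sketched for the simplest case. Matching the first two moments of a subband has a closed-form affine solution; the full algorithm additionally matches skew, kurtosis, and cross-band correlations by iterative adjustment. This is a hypothetical illustration, not the published procedure.

```python
import numpy as np

def match_mean_var(noise_band, target_mean, target_var):
    # affine adjustment that makes the noise subband's mean and
    # variance exactly equal to the target's; higher moments and
    # correlations require iterative refinement in the full model
    c = noise_band - noise_band.mean()
    return c * np.sqrt(target_var / c.var()) + target_mean

rng = np.random.default_rng(1)
noise = rng.standard_normal(10000)
# hypothetical target statistics measured from a real texture
synthetic = match_mean_var(noise, 2.0, 0.25)
```

Because only statistics are matched, the synthetic waveform differs sample-by-sample from the original; if it nonetheless sounds like the same texture, the imposed statistics plausibly capture what the auditory system encodes.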