How the auditory system maps temporal acoustic cues into spatial patterns

Shihab A. Shamma

University of Maryland

The perception of sound involves a complex array of attributes, ranging from the sensation of timbre and pitch to the localization and fusion of sound sources. Computational strategies proposed to describe these phenomena have emphasized temporal features in the representation of sound in the auditory system. This is in contrast to visual processing, where spatial features, such as edges, their orientation, and direction-of-motion selectivity, play a critical role in defining an image. These divergent views of auditory and visual processing have led to the conclusion that the underlying neural networks must be quite different. However, if one adopts the tonotopic axis of the cochlea as the spatial axis of the auditory system, one finds a multitude of intricate spatio-temporal cues in the neural response patterns distributed along the tonotopic axis of the peripheral and central auditory system. These findings suggest a unified computational framework, and hence shared neural network architectures, for central auditory and visual processing. Specifically, we shall demonstrate here how four fundamental concepts in visual processing play an analogous role in auditory processing and perception. These are: (1) Lateral inhibition for extraction of the acoustic spectrum in the cochlear nucleus; (2) Edge orientation and direction-of-motion sensitivity for spectral analysis underlying timbre perception; (3) Coincidence detection (as in figure/ground segregation and perception of bilateral symmetry) for sound source delineation based on common onsets and harmonic relationships, and for complex pitch perception in general; (4) Stereopsis for binaural processing and localization.
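To make the first of these concepts concrete, the sketch below illustrates lateral inhibition applied to an activity profile along a tonotopic axis. This is only a minimal illustration, not the network described in the paper: it assumes a simple feed-forward difference-of-neighbors kernel with half-wave rectification, where each channel's response is reduced by the activity of its immediate spectral neighbors, sharpening peaks in the profile.

```python
def lateral_inhibition(activity, strength=0.5):
    """Sharpen a tonotopic activity profile: subtract a fraction of the
    average neighboring-channel activity, then half-wave rectify.
    (Illustrative assumption: a one-step feed-forward kernel.)"""
    out = []
    n = len(activity)
    for i in range(n):
        left = activity[i - 1] if i > 0 else activity[i]
        right = activity[i + 1] if i < n - 1 else activity[i]
        inhibited = activity[i] - strength * 0.5 * (left + right)
        out.append(max(0.0, inhibited))  # firing rates cannot be negative
    return out

# A smooth spectral "hump" distributed along the tonotopic axis:
profile = [0.0, 0.2, 0.6, 1.0, 0.6, 0.2, 0.0]
sharpened = lateral_inhibition(profile)
```

After inhibition the peak channel stands out more sharply relative to its neighbors than in the raw profile, mimicking the spectral-edge enhancement attributed to lateral inhibitory networks in the cochlear nucleus.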