Compensation for the effects of reverberation in human and machine listeners
Human listeners are remarkably robust to the effects of room reverberation, yet reverberation remains a significant challenge for automatic speech recognition (ASR): the error rate of an ASR system rises rapidly as reverberation time increases. Two principles of auditory processing that might underlie this robustness in human listeners are the ability to cope with missing or unreliable acoustic features, and dynamic range adjustment based on efferent feedback. Computational models that implement these two approaches are presented and compared with human performance. In the first approach, an auditory time-frequency representation of reverberated speech is processed to identify regions that are relatively uncorrupted by energy from room reflections. Missing data techniques are then used during speech decoding so that "reliable" and "unreliable" acoustic features are treated differently. In the second approach, the efferent model of Ferry and Meddis [JASA (2007), vol. 122, pp. 3519-3526] is extended so that the amount of efferent activity depends on the dynamic range of simulated auditory nerve firing patterns. The extended efferent model is shown to replicate "perceptual compensation" for reverberation in a speech discrimination task. Approaches for combining the missing data and efferent processing models will be considered.
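To make the missing data idea more concrete, the sketch below shows one way the two steps could fit together: first mark each time-frequency cell as reliable or unreliable, then score a frame under a diagonal Gaussian acoustic model using only the reliable cells (full marginalisation, which for a diagonal Gaussian amounts to dropping the unreliable terms). The abstract does not say how reliable regions are identified, so the mask rule here is a hypothetical heuristic: a cell counts as reliable only if it exceeds, by `margin_db`, a per-channel "tail" that decays by `decay_db` per frame from earlier energy, a crude stand-in for the reverberant tail. The function names and parameters (`reverberation_mask`, `marginal_log_likelihood`, `decay_db`, `margin_db`) are illustrative assumptions, not the authors' method.

```python
import numpy as np


def reverberation_mask(spec_db, decay_db=2.0, margin_db=3.0):
    """Hypothetical reliability mask for a (frames x channels) dB representation.

    A cell is marked reliable when it exceeds, by margin_db, a decaying memory
    of earlier energy in the same channel. This heuristic is an assumption used
    only for illustration; it is not the mask estimator from the abstract.
    """
    n_frames, n_channels = spec_db.shape
    mask = np.zeros((n_frames, n_channels), dtype=bool)
    tail = np.full(n_channels, -np.inf)  # per-channel decaying "reverberant tail" in dB
    for t in range(n_frames):
        # Energy well above the decayed tail is treated as direct sound (reliable).
        mask[t] = spec_db[t] > tail + margin_db
        # Keep the louder of the tail and this frame, then decay it linearly in dB.
        tail = np.maximum(tail, spec_db[t]) - decay_db
    return mask


def marginal_log_likelihood(frame, mask, mean, var):
    """Diagonal-Gaussian log-likelihood computed over reliable channels only.

    Unreliable channels are integrated out (full marginalisation); with a
    diagonal covariance their terms simply vanish from the sum.
    """
    x, m, v = frame[mask], mean[mask], var[mask]
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * v) + (x - m) ** 2 / v))


if __name__ == "__main__":
    # One channel: an onset, a tail decaying at 2 dB/frame, a new onset, a quiet frame.
    spec = np.array([[60.0], [58.0], [56.0], [70.0], [40.0]])
    print(reverberation_mask(spec)[:, 0])  # onsets reliable, decaying tail unreliable

    frame = np.array([1.0, 100.0])         # second channel is badly corrupted
    mask = np.array([True, False])
    print(marginal_log_likelihood(frame, mask, np.zeros(2) + [1.0, 0.0], np.ones(2)))
```

A bounded variant, which uses the observed energy as an upper limit on the clean speech value in unreliable cells, is a common refinement in the missing data literature; full marginalisation is shown here only because it is the simplest correct form.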