2. Understanding pitch perception as a multi-scale hierarchical generative process

Emili Balaguer1 (emili.balaguer-ballester@plymouth.ac.uk), Clark Nick2 (nick@ihr.mrc.ac.uk), Coath Martin1 (martin.coath@plymouth.ac.uk), Denham Sue1 (s.denham@plymouth.ac.uk), Katrin Krumbholz2 (nick@ihr.mrc.ac.uk)

1Centre for Theoretical and Computational Neuroscience, University of Plymouth, Plymouth, UK
2MRC Institute of Hearing Research, University of Nottingham, Nottingham, UK

Pitch is a salient unitary percept derived from the periodicities in a sound signal. Modelling the neural processing of pitch is essential for understanding perceptual phenomena in music and prosody in speech. However, studying the temporal dynamics of pitch requires a wide range of time scales over which perceptual information is integrated, and no existing approach accounts for the balance between temporal integration and temporal resolution in pitch perception.

Another major challenge for modelling is to relate pitch perception to neurophysiological data. Functional brain-imaging studies strongly suggest that there is some form of hierarchical processing in the auditory system, starting in subcortical structures. Moreover, responses become increasingly dispersed along this hierarchy and have longer latencies in cortex, and no attempt has yet been made to explain these latencies. The goal of this work is to develop the most compact neurocomputational formulation consistent with this evidence.

In this study, we introduce the novel idea that top-down activity within a hierarchical processing architecture is critical for understanding the temporal dynamics of pitch in a unified model. We present a simplified model of neural ensemble responses that explains the stimulus-dependent time scales of temporal resolution and integration in pitch perception. We demonstrate that this model is an extension of autocorrelation models of pitch. We also show that the model is similar to a hierarchical generative process in which higher cortical levels predict the responses of lower levels and modulate them via feedback connections. The model also explains the latency of the pitch onset response in cortex and is consistent with other recent neurophysiological data.
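For readers unfamiliar with the autocorrelation models that this work extends, the following is a minimal sketch of the classic baseline only, not of the hierarchical model presented here. It assumes a single-channel signal and a fixed analysis window; the function name `autocorrelation_pitch`, its parameters, and the test stimulus are hypothetical and chosen for illustration.

```python
import numpy as np

def autocorrelation_pitch(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate pitch (Hz) from the lag of the largest autocorrelation peak.

    Hypothetical helper: a plain autocorrelation baseline with one fixed
    window, with none of the feedback or multi-scale integration of the model.
    """
    signal = signal - np.mean(signal)
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # Restrict the search to lags that correspond to plausible pitches.
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Assumed test stimulus: harmonics 3, 4 and 5 of 200 Hz (no energy at 200 Hz).
sample_rate = 16000
t = np.arange(1600) / sample_rate  # 100 ms of signal
tone = sum(np.sin(2 * np.pi * 200.0 * h * t) for h in (3, 4, 5))
estimated_pitch = autocorrelation_pitch(tone, sample_rate)
```

In this sketch the estimate falls at the 200 Hz missing fundamental because all three harmonics realign at a lag of one fundamental period. The fixed 100 ms window is the kind of single time scale that the stimulus-dependent integration described above is meant to replace.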

In addition, we conducted a psychoacoustic experiment specifically to assess the temporal resolution of the auditory system. The experiment was conducted independently of the model development, and its results were subsequently used to test the model predictions successfully.

The simulation results show that this model provides, for the first time, a unified account of perceptual results from a range of challenging studies. Although highly idealized, the model significantly advances the identification of the basic elements of pitch processing over time and provides a novel account of the role of feedback connections in the auditory system.