Neural modelling of the encoding of fast frequency modulation
Alejandro Tabas and Katharina von Kriegstein
PLOS Computational Biology, 2021, vol. 17, issue 3, 1-30
Abstract:
Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM-sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli; and a sweep representation, consisting of neurons that respond selectively to FM direction. To date, computational models use feedforward mechanisms to explain FM encoding. However, from neuroanatomy we know that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representation. The model was designed to encode sweeps of the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants, as well as previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency-encoding and direction-encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.
Author summary:
Humans' ability to understand and produce speech is one of the most fascinating developments of evolution. It is critical for smooth daily routines, from the individual to the societal level. The computational mechanisms that the human brain uses to excel at speech recognition are far from understood. Among the fundamental building blocks of speech are so-called formant transitions, which characterise different speech sounds. To date, formant transitions are assumed to be processed according to a representational framework. In this view, the brain processes auditory signals in a hierarchical, constructive way, where the higher levels of the hierarchy, which represent the formant transition directions, are informed by the neural representations of individual frequencies at the lower levels, but not vice versa. Here, we show that the representational framework does not fully explain human behaviour. Instead, we develop a novel computational model in which the neural representations of formant transitions influence lower-level representations. This mechanism effectively increases the speed and efficiency of the recognition of formant transitions. The model explains previously unaccounted-for phenomena in human perceptual behaviour. These neural principles can be extended to other auditory processing networks and sensory modalities, and can be incorporated into neurobiologically inspired automatic speech recognition algorithms.
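The abstract describes a predictive interaction in which direction-selective (sweep) units feed back onto frequency-tuned (spectral) units. The snippet below is not the authors' published model; it is only a minimal rate-based sketch of that idea, and all equations, parameter names, and values are illustrative assumptions rather than those of the paper.

```python
# Minimal toy sketch (not the published model): predictive feedback from a
# sweep-direction representation onto a spectral (frequency-tuned) layer.
# All parameters and scalings below are illustrative assumptions.
import numpy as np

dt = 0.001                       # integration step (s)
dur = 0.05                       # sweep duration, ~formant-transition length (s)
t = np.arange(0.0, dur, dt)
f_start, f_end = 1000.0, 2000.0  # rising sweep: instantaneous frequency in Hz
f_inst = f_start + (f_end - f_start) * t / dur

f_pref = np.linspace(800.0, 2200.0, 60)  # preferred frequencies of spectral units (Hz)
bw = 80.0                                # Gaussian tuning width (Hz)
tau_s, tau_d = 0.005, 0.015              # spectral / sweep time constants (s)
gain_fb = 0.5                            # strength of predictive feedback

# warm-start the spectral layer at the sweep onset frequency
r_spec = np.exp(-0.5 * ((f_pref - f_start) / bw) ** 2)
r_up, r_down = 0.0, 0.0                  # direction-selective (sweep) rates
centroid_prev = np.sum(f_pref * r_spec) / np.sum(r_spec)

for fi in f_inst:
    # feedforward drive: Gaussian tuning around the instantaneous frequency
    drive = np.exp(-0.5 * ((f_pref - fi) / bw) ** 2)

    # predictive feedback: the "up" unit pre-activates spectral units just
    # above the current spectral centroid (the "down" unit, just below it)
    centroid = np.sum(f_pref * r_spec) / (np.sum(r_spec) + 1e-9)
    pred_up = np.exp(-0.5 * ((f_pref - (centroid + 150.0)) / bw) ** 2)
    pred_down = np.exp(-0.5 * ((f_pref - (centroid - 150.0)) / bw) ** 2)
    feedback = gain_fb * (r_up * pred_up + r_down * pred_down)

    # leaky integration of the spectral layer
    r_spec += dt / tau_s * (-r_spec + drive + feedback)

    # sweep units accumulate evidence from the drift of the spectral centroid
    delta = centroid - centroid_prev     # Hz per step; /10 is an arbitrary scaling
    r_up += dt / tau_d * (-r_up + max(delta, 0.0) / 10.0)
    r_down += dt / tau_d * (-r_down + max(-delta, 0.0) / 10.0)
    centroid_prev = centroid

print(f"final up-selective rate:   {r_up:.3f}")
print(f"final down-selective rate: {r_down:.3f}")
print(f"final spectral centroid:   {centroid:.1f} Hz")
```

In this toy version, the feedback term pre-activates spectral units just ahead of the current spectral centroid in the sweep direction, biasing the spectral representation towards the upcoming frequencies. This is the kind of sweep-to-spectral interaction the abstract proposes as the origin of the sweep pitch shift.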
Date: 2021
Downloads:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008787 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 08787&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1008787
DOI: 10.1371/journal.pcbi.1008787
More articles in PLOS Computational Biology from Public Library of Science
Bibliographic data for series maintained by ploscompbiol.