Language Accent Detection with CNN Using Sparse Data from a Crowd-Sourced Speech Archive

Veranika Mikhailava, Mariia Lesnichaia, Natalia Bogach, Iurii Lezhenin, John Blake and Evgeny Pyshkin
Additional contact information
Veranika Mikhailava: School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
Mariia Lesnichaia: Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
Natalia Bogach: Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
Iurii Lezhenin: Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, 195251 St. Petersburg, Russia
John Blake: School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan
Evgeny Pyshkin: School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan

Mathematics, 2022, vol. 10, issue 16, 1-30

Abstract: The problem of accent recognition has received considerable attention with the development of Automatic Speech Recognition (ASR) systems. The crux of the problem is that conventional acoustic language models, adapted to fit standard language corpora, cannot satisfy the recognition requirements for accented speech. In this research, we address the accent recognition task for a group of up to nine European accents in English and provide evidence in favor of specific hyperparameter choices for neural network models, together with a search for the input speech signal parameters that best improve the baseline accent recognition accuracy. Specifically, we used a CNN-based model trained on audio features extracted from the Speech Accent Archive dataset, a crowd-sourced collection of accented speech recordings. We show that adding time–frequency and energy features (such as the spectrogram, chromagram, spectral centroid, spectral rolloff, and fundamental frequency) to the Mel-frequency cepstral coefficients (MFCC) may increase the accuracy of accent classification compared to conventional feature sets of MFCC and/or raw spectrograms. Our experiments demonstrate that the greatest impact comes from feeding the model amplitude mel-spectrograms on a linear scale. Amplitude mel-spectrograms on a linear scale, which are correlates of the audio signal energy, produce state-of-the-art classification results and bring the recognition accuracy for English with Germanic, Romance, and Slavic accents into the range from 0.964 to 0.987, thus outperforming existing accent classification models that use the Speech Accent Archive. We also investigated how speech rhythm affects recognition accuracy. Based on our preliminary experiments, we used the audio recordings in their original form (i.e., with all pauses preserved) for the other accent classification experiments.
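
The abstract describes the feature pipeline only at a high level; the exact extraction settings are not given in this record. The sketch below assumes the Python librosa library and common default parameters (sample rate, number of mel bands, number of MFCCs, pitch range) purely to illustrate how such a combined time–frequency/energy feature matrix might be assembled as CNN input. It is not the authors' implementation.

# Illustrative sketch only. The parameter values below (sample rate, number of
# mel bands, number of MFCCs, pitch range) are assumptions, not the settings
# reported in the paper.
import numpy as np
import librosa

def extract_features(path, sr=22050, n_mfcc=13, n_mels=128):
    """Stack per-frame features into one 2D matrix (features x frames)."""
    y, _ = librosa.load(path, sr=sr)

    # Amplitude mel-spectrogram kept on a linear scale (power=1.0, no dB
    # conversion), i.e. a correlate of the signal energy as in the abstract.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, power=1.0)

    # MFCC plus additional time-frequency and energy descriptors.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)

    # Frame-level fundamental frequency (YIN), reshaped to a single row.
    f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                     fmax=librosa.note_to_hz("C7"), sr=sr)[np.newaxis, :]

    # Frame counts can differ slightly between extractors; trim to the minimum
    # and stack everything into one matrix usable as single-channel CNN input.
    blocks = (mel, mfcc, chroma, centroid, rolloff, f0)
    n = min(b.shape[1] for b in blocks)
    return np.vstack([b[:, :n] for b in blocks])

The stacked matrix can then be treated as a single-channel image by a 2D convolutional classifier; the network architecture, hyperparameters, and extraction settings actually used in the study are described in the full text linked below.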

Keywords: NLP; automatic accent identification; convolutional neural networks (CNN); Mel-frequency cepstral coefficients (MFCC); amplitude mel-spectrogram; crowd-sourced data collection
JEL-codes: C
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2227-7390/10/16/2913/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/16/2913/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:16:p:2913-:d:887274

Mathematics is currently edited by Ms. Emma He

Handle: RePEc:gam:jmathe:v:10:y:2022:i:16:p:2913-:d:887274