Voice Pathology Detection Using a Two-Level Classifier Based on Combined CNN–RNN Architecture
Amel Ksibi,
Nada Ali Hakami,
Nazik Alturki,
Mashael M. Asiri,
Mohammed Zakariah and
Manel Ayadi
Additional contact information
Amel Ksibi: Department of Information Systems, College of Computer and Information Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
Nada Ali Hakami: Computer Science Department, College of Computer Science and Information Technology, Jazan University, Jazan 45142, Saudi Arabia
Nazik Alturki: Department of Information Systems, College of Computer and Information Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
Mashael M. Asiri: Department of Computer Science, College of Science & Art at Mahayil, King Khalid University, Abha 62529, Saudi Arabia
Mohammed Zakariah: College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
Manel Ayadi: Department of Information Systems, College of Computer and Information Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
Sustainability, 2023, vol. 15, issue 4, 1-18
Abstract:
The construction of an automatic voice pathology detection system employing machine learning algorithms to study voice abnormalities is crucial for the early detection of voice pathologies and for identifying the specific type of pathology from which a patient suffers. This paper’s primary objective is to construct a deep learning model for accurate voice pathology identification. Manual audio feature extraction served as the foundation for the categorization process. The most critical aspect of this work was the incorporation of an additional piece of information, the speaker’s gender, via a two-level classifier model: the first level determines whether the audio input is a male or female voice, and the second level determines whether the voice is pathological or healthy. As in the bulk of earlier efforts, the study analyzed the audio signal by focusing solely on a single vowel, /a/, ignoring phrases and other vowels. The analysis was performed on the Saarbruecken Voice Database. The two-level cascaded model attained an accuracy of 88.84% and an F1 score of 87.39%, outperforming earlier attempts on the same dataset and providing a stepping stone towards a more precise early diagnosis of voice complications.
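The abstract describes a cascade in which a gender classifier routes each recording to a gender-specific pathology detector, with manually extracted features (e.g., the MFCCs named in the keywords) feeding a combined CNN–RNN. The sketch below is a minimal, hypothetical reconstruction of that pipeline in Python: the sampling rate, MFCC settings, frame count, layer sizes, and the extract_mfcc/build_cnn_rnn/predict helpers are all illustrative assumptions, not the authors' actual implementation, which this record does not specify.

```python
# Hypothetical sketch of the two-level cascade: a gender classifier (level 1)
# routes each recording to a gender-specific pathology detector (level 2).
# All hyperparameters below are assumptions for illustration only.
import numpy as np
import librosa
from tensorflow.keras import layers, models

N_MFCC = 13        # MFCCs per frame (assumed)
MAX_FRAMES = 128   # frames kept after padding/truncation (assumed)

def extract_mfcc(path, sr=16000):
    """Load a sustained-vowel recording; return a (MAX_FRAMES, N_MFCC) matrix."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T  # (frames, N_MFCC)
    if mfcc.shape[0] < MAX_FRAMES:  # zero-pad short clips
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_cnn_rnn():
    """CNN front-end for local spectral patterns, RNN for temporal dynamics."""
    return models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.Conv1D(64, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(64),
        layers.Dense(1, activation="sigmoid"),  # binary output at each level
    ])

# Level 1: male vs. female. Level 2: one pathology detector per gender.
gender_model = build_cnn_rnn()
pathology_model = {"male": build_cnn_rnn(), "female": build_cnn_rnn()}

def predict(path):
    """Route one sample through the cascade; return both decisions."""
    x = extract_mfcc(path)[np.newaxis, ...]  # add batch dimension
    # Label convention (male < 0.5) is an arbitrary assumption here.
    gender = "male" if gender_model.predict(x, verbose=0)[0, 0] < 0.5 else "female"
    p = pathology_model[gender].predict(x, verbose=0)[0, 0]
    return gender, ("pathological" if p >= 0.5 else "healthy")
```

The design rationale implied by the abstract is that training each level-2 detector on gender-partitioned data lets it specialize in the pitch and formant ranges typical of that gender, rather than forcing one model to absorb that variation.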
Keywords: recurrent neural networks (RNNs); deep learning; audio feature extraction; Mel-frequency cepstral coefficients
JEL-codes: O13 Q Q0 Q2 Q3 Q5 Q56
Date: 2023
Downloads: (external link)
https://www.mdpi.com/2071-1050/15/4/3204/pdf (application/pdf)
https://www.mdpi.com/2071-1050/15/4/3204/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jsusta:v:15:y:2023:i:4:p:3204-:d:1063504