Emotional Speaker Verification Using Novel Modified Capsule Neural Network

Nassif, Ali Bou; Shahin, Ismail; Nemmour, Nawel; Hindawi, Noor; Elnagar, Ashraf

Ali Bou Nassif, Ismail Shahin, Nawel Nemmour, Noor Hindawi and Ashraf Elnagar
Additional contact information
Ali Bou Nassif: Computer Engineering Department, University of Sharjah, Sharjah 27272, United Arab Emirates
Ismail Shahin: Electrical Engineering Department, University of Sharjah, Sharjah 27272, United Arab Emirates
Nawel Nemmour: Computer Engineering Department, University of Sharjah, Sharjah 27272, United Arab Emirates
Noor Hindawi: Electrical Engineering Department, University of Sharjah, Sharjah 27272, United Arab Emirates
Ashraf Elnagar: Computer Science Department, University of Sharjah, Sharjah 27272, United Arab Emirates

Mathematics, 2023, vol. 11, issue 2, 1-21

Abstract: Capsule Neural Network (CapsNet) models are regarded as efficient substitutes for convolutional neural networks (CNNs) due to their powerful hierarchical representation capability. CNNs, in contrast, are limited by their inability to capture spatial information in spectrograms. The main constraint of CapsNet is that the compression methods available for CNN models cannot be applied to it directly. We therefore propose a novel architecture based on a dual-channel long short-term memory compressed CapsNet (DC-LSTM–COMP CapsNet) for speaker verification in emotional as well as stressful talking environments. The proposed approach is a modified capsule network that attempts to overcome the limitations of both the original CapsNet and CNNs while enhancing verification performance. The proposed architecture is assessed on four distinct databases. The experimental analysis reveals that the average speaker verification performance improves on that of CNN, the original CapsNet, and conventional classifiers. The proposed algorithm achieves the best verification accuracy across the four speech databases. For example, on the Emirati dataset, the proposed architecture obtains an average equal error rate (EER) of 10.50%, outperforming other deep and classical models.
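
The abstract reports results as equal error rates (EERs). As a reading aid, the following is a minimal sketch of how an EER is commonly computed from verification scores; it is not taken from the article. The function name `equal_error_rate` and the convention that higher scores mean "same speaker" are assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the point where the false acceptance
    rate and the false rejection rate are closest to each other.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    best_gap, eer, best_thr = np.inf, 1.0, thresholds[0]
    for thr in thresholds:
        far = np.mean(impostor >= thr)  # impostor trials wrongly accepted
        frr = np.mean(genuine < thr)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, eer, best_thr = gap, (far + frr) / 2.0, thr
    return eer, best_thr


# Illustrative, made-up scores (not data from the article).
eer, thr = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

A lower EER means better verification, since both error types are small at the operating point where they balance.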

Keywords: capsule neural networks; deep neural network; speaker verification
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/2/459/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/2/459/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:2:p:459-:d:1036499

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.