Toward Robust Speech Emotion Recognition and Classification Using Natural Language Processing with Deep Learning Model

Saad Alahmari, Najla I. Al-Shathry, Majdy M. Eltahir, Muhammad Swaileh A. Alzaidi, Ayman Ahmad Alghamdi and Ahmed Mahmud
Additional contact information
Saad Alahmari: Department of Computer Science, Applied College, Northern Border University, Arar, Saudi Arabia
Najla I. Al-Shathry: Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah Bint Abdulrahman University, P. O. Box 84428, Riyadh 11671, Saudi Arabia
Majdy M. Eltahir: Department of Information Systems, Applied College at Mahayil, King Khalid University, Abha, Saudi Arabia
Muhammad Swaileh A. Alzaidi: Department of English Language, College of Language Sciences, King Saud University, P. O. Box 145111, Riyadh, Saudi Arabia
Ayman Ahmad Alghamdi: Department of Arabic Teaching, Arabic Language Institute, Umm Al-Qura University, Mecca, Saudi Arabia
Ahmed Mahmud: Research Center, Future University in Egypt, New Cairo 11835, Egypt

FRACTALS (fractals), 2025, vol. 33, issue 02, 1-15

Abstract: Speech Emotion Recognition (SER) plays a significant role in human–machine interaction applications. Over the last decade, many SER systems have been proposed. However, the performance of SER systems remains a challenge owing to noise, high system complexity and ineffective feature discrimination. SER is both challenging and vital, and feature extraction is critical to SER performance. Deep Learning (DL)-based techniques have emerged as effective solutions for SER due to their ability to learn from unlabeled data, superior feature representation, and capacity to handle larger datasets and complex features. Various DL techniques, such as Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs) and Deep Neural Networks (DNNs), have been successfully applied to automated SER. This study proposes a Robust SER and Classification using Natural Language Processing with DL (RSERC-NLPDL) model. The presented RSERC-NLPDL technique aims to identify emotions in speech signals. In the RSERC-NLPDL technique, pre-processing is initially performed to transform the input speech signal into a valid format. The RSERC-NLPDL technique then extracts a set of features comprising Mel-Frequency Cepstral Coefficients (MFCCs), Zero-Crossing Rate (ZCR), Harmonic-to-Noise Ratio (HNR) and the Teager Energy Operator (TEO). Next, feature selection is carried out using the Fractal Seagull Optimization Algorithm (FSOA). The Temporal Convolutional Autoencoder (TCAE) model is applied to identify speech emotions, and its hyperparameters are tuned using the fractal Sand Cat Swarm Optimization (SCSO) algorithm. The RSERC-NLPDL method is evaluated on speech databases. Experimental analysis showed that the RSERC-NLPDL technique achieved superior accuracies of 94.32% and 95.25% on the EMODB and RAVDESS datasets, respectively, outperforming other models across distinct measures.
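The feature-extraction stage described in the abstract can be illustrated with a short Python sketch using librosa and numpy. This is a minimal illustration, not the authors' implementation: the frame parameters, the TEO aggregation and the HNR proxy (librosa has no built-in HNR estimator, so a harmonic/residual energy ratio stands in for it) are all assumptions.

    import numpy as np
    import librosa

    def extract_features(path, sr=16000, n_mfcc=13):
        # Load and resample the speech signal to a common rate
        y, sr = librosa.load(path, sr=sr)
        # MFCCs capture the spectral envelope of the speech signal
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        # ZCR is a simple cue for noisiness and voicing
        zcr = librosa.feature.zero_crossing_rate(y)
        # Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]
        teo = y[1:-1] ** 2 - y[:-2] * y[2:]
        # Crude HNR proxy: harmonic-to-residual energy ratio in dB,
        # via harmonic/percussive separation (an assumption, not a
        # standard HNR estimator)
        harm = librosa.effects.harmonic(y)
        hnr_db = 10 * np.log10(
            np.sum(harm ** 2) / (np.sum((y - harm) ** 2) + 1e-10)
        )
        return {
            "mfcc_mean": mfcc.mean(axis=1),
            "zcr_mean": float(zcr.mean()),
            "teo_mean": float(teo.mean()),
            "hnr_db": float(hnr_db),
        }

The resulting feature dictionary would then feed the selection (FSOA) and classification (TCAE) stages described in the paper.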

Keywords: Speech Emotion Recognition; Deep Learning; Fractal Seagull Optimization Algorithm; Feature Extraction (search for similar items in EconPapers)
Date: 2025

Downloads: (external link)
http://www.worldscientific.com/doi/abs/10.1142/S0218348X25400225
Access to full text is restricted to subscribers



Persistent link: https://EconPapers.repec.org/RePEc:wsi:fracta:v:33:y:2025:i:02:n:s0218348x25400225

DOI: 10.1142/S0218348X25400225


FRACTALS (fractals) is currently edited by Tara Taylor

More articles in FRACTALS (fractals) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.

Handle: RePEc:wsi:fracta:v:33:y:2025:i:02:n:s0218348x25400225