MULTI-CLASS AUTOMATED SPEECH LANGUAGE RECOGNITION USING NATURAL LANGUAGE PROCESSING WITH OPTIMAL DEEP LEARNING MODEL

Al-Anazi, Reema G.; Alqahtani, Hamed; Alzaidi, Muhammad Swaileh A.; Alanazi, Meshari H.; Sultan, Hanan Al; Alrowaily, Amal F.; Aljabri, Jawhara; Alqudah, Assal

Additional contact information
Reema G. Al-Anazi: Department of Arabic Language and Literature, College of Humanities and Social Sciences, Princess Nourah bint Abdulrahman University, P. O. Box 84428, Riyadh 11671, Saudi Arabia
Hamed Alqahtani: Department of Information Systems, College of Computer Science, Center of Artificial Intelligence, Unit of Cybersecurity, King Khalid University, Abha, Saudi Arabia
Muhammad Swaileh A. Alzaidi: Department of English Language, College of Language Sciences, King Saud University, P. O. Box 145111, Riyadh, Saudi Arabia
Meshari H. Alanazi: Department of Computer Science, College of Sciences, Northern Border University, Arar, Saudi Arabia
Hanan Al Sultan: Department of English, College of Arts, King Faisal University, Ahsaa, Saudi Arabia
Amal F. Alrowaily: Department of Family Medicine, King Abdulaziz Medical City, Ministry of National Guard-Health Affairs, Riyadh, Saudi Arabia
Jawhara Aljabri: Department of Computer Science, University College in Umluj, University of Tabuk, Tabuk, Saudi Arabia
Assal Alqudah: Department of Computer Science, AlZaytoonah University of Jordan, Amman, Jordan

FRACTALS (fractals), 2025, vol. 33, issue 02, 1-15

Abstract: With technological development, human–computer interaction (HCI) has improved, and spoken communication between humans and machines is one way to enhance and expedite this process. Over recent decades, researchers have explored several systems to improve speech and speaker recognition performance. A crucial challenge in HCI is developing models that can effectively listen and respond like humans. This has led to the development of automated speech emotion recognition (SER) methods, which recognize various emotional classes by selecting and extracting effective features from speech signals. The fundamental problem in automated speech recognition is the considerable variation in speech signals caused by different speakers, languages, speaking styles, content, and acoustic conditions, as well as age- and gender-related differences in voice modulation. With advances in deep learning (DL) and the affordability of computational resources, specifically graphics processing units (GPUs), research in this area has undergone a paradigm shift. Therefore, this study develops a multi-class automated speech language recognition technique using natural language processing with an optimal deep learning model (MASLR-NLPODL). The MASLR-NLPODL technique aims to identify different spoken languages efficiently. In the MASLR-NLPODL technique, preprocessing involves windowing, frame blocking, and pre-emphasis. Next, an adaptive time-frequency feature extractor based on the discrete fractional Fourier transform (DFrFT) is applied; the DFrFT is obtained by extending the discrete Fourier transform (DFT) through its eigenvectors. An improved Harris hawks optimization (IHHO) technique is then employed to select effective features, and the spoken languages are classified by a gated recurrent unit (GRU) model.
Finally, salp swarm algorithm (SSA)-based hyperparameter selection is used to enhance the performance of the GRU model. The IHHO-based feature selection and the SSA-based hyperparameter tuning constitute the novelty of the work. The MASLR-NLPODL technique is evaluated on the VoxForge dataset, where it achieved an accuracy of 96.40%, outperforming existing techniques.
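The preprocessing stage named in the abstract (pre-emphasis, frame blocking, windowing) corresponds to a standard speech front end. The sketch below is a minimal NumPy illustration under common assumptions that the record does not state: a pre-emphasis coefficient of 0.97, 25 ms frames with a 10 ms hop at a 16 kHz sampling rate, a Hamming window, and the conventional order pre-emphasis, then framing, then windowing. The function names are hypothetical; this is not the authors' code.

```python
import numpy as np


def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (first sample kept)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])


def frame_blocking(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames; the tail is zero-padded."""
    n = len(signal)
    n_frames = 1 if n <= frame_len else 1 + int(np.ceil((n - frame_len) / hop))
    pad = (n_frames - 1) * hop + frame_len - n
    padded = np.concatenate([signal, np.zeros(pad)])
    # Row i holds samples [i*hop, i*hop + frame_len)
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return padded[idx]


def window_frames(frames: np.ndarray) -> np.ndarray:
    """Taper each frame with a Hamming window to reduce spectral leakage."""
    return frames * np.hamming(frames.shape[1])


# Example: 1 s of a 440 Hz tone at an assumed 16 kHz sampling rate,
# 25 ms frames (400 samples) with a 10 ms hop (160 samples).
fs = 16000
audio = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
frames = window_frames(frame_blocking(pre_emphasis(audio), frame_len=400, hop=160))
print(frames.shape)  # (99, 400)
```

Each row of `frames` is one windowed analysis frame, which is the unit a time-frequency feature extractor would then operate on.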
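The abstract says the DFrFT is obtained by extending the DFT through its eigenvectors. One well-known construction of this kind, due to Candan, Kutay and Ozaktas, takes eigenvectors of a symmetric matrix S that commutes with the unitary DFT, gives each one a Hermite-like order k (even eigenvectors get even orders, odd ones get odd orders, each sorted by decreasing eigenvalue of S), and defines the order-a transform as F^a = V diag(exp(-iπak/2)) V^T. The sketch below assumes that construction. The record does not say which eigenvector-based DFrFT the authors use or how the adaptive extractor chooses a, so the function names and the fixed order in the usage line are hypothetical.

```python
import numpy as np


def commuting_matrix(n: int) -> np.ndarray:
    """Symmetric matrix S that commutes with the unitary DFT (Dickinson-Steiglitz form)."""
    eye = np.eye(n)
    # Circulant second difference plus its DFT-domain counterpart (a diagonal).
    second_diff = -2.0 * eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)
    k = np.arange(n)
    return second_diff + np.diag(2.0 * np.cos(2.0 * np.pi * k / n) - 2.0)


def parity_bases(n: int):
    """Orthonormal bases of even (x[m] == x[-m]) and odd (x[m] == -x[-m]) vectors."""
    even, odd = [np.eye(n)[0]], []
    for m in range(1, (n - 1) // 2 + 1):
        v = np.zeros(n)
        v[m], v[n - m] = 1.0, 1.0
        w = np.zeros(n)
        w[m], w[n - m] = 1.0, -1.0
        even.append(v / np.sqrt(2.0))
        odd.append(w / np.sqrt(2.0))
    if n % 2 == 0:
        even.append(np.eye(n)[n // 2])
    return np.array(even).T, np.array(odd).T


def dfrft_matrix(n: int, a: float) -> np.ndarray:
    """Order-a DFrFT matrix V diag(exp(-i*pi*a*k/2)) V^T, for n >= 3."""
    if n < 3:
        raise ValueError("n must be at least 3")
    s = commuting_matrix(n)
    vecs, orders = [], []
    # Even eigenvectors take orders 0, 2, 4, ...; odd ones take 1, 3, 5, ...
    for basis, first_order in zip(parity_bases(n), (0, 1)):
        vals, u = np.linalg.eigh(basis.T @ s @ basis)
        u = u[:, np.argsort(vals)[::-1]]  # largest eigenvalue -> lowest order
        for j in range(u.shape[1]):
            vecs.append(basis @ u[:, j])
            orders.append(first_order + 2 * j)
    v = np.array(vecs).T
    phases = np.exp(-0.5j * np.pi * a * np.array(orders))
    return (v * phases) @ v.T


def dfrft(x, a: float) -> np.ndarray:
    """Apply the order-a DFrFT to a 1-D signal."""
    x = np.asarray(x)
    return dfrft_matrix(len(x), a) @ x


# Hypothetical usage: magnitude features of one windowed frame at order a = 0.8.
frame = np.hamming(64) * np.sin(2 * np.pi * 5 * np.arange(64) / 64)
features = np.abs(dfrft(frame, 0.8))
```

At a = 0 this construction reduces to the identity and at a = 1 to the unitary DFT (`np.fft.fft(x, norm="ortho")`); intermediate orders interpolate between the time and frequency domains. Splitting S into even and odd blocks before diagonalizing avoids mixing eigenvectors when an even and an odd eigenvalue coincide, which happens for example when n = 4.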
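A GRU classifier reads the sequence of frame-level feature vectors and maps its final hidden state to class probabilities. The forward pass below uses the standard GRU equations (update gate z, reset gate r, candidate state n, new state z*h + (1-z)*n) followed by a softmax output layer. It is a minimal sketch only: the layer sizes (13 input features, 32 hidden units, 5 languages), the initialization scale, and the parameter names are hypothetical, and training is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, p):
    """One GRU time step for input vector x and previous hidden state h."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])        # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])        # reset gate
    n = np.tanh(p["Wn"] @ x + p["Un"] @ (r * h) + p["bn"])  # candidate state
    return z * h + (1.0 - z) * n


def init_params(n_in, n_hidden, n_classes, rng):
    """Small random weights for the three gates plus a softmax output layer."""
    p = {}
    for g in ("z", "r", "n"):
        p["W" + g] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    p["Wo"] = rng.normal(0.0, 0.1, (n_classes, n_hidden))
    p["bo"] = np.zeros(n_classes)
    return p


def classify_sequence(features, p):
    """features: (T, n_in) array of frame features; returns class probabilities."""
    h = np.zeros(p["Uz"].shape[0])
    for x in features:
        h = gru_step(x, h, p)
    logits = p["Wo"] @ h + p["bo"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


rng = np.random.default_rng(0)
params = init_params(n_in=13, n_hidden=32, n_classes=5, rng=rng)
probs = classify_sequence(rng.normal(size=(50, 13)), params)
predicted_language = int(np.argmax(probs))
```

Using only the final hidden state keeps the sketch short; pooling over all hidden states is a common alternative when utterances vary widely in length.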
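In the standard salp swarm algorithm, a leader salp moves around the best solution found so far (the food source F) with a coefficient c1 = 2 exp(-(4l/L)^2) that shrinks over iterations l = 1..L, while each follower moves to the midpoint between itself and the salp ahead of it. The sketch below assumes the single-leader form and minimizes a stand-in objective inside a box. In the tuning step described in the abstract, the objective would instead be a validation error of the GRU as a function of its hyperparameters; the record does not say which hyperparameters or ranges were searched, so `salp_swarm` and its defaults are hypothetical.

```python
import numpy as np


def salp_swarm(objective, lb, ub, n_salps=30, n_iter=200, seed=0):
    """Minimize `objective` over the box [lb, ub] with a basic salp swarm search."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    pos = rng.uniform(lb, ub, size=(n_salps, dim))
    fit = np.array([objective(x) for x in pos])
    best = int(np.argmin(fit))
    food, food_fit = pos[best].copy(), fit[best]
    for l in range(1, n_iter + 1):
        c1 = 2.0 * np.exp(-((4.0 * l / n_iter) ** 2))  # large early, tiny late
        for i in range(n_salps):
            if i == 0:
                # Leader: random step around the food source, sign chosen by c3.
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * ((ub - lb) * c2 + lb)
                pos[i] = np.where(c3 >= 0.5, food + step, food - step)
            else:
                # Follower: midpoint of itself and the (already moved) salp ahead.
                pos[i] = 0.5 * (pos[i] + pos[i - 1])
            pos[i] = np.clip(pos[i], lb, ub)
        fit = np.array([objective(x) for x in pos])
        best = int(np.argmin(fit))
        if fit[best] < food_fit:
            food, food_fit = pos[best].copy(), fit[best]
    return food, float(food_fit)


# Stand-in objective (sphere function) over a 2-D box.
best_x, best_f = salp_swarm(lambda v: float(np.sum(v ** 2)), lb=[-5.0, -5.0], ub=[5.0, 5.0])
```

For hyperparameter tuning, each coordinate would encode one hyperparameter (for example a log learning rate or a rounded hidden-layer size), and every objective call would train and validate a model, so `n_salps * n_iter` directly controls the tuning cost.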

Keywords: Speech Language Recognition; Human–Computer Interaction; Hyperparameter Selection; Harris Hawks Optimization; NLP; Deep Learning
Date: 2025

Downloads:
http://www.worldscientific.com/doi/abs/10.1142/S0218348X25400213
Access to full text is restricted to subscribers

Persistent link: https://EconPapers.repec.org/RePEc:wsi:fracta:v:33:y:2025:i:02:n:s0218348x25400213

DOI: 10.1142/S0218348X25400213

FRACTALS (fractals) is currently edited by Tara Taylor

More articles in FRACTALS (fractals) from World Scientific Publishing Co. Pte. Ltd.
Bibliographic data for series maintained by Tai Tone Lim.