EmoBERTa–CNN: Hybrid Deep Learning Approach Capturing Global Semantics and Local Features for Enhanced Emotion Recognition in Conversational Settings
Mingfeng Zhang, Aihe Yu, Xuanyu Sheng, Jisun Park, Jongtae Rhee and Kyungeun Cho
Additional contact information
Mingfeng Zhang: Department of Computer Science and Artificial Intelligence, Dongguk University-Seoul, 30 Pildongro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
Aihe Yu: Department of Autonomous Things Intelligence, Dongguk University-Seoul, 30 Pildongro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
Xuanyu Sheng: Department of Computer Science and Artificial Intelligence, Dongguk University-Seoul, 30 Pildongro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
Jisun Park: NUI/NUX Platform Research Center, Dongguk University-Seoul, 30 Pildongro-1-gil, Jung-gu, Seoul 04620, Republic of Korea
Jongtae Rhee: Industrial Artificial Intelligence Research Center, Dongguk University-Seoul, 30 Pildongro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
Kyungeun Cho: Department of Computer Science and Artificial Intelligence, College of Advanced Convergence Engineering, Dongguk University-Seoul, 30 Pildongro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
Mathematics, 2025, vol. 13, issue 15, 1-20
Abstract:
Emotion recognition in conversations is a key task in natural language processing that enhances the quality of human–computer interaction. Although both conventional deep learning models and Transformer-based pretrained language models have substantially improved performance, each approach has inherent limitations: deep learning models often fail to capture global semantic context, whereas Transformer-based pretrained language models can overlook subtle, local emotional cues. To overcome these challenges, we developed EmoBERTa–CNN, a hybrid framework that combines EmoBERTa’s ability to capture global semantics with the capability of convolutional neural networks (CNNs) to extract local emotional features. Experiments on the SemEval-2019 Task 3 dataset and the Multimodal EmotionLines Dataset (MELD) showed that the proposed EmoBERTa–CNN model achieved F1-scores of 96.0% and 79.45%, respectively, significantly outperforming existing methods and confirming its effectiveness for emotion recognition in conversations.
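This listing carries no implementation details, so the following is only a minimal PyTorch sketch of the kind of hybrid architecture the abstract describes: a Transformer encoder supplying token-level contextual embeddings, which parallel 1-D convolutions then scan for local emotional cues. The Hugging Face checkpoint name (tae898/emoberta-base), kernel sizes, filter count, dropout rate, and the seven-class output are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch of an EmoBERTa–CNN style classifier.
# Assumptions: checkpoint name, kernel sizes, filter count, dropout, and
# the 7-class output are illustrative, not the paper's exact settings.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class EmoBertaCNN(nn.Module):
    def __init__(self, encoder_name="tae898/emoberta-base",
                 num_classes=7, kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        # Transformer encoder: captures global, utterance-level semantics.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Parallel 1-D convolutions over token embeddings: capture local,
        # n-gram-level emotional cues at several window widths.
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes]
        )
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) token-level contextual embeddings.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        x = hidden_states.transpose(1, 2)  # (batch, hidden, seq_len)
        # Convolve, activate, and max-pool over time for each kernel size.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.classifier(features)   # (batch, num_classes) logits


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("tae898/emoberta-base")
    model = EmoBertaCNN()
    batch = tokenizer(["I can't believe you did that!"],
                      return_tensors="pt", padding=True, truncation=True)
    logits = model(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([1, 7])
```

The design point the abstract emphasizes is the division of labor: the pretrained encoder contributes global semantic context, while the convolutional branch with max-pooling surfaces short local patterns that a sentence-level embedding alone can miss.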
Keywords: pre-trained language model; deep learning; emotion recognition
JEL-codes: C
Date: 2025
Downloads:
https://www.mdpi.com/2227-7390/13/15/2438/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/15/2438/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:15:p:2438-:d:1712388