Review and Comparative Analysis of Databases for Speech Emotion Recognition
Salvatore Serrano,
Omar Serghini,
Giulia Esposito,
Silvia Carbone,
Carmela Mento,
Alessandro Floris,
Simone Porcu and
Luigi Atzori
Additional contact information
Salvatore Serrano: Laboratory of Digital Signal Processing, Department of Engineering, University of Messina, 98122 Messina, Italy
Omar Serghini: Laboratory of Digital Signal Processing, Department of Engineering, University of Messina, 98122 Messina, Italy
Giulia Esposito: Laboratory of Digital Signal Processing, Department of Engineering, University of Messina, 98122 Messina, Italy
Silvia Carbone: Dipartimento di Scienze Politiche e Giuridiche, University of Messina, 98122 Messina, Italy
Carmela Mento: Department of Biomedical and Dental Sciences and Morphofunctional Imaging, University of Messina, Via Consolare Valeria, 1, 98125 Messina, Italy
Alessandro Floris: Department of Electrical and Electronic Engineering, University of Cagliari, Via Marengo, 2, 09123 Cagliari, Italy
Simone Porcu: Department of Electrical and Electronic Engineering, University of Cagliari, Via Marengo, 2, 09123 Cagliari, Italy
Luigi Atzori: Department of Electrical and Electronic Engineering, University of Cagliari, Via Marengo, 2, 09123 Cagliari, Italy
Data, 2025, vol. 10, issue 10, 1-58
Abstract:
Speech emotion recognition (SER) has become increasingly important in areas such as healthcare, customer service, robotics, and human–computer interaction. The progress of this field depends not only on advances in algorithms but also on the databases that provide the training material for SER systems. These resources set the boundaries for how well models can generalize across speakers, contexts, and cultures. In this paper, we present a narrative review and comparative analysis of emotional speech corpora released up to mid-2025, bringing together both psychological and technical perspectives. Rather than following a systematic review protocol, our approach focuses on providing a critical synthesis of more than fifty corpora covering acted, elicited, and natural speech. We examine how these databases were collected, how emotions were annotated, their demographic diversity, and their ecological validity, while also acknowledging the limits of available documentation. Beyond description, we identify recurring strengths and weaknesses, highlight emerging gaps, and discuss recent usage patterns to offer researchers both a practical guide for dataset selection and a critical perspective on how corpus design continues to shape the development of robust and generalizable SER systems.
Keywords: corpus analysis; emotion modeling; emotional speech databases; speech emotion recognition
JEL-codes: C8 C80 C81 C82 C83
Date: 2025
Downloads:
https://www.mdpi.com/2306-5729/10/10/164/pdf (application/pdf)
https://www.mdpi.com/2306-5729/10/10/164/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:10:y:2025:i:10:p:164-:d:1771204
Data is currently edited by Ms. Becky Zhang
More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.