Evaluating Convolutional Neural Networks and Vision Transformers for Baby Cry Sound Analysis
Samir A. Younis,
Dalia Sobhy and
Noha S. Tawfik
Additional contact information
Samir A. Younis: Computer Engineering Department, Arab Academy of Science and Technology and Maritime Transport, Alexandria 1029, Egypt
Dalia Sobhy: Computer Engineering Department, Arab Academy of Science and Technology and Maritime Transport, Alexandria 1029, Egypt
Noha S. Tawfik: Computer Engineering Department, Arab Academy of Science and Technology and Maritime Transport, Alexandria 1029, Egypt
Future Internet, 2024, vol. 16, issue 7, 1-17
Abstract:
Crying is a newborn’s main way of communicating. Despite their apparent similarity, newborn cries have distinct physical origins and acoustic characteristics. Experienced medical professionals, nurses, and parents are able to recognize these variations based on their prior interactions. Nonetheless, interpreting a baby’s cries can be challenging for carers, first-time parents, and inexperienced paediatricians. This paper uses advanced deep learning techniques to propose a novel approach for baby cry classification. This study aims to accurately classify different cry types associated with everyday infant needs, including hunger, discomfort, pain, tiredness, and the need for burping. The proposed model achieves an accuracy of 98.33%, surpassing the performance of existing studies in the field. IoT-enabled sensors are utilized to capture cry signals in real time, ensuring continuous and reliable monitoring of the infant’s acoustic environment. This integration of IoT technology with deep learning enhances the system’s responsiveness and accuracy. Our study highlights the significance of accurate cry classification in understanding and meeting the needs of infants and its potential impact on improving infant care practices. The methodology, including the dataset, preprocessing techniques, and architecture of the deep learning model, is described. The results demonstrate the performance of the proposed model, and the discussion analyzes the factors contributing to its high accuracy.
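The pipeline outlined in the abstract (cry audio captured by sensors, converted to spectrograms, and classified into the five cry types by CNN or transformer models) can be illustrated with a minimal sketch. The code below is not the authors' published implementation: the class list comes from the abstract, while the file path `cry_sample.wav`, the mel-spectrogram settings, and the small CNN are placeholder assumptions standing in for the paper's actual preprocessing and CNN/Vision Transformer architectures.

```python
# Illustrative sketch only: turn a cry recording into a log-mel spectrogram
# and classify it with a small CNN. Class names, hyperparameters, and model
# layout are assumptions, not the paper's published configuration.
import librosa
import numpy as np
import torch
import torch.nn as nn

CRY_CLASSES = ["hunger", "discomfort", "pain", "tiredness", "burping"]  # from the abstract

def cry_to_logmel(path, sr=16000, n_mels=64, duration=5.0):
    """Load a cry clip, pad/trim to a fixed length, and return a log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, time_frames)

class CryCNN(nn.Module):
    """Toy CNN over the 2-D spectrogram; a stand-in for the paper's CNN/ViT models."""
    def __init__(self, n_classes=len(CRY_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, time)
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    spec = cry_to_logmel("cry_sample.wav")    # hypothetical input file
    x = torch.tensor(spec, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
    logits = CryCNN()(x)
    print(CRY_CLASSES[logits.argmax(dim=1).item()])
```

In a Vision Transformer variant, the same log-mel input would typically be resized to the model's expected patch grid and passed to a pretrained ViT backbone instead of the toy CNN shown here.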
Keywords: audio processing; cry sound analysis; deep learning; spectrogram; transformer models; convolutional neural networks
JEL-codes: O3
Date: 2024
Downloads:
https://www.mdpi.com/1999-5903/16/7/242/pdf (application/pdf)
https://www.mdpi.com/1999-5903/16/7/242/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:16:y:2024:i:7:p:242-:d:1430503