EconPapers    

Nonperiodic Pathologic Voice Signals Classification Using Mel-Spectrogram and VGGish

Joana Filipa Teixeira Fernandes, João Viana Pinto, Carla Pinto Moura, Helena Vilarinho, Felipe Teixeira, Diamantino Freitas and João Paulo Teixeira
Additional contact information
Joana Filipa Teixeira Fernandes: Research Centre in Digitalization and Intelligent Robotics (CeDRI), Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança (IPB)
João Viana Pinto: University Hospital Centre of São João, Otorhinolaryngology Department
Carla Pinto Moura: University of Porto, Genetics, Faculty of Medicine, Department of Pathology
Helena Vilarinho: University Hospital Centre of São João, Porto, Department of Otorhinolaryngology
Felipe Teixeira: Research Centre in Digitalization and Intelligent Robotics (CeDRI), Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança (IPB)
Diamantino Freitas: Faculty of Engineering of University of Porto (FEUP)
João Paulo Teixeira: Research Centre in Digitalization and Intelligent Robotics (CeDRI), Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança (IPB)

A chapter in Health Technologies and Demographic Challenges, 2025, pp 3-13 from Springer

Abstract: In this work, as in the literature, voice signals are classified as periodic (type 1), signals with some periodicity (type 2), or chaotic (type 3). This work aims to classify signals into types 1, 2 and 3 so that the result can subsequently be applied in a pathological/control signal classification system. The original dataset comprises 466 type 1, 900 type 2 and 84 type 3 individuals (1450 in total), as classified by an otolaryngologist. Fifteen percent of the data was used for testing, and the remaining 85% was used for training and validation. A data augmentation technique was applied to balance the classes in the training data, yielding 3380 sounds for training and validation: 1020 type 1, 1280 type 2 and 1080 type 3. Of these, 80% were used for training and 20% for validation. The Mel spectrograms of the signals were used as input to VGGish, which was retrained to classify the three signal types. The network achieved a test accuracy of 71.2%.
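As an illustration of the input stage described in the abstract, the following Python/NumPy sketch converts a recording into VGGish-shaped log-mel patches (96 frames by 64 mel bands). It is not the authors' code: the frame, window and mel settings are assumed to be VGGish's published defaults, which the chapter does not confirm, and the function names are hypothetical. It covers feature extraction only, not the augmentation or retraining steps.

```python
import numpy as np

# ASSUMPTION: VGGish's published default front-end settings; the chapter does
# not say which settings (or which toolbox) the authors actually used.
SAMPLE_RATE = 16000    # Hz; VGGish expects 16 kHz mono audio
WIN_LEN = 400          # 25 ms analysis window
HOP_LEN = 160          # 10 ms hop between frames
N_FFT = 512            # smallest power of two >= WIN_LEN
N_MELS = 64            # mel bands per frame
FMIN, FMAX = 125.0, 7500.0
LOG_OFFSET = 0.01      # avoids log(0) on silent frames
PATCH_FRAMES = 96      # 0.96 s of frames per VGGish input example


def hz_to_mel(hz):
    """HTK-style mel scale, natural-log form."""
    return 1127.0 * np.log(1.0 + np.asarray(hz, dtype=np.float64) / 700.0)


def mel_filterbank():
    """Triangular weights mapping N_FFT // 2 + 1 FFT bins to N_MELS bands."""
    bin_mel = hz_to_mel(np.linspace(0.0, SAMPLE_RATE / 2, N_FFT // 2 + 1))
    # N_MELS + 2 band edges, equally spaced on the mel axis
    edges = np.linspace(hz_to_mel(FMIN), hz_to_mel(FMAX), N_MELS + 2)
    weights = np.zeros((N_FFT // 2 + 1, N_MELS))
    for m in range(N_MELS):
        lower, center, upper = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_mel - lower) / (center - lower)
        falling = (upper - bin_mel) / (upper - center)
        weights[:, m] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


def log_mel_patches(signal):
    """Split a 16 kHz signal into log-mel patches of shape (n, 96, 64)."""
    signal = np.asarray(signal, dtype=np.float64)
    # Number of full 25 ms frames (negative for very short inputs, caught below)
    n_frames = 1 + (len(signal) - WIN_LEN) // HOP_LEN
    if n_frames < PATCH_FRAMES:
        raise ValueError("signal is shorter than one 0.96 s patch")
    # Periodic Hann window applied to each frame
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(WIN_LEN) / WIN_LEN)
    frames = np.stack([
        signal[i * HOP_LEN : i * HOP_LEN + WIN_LEN] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, n=N_FFT))               # (n_frames, 257)
    log_mel = np.log(magnitude @ mel_filterbank() + LOG_OFFSET)    # (n_frames, 64)
    # Non-overlapping 96-frame patches; leftover frames are dropped
    n_patches = n_frames // PATCH_FRAMES
    return np.stack([
        log_mel[p * PATCH_FRAMES : (p + 1) * PATCH_FRAMES]
        for p in range(n_patches)
    ])


if __name__ == "__main__":
    t = np.arange(3 * SAMPLE_RATE) / SAMPLE_RATE
    tone = np.sin(2 * np.pi * 200 * t)      # 3 s synthetic 200 Hz signal
    print(log_mel_patches(tone).shape)      # (3, 96, 64)
```

A 3 s recording yields three non-overlapping 0.96 s patches; in this sketch, trailing frames that do not fill a whole patch are discarded.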

Keywords: Disordered Signals Types; Mel Spectrogram; Convolutional Neural Networks (CNN); VGGish
Date: 2025

Persistent link: https://EconPapers.repec.org/RePEc:spr:prbchp:978-3-031-94901-2_1

Ordering information: This item can be ordered from
http://www.springer.com/9783031949012

DOI: 10.1007/978-3-031-94901-2_1

More chapters in Springer Proceedings in Business and Economics from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2026-02-18
Handle: RePEc:spr:prbchp:978-3-031-94901-2_1