Classification of Depression and Its Severity Based on Multiple Audio Features Using a Graphical Convolutional Neural Network

Momoko Ishimaru, Yoshifumi Okada, Ryunosuke Uchiyama, Ryo Horiguchi and Itsuki Toyoshima
Additional contact information
Momoko Ishimaru: Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
Yoshifumi Okada: College of Information and Systems, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
Ryunosuke Uchiyama: Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
Ryo Horiguchi: Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan
Itsuki Toyoshima: Division of Information and Electronic Engineering, Muroran Institute of Technology, 27-1, Mizumoto-cho, Muroran 050-8585, Hokkaido, Japan

IJERPH, 2023, vol. 20, issue 2, 1-15

Abstract: Audio features are physical quantities that reflect individual or complex coordinated movements of the vocal organs. Hence, in speech-based automatic depression classification, it is critical to consider the relationships among audio features. Here, we propose a deep learning-based classification model that discriminates depression and its severity using correlations among audio features. The model represents the correlations between audio features as graph structures and learns speech characteristics with a graph convolutional neural network. We conducted classification experiments under two settings: one in which the same subjects were allowed to appear in both the training and test data (Setting 1), and one in which the subjects in the training and test data were completely separated (Setting 2). The results showed that the classification accuracy in Setting 1 significantly outperformed existing state-of-the-art methods, whereas the accuracy in Setting 2, a setting not examined in existing studies, was much lower than in Setting 1. We conclude that the proposed model is an effective tool for identifying recurrent patients and the severity of their depression, but that it has difficulty detecting new depressed patients. For practical application, depression-specific speech regions that appear locally, rather than the entire speech of a depressed patient, should be detected and assigned appropriate class labels.
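
As a concrete illustration of the approach described in the abstract, the sketch below builds a graph whose nodes are audio features, connects nodes whose absolute Pearson correlation exceeds a threshold, and applies one graph-convolution step followed by mean pooling. This is a minimal, hypothetical sketch, not the authors' implementation: the feature count, the 0.5 correlation threshold, the random toy data, and the single-layer network are all assumptions made for illustration.

```python
# Hypothetical sketch (not the paper's code): correlation graph over
# audio features + one GCN layer. All dimensions/thresholds are assumed.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 speech frames x 12 audio features (stand-ins for
# descriptors such as MFCCs, F0, energy).
n_frames, n_feats = 200, 12
X = rng.normal(size=(n_frames, n_feats))

# One graph node per audio feature; each node's feature vector is the
# feature's time series across frames (features-as-nodes reading).
nodes = X.T                                # (n_feats, n_frames)

# Adjacency from absolute Pearson correlation, thresholded (assumed 0.5).
corr = np.corrcoef(nodes)                  # (n_feats, n_feats)
A = (np.abs(corr) > 0.5).astype(float)
np.fill_diagonal(A, 0.0)

# Symmetric normalization with self-loops: A_hat = D^{-1/2}(A+I)D^{-1/2}.
A_tilde = A + np.eye(n_feats)
d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# One GCN layer: H = ReLU(A_hat @ nodes @ W), with random toy weights.
W = rng.normal(scale=0.1, size=(n_frames, 16))
H = np.maximum(A_hat @ nodes @ W, 0.0)     # (n_feats, 16) node embeddings

# Graph-level readout (mean pooling); in a full model this embedding
# would feed a classifier head predicting depression class / severity.
graph_embedding = H.mean(axis=0)
print(graph_embedding.shape)               # (16,)
```

In a full model, the node features would be real audio descriptors extracted from speech rather than random data, the weights would be trained, and the pooled graph embedding would feed a classifier trained on depression and severity labels.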

Keywords: audio feature; depression; classification model; correlation; graph convolutional neural network
JEL-codes: I I1 I3 Q Q5
Date: 2023

Downloads: (external link)
https://www.mdpi.com/1660-4601/20/2/1588/pdf (application/pdf)
https://www.mdpi.com/1660-4601/20/2/1588/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:20:y:2023:i:2:p:1588-:d:1036760

IJERPH is currently edited by Ms. Jenna Liu

More articles in IJERPH from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.

 
Handle: RePEc:gam:jijerp:v:20:y:2023:i:2:p:1588-:d:1036760