Vision Transformer-Based Audio Analysis for Depression Detection: A Human Factor in Reliable CPS
Vura Abhinav,
Bhaswanth Reddy Indukuri,
M. S. Karthik,
Sai Praneeth Reddy Alavalapati,
Ramisetty Lakshmi Venkat and
G. Jyothish Lal
Additional contact information
Vura Abhinav: Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence
Bhaswanth Reddy Indukuri: Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence
M. S. Karthik: Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence
Sai Praneeth Reddy Alavalapati: Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence
Ramisetty Lakshmi Venkat: Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence
G. Jyothish Lal: Amrita Vishwa Vidyapeetham, Amrita School of Artificial Intelligence
A chapter in Reliability in Cyber-Physical Systems: The Human Factor Perspective, 2026, pp 65-81 from Springer
Abstract:
Cyber-Physical Systems (CPS) are evolving beyond industrial automation to create responsive, human-centric environments that can perceive and adapt to human states. This chapter presents a critical application within this paradigm: the real-time detection of depression through ambient auditory sensing. We propose an AI-driven system that forms a key component of a human-in-the-loop CPS for mental wellness. The system’s physical interface leverages microphones to non-intrusively capture vocal patterns. On the cyber side, a sophisticated signal processing pipeline converts audio into Mel-Frequency Cepstral Coefficients (MFCCs). These features are then fed into an innovative Vision Transformer (ViT) architecture, which excels at identifying subtle, long-range dependencies in the data indicative of depressive states. Validated on the challenging DAIC-WOZ dataset, our model demonstrates state-of-the-art performance with over 96% accuracy. The significance of this research lies in its system-level relevance for CPS. It provides a validated proof-of-concept for technology that can enable continuous, objective, and passive mental health monitoring, paving the way for proactive interventions and truly intelligent assistive systems in clinical and domestic settings.
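The abstract does not say how the MFCC features are presented to the Vision Transformer. As a rough illustration only, the sketch below shows the generic ViT tokenization step: a 2-D feature map (coefficients by time frames) is cut into non-overlapping patches, and each patch is flattened into one token. The function name `patchify`, the toy matrix, and the patch size are hypothetical and are not taken from the chapter. A real model would also project each token linearly and add position embeddings, which are omitted here.

```python
from typing import List

Matrix = List[List[float]]


def patchify(features: Matrix, patch_h: int, patch_w: int) -> List[List[float]]:
    """Split a 2-D feature map into non-overlapping patches (row-major order)
    and flatten each patch into a single token vector."""
    rows = len(features)
    cols = len(features[0]) if rows else 0
    if rows % patch_h or cols % patch_w:
        raise ValueError("feature map must divide evenly into patches")
    tokens = []
    for top in range(0, rows, patch_h):
        for left in range(0, cols, patch_w):
            # Read the patch row by row so the token keeps a fixed layout.
            token = [
                features[r][c]
                for r in range(top, top + patch_h)
                for c in range(left, left + patch_w)
            ]
            tokens.append(token)
    return tokens


# Toy stand-in for an MFCC matrix: 4 coefficients x 6 frames (assumed sizes,
# not the chapter's configuration).
mfcc = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
tokens = patchify(mfcc, patch_h=2, patch_w=3)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 6
```

Because self-attention then relates every token to every other token, patches that are far apart in time can influence each other directly, which is the usual basis for the long-range-dependency claim made for transformer models.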
Keywords: Mental health monitoring; Depression detection; Human-centric CPS; Vision transformer; DAIC-WOZ dataset; AI for healthcare
Date: 2026
Persistent link: https://EconPapers.repec.org/RePEc:spr:ssrchp:978-3-032-09917-4_4
Ordering information: This item can be ordered from
http://www.springer.com/9783032099174
DOI: 10.1007/978-3-032-09917-4_4
More chapters in Springer Series in Reliability Engineering from Springer