Speech-Based Real-World Scene Understanding for Assistive Care of the Visually Impaired
Tarun Sunil,
K. Vinod,
M. Madhav,
Joshua Abraham and
G. Jyothish Lal
Additional contact information
Tarun Sunil: Amrita Vishwa Vidyapeetham
K. Vinod: Amrita Vishwa Vidyapeetham
M. Madhav: Amrita Vishwa Vidyapeetham
Joshua Abraham: Amrita Vishwa Vidyapeetham
G. Jyothish Lal: Amrita Vishwa Vidyapeetham
A chapter in Machine Learning and Deep Learning Modeling and Algorithms with Applications in Medical and Health Care, 2025, pp 23-37 from Springer
Abstract:
This research introduces a new assistive technology that combines real-time image captioning, voice synthesis, and a model-based keyword-spotting method to help people with visual impairments. The system uses a lightweight machine learning framework to recognize specified voice instructions, which act as the trigger that starts a camera recording the user’s environment. CLIP, a vision-language model, interprets the visual input and provides contextual textual descriptions of the surroundings. Tacotron 2, a neural text-to-speech model, converts these captions into natural-sounding speech so that users can hear a description of their environment. The end-to-end pipeline prioritizes usability and low latency, showing that speech-driven activation, image interpretation, and high-quality audio synthesis can be combined into an easy-to-use assistive tool for practical use.
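The control flow described in the abstract (trigger phrase, then camera capture, then captioning, then speech output) can be sketched roughly as follows. This is an illustrative sketch only, not the chapter's implementation: the names `AssistivePipeline` and `make_demo_pipeline`, the trigger phrase, and all four stage functions are hypothetical placeholders, and text strings stand in for audio input. In a real system the stages would wrap a keyword spotter, a camera, a CLIP-based captioner, and a Tacotron 2 synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AssistivePipeline:
    # Each stage is injected as a plain callable so that real components
    # could be swapped in. All four names are hypothetical.
    is_trigger: Callable[[str], bool]   # keyword spotting (text stands in for audio)
    capture_frame: Callable[[], bytes]  # grabs one image from the camera
    describe: Callable[[bytes], str]    # image -> caption (CLIP-based in the chapter)
    speak: Callable[[str], None]        # caption -> audio (Tacotron 2 in the chapter)

    def run(self, utterances: Iterable[str]) -> List[str]:
        """Process utterances in order and return the captions that were spoken."""
        spoken = []
        for utterance in utterances:
            if not self.is_trigger(utterance):
                continue  # the camera is not used until the trigger is heard
            caption = self.describe(self.capture_frame())
            self.speak(caption)
            spoken.append(caption)
        return spoken


def make_demo_pipeline(trigger: str = "describe scene") -> AssistivePipeline:
    """Build a pipeline from stand-in stages (assumed placeholders, no real models)."""
    return AssistivePipeline(
        is_trigger=lambda text: trigger in text.lower(),
        capture_frame=lambda: b"fake-image-bytes",
        describe=lambda image: f"placeholder caption for {len(image)} bytes",
        speak=lambda caption: print(f"[speech] {caption}"),
    )


if __name__ == "__main__":
    pipeline = make_demo_pipeline()
    pipeline.run(["hello there", "Please describe scene", "goodbye"])
```

Keeping each stage behind a callable makes it possible to measure latency per stage and to replace any one component without touching the rest.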
Keywords: Assistive technologies; Contrastive language image pre-training; Speech-to-text; Text-to-speech (TTS); Automatic speech recognition (ASR)
Date: 2025
Persistent link: https://EconPapers.repec.org/RePEc:spr:ssrchp:978-3-031-98728-1_2
Ordering information: This item can be ordered from
http://www.springer.com/9783031987281
DOI: 10.1007/978-3-031-98728-1_2
More chapters in Springer Series in Reliability Engineering from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.