Big-Delay Estimation for Speech Separation in Assisted Living Environments
Swarnadeep Bagchi and Ruairí de Fréin
Additional contact information
Swarnadeep Bagchi: School of Electrical and Electronic Engineering, Technological University Dublin, D07 EWV4 Dublin, Ireland
Ruairí de Fréin: School of Electrical and Electronic Engineering, Technological University Dublin, D07 EWV4 Dublin, Ireland
Future Internet, 2025, vol. 17, issue 4, 1-27
Abstract:
Phase wraparound caused by large inter-sensor spacings in multi-channel demixing renders the DUET and AdRess source separation algorithms, both known for their low computational complexity and effective speech demixing performance, unsuitable for hearing-assisted living applications, where such configurations are needed. In anechoic scenarios, DUET is limited to relative delays of up to 7 samples at a sampling rate of F_s = 16 kHz, while the AdRess algorithm is constrained to instantaneous mixing problems. The aim of this paper is to improve the performance of DUET-type time–frequency (TF) masks when microphones are placed far apart. A significant challenge in assistive hearing scenarios is the phase wraparound caused by large relative delays. We evaluate the performance of a large relative delay estimation method, called the Elevatogram, in the presence of significant phase wraparound. We present extensions of DUET and AdRess, termed Elevato-DUET and Elevato-AdRess, which are effective in scenarios with relative delays of up to 200 samples. The findings demonstrate that Elevato-AdRess not only outperforms Elevato-DUET on the objective separation quality metrics BSS_Eval and PEASS, but also achieves higher intelligibility scores, as measured by the Perceptual Evaluation of Speech Quality (PESQ) Mean Opinion Score (MOS). These findings suggest that the phase wraparound limitations of the DUET and AdRess algorithms in assistive hearing scenarios with large inter-microphone spacing can be addressed by the Elevatogram-based Elevato-DUET and Elevato-AdRess algorithms. Both algorithms improve separation quality and intelligibility, with Elevato-AdRess demonstrating the best overall performance.
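Illustrative note: the phase wraparound problem named in the abstract can be sketched numerically. The snippet below is not the Elevatogram and not the authors' implementation; it only shows, under simplifying assumptions (a single source, a pure integer delay between two channels, no noise), why a per-bin delay estimate taken from the wrapped inter-channel phase stops working once the delay grows. The function names and the FFT size are hypothetical choices made for this illustration.

```python
import numpy as np

def naive_delay_estimate(true_delay, n_fft=1024):
    """Per-bin delay estimates taken from the wrapped inter-channel phase.

    Assumption: channel 2 is channel 1 delayed by `true_delay` samples,
    so the spectral ratio X2/X1 at frequency omega is exp(-j*omega*delay).
    """
    k = np.arange(1, n_fft // 2)              # skip the DC bin (omega = 0)
    omega = 2 * np.pi * k / n_fft             # radians per sample
    ratio = np.exp(-1j * omega * true_delay)  # ideal X2/X1 for a pure delay
    wrapped_phase = np.angle(ratio)           # always lies in (-pi, pi]
    return omega, -wrapped_phase / omega      # delay estimate per bin

def fraction_correct(true_delay, n_fft=1024, tol=1e-6):
    """Fraction of frequency bins whose estimate equals the true delay."""
    _, est = naive_delay_estimate(true_delay, n_fft)
    return float(np.mean(np.abs(est - true_delay) < tol))

if __name__ == "__main__":
    # Delays chosen to echo the ranges mentioned in the abstract.
    for delay in (1, 7, 200):
        print(delay, fraction_correct(delay))
```

Because the wrapped phase never exceeds pi in magnitude, the estimate in each bin is bounded by pi/omega, so only bins with omega times the delay below pi can recover the true value. As the delay grows, fewer and fewer low-frequency bins satisfy that condition, which is the limitation that a large-delay estimator has to overcome.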
Keywords: assisted living (AL); relative delay estimation; source separation (SS); time–frequency (TF); binary mask; interaural phase difference (IPD); interaural intensity difference (IID); remote microphone (RM); windowed-disjoint orthogonal (WDO); relative transfer function (RTF); single source point (SSP)
JEL-codes: O3
Date: 2025
Downloads: (external link)
https://www.mdpi.com/1999-5903/17/4/184/pdf (application/pdf)
https://www.mdpi.com/1999-5903/17/4/184/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:17:y:2025:i:4:p:184-:d:1639236
Future Internet is currently edited by Ms. Grace You