Application of Fusion of Various Spontaneous Speech Analytics Methods for Improving Far-Field Neural-Based Diarization
Sergei Astapov, Aleksei Gusev, Marina Volkova, Aleksei Logunov, Valeriia Zaluskaia, Vlada Kapranova, Elena Timofeeva, Elena Evseeva, Vladimir Kabarov and Yuri Matveev
Additional contact information
Sergei Astapov: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Aleksei Gusev: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Marina Volkova: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Aleksei Logunov: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Valeriia Zaluskaia: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Vlada Kapranova: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Elena Timofeeva: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Elena Evseeva: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Vladimir Kabarov: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Yuri Matveev: Information Technologies and Programming Faculty, ITMO University, 197101 Saint Petersburg, Russia
Mathematics, 2021, vol. 9, issue 23, 1-21
Abstract:
Recently developed methods in spontaneous speech analytics rely on diarization, the separation of speakers based on audio data. Diarization is applied in widespread use cases, such as transcribing meetings recorded with distant microphones and extracting the target speaker’s voice profiles from noisy audio. However, speech recognition and analysis can be hindered by background and point-source noise, overlapping speech, and reverberation, which jointly degrade diarization quality. To compensate for these factors, a variety of supportive speech analytics methods exist, such as quality assessment in terms of SNR and RT60 reverberation time metrics, overlapping speech detection, and instant speaker number estimation. Improvements in speaker verification methods also benefit speaker separation. This paper introduces several approaches aimed at improving diarization system quality. The presented experimental results demonstrate that initial speaker labels from neural-based voice activity detection (VAD) can be refined by fusing them with labels from quality estimation models, overlapping speech detectors, and speaker number estimation models, which contain CNN and LSTM modules. Such fusion approaches significantly decrease diarization error rate (DER) values compared to standalone VAD methods. Cases of ideal VAD labeling are used to show the positive impact of ResNet-101 neural networks on diarization quality in comparison with basic x-vector and ECAPA-TDNN architectures trained on 8 kHz data. Moreover, this paper highlights the advantage of spectral clustering over other clustering methods applied to diarization. Diarization quality improves at all stages of the pipeline, and the combination of various speech analytics methods contributes significantly to this improvement.
Keywords: speaker diarization; spontaneous speech processing; voice activity detection; overlapping speech detection; speaker extractor models; speaker number estimation; model fusion; quality estimation; distant speech processing; artificial neural networks
JEL-codes: C
Date: 2021
Downloads: (external link)
https://www.mdpi.com/2227-7390/9/23/2998/pdf (application/pdf)
https://www.mdpi.com/2227-7390/9/23/2998/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:9:y:2021:i:23:p:2998-:d:685605
Mathematics is currently edited by Ms. Emma He