Scene-dependent sound event detection based on multitask learning with deformable large kernel attention convolution
Haiyue Zhang, Menglong Wu, Xichang Cai and Wenkai Liu
PLOS ONE, 2025, vol. 20, issue 5, 1-15
Abstract:
Sound event detection (SED) and acoustic scene classification (ASC) are closely related tasks in environmental sound analysis. Given the interrelationship between sound events and scenes, some previous studies have proposed using multitask learning (MTL) to jointly analyze SED and ASC. However, these multitask learning methods are generally based on hard parameter-sharing, which exchanges sound event and scene features only through the low-level network. Such approaches struggle to balance the complex interrelationships between SED and ASC, and they limit feature sharing and information flow between the tasks during training. To address these challenges, this study proposes a novel multitask network based on a residual multi-level feature extraction (R-MFE) framework, which aims to jointly analyze the SED and ASC tasks and to use scene information to improve the performance of sound event detection. In addition, this study designs the D-LKAC attention module, which combines the advantages of self-attention mechanisms and convolution to capture global and local features. To further enhance SED performance, this study introduces the MS-conv module, which captures audio details from multiple dimensions. The proposed MTL method is evaluated on the TUT Acoustic Scenes 2016/2017 and TUT Sound Events 2016/2017 datasets. Experimental results indicate that the approach outperforms state-of-the-art techniques, improving the F-scores by 6.44%.
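Illustrative note: the hard parameter-sharing baseline that the abstract criticizes can be sketched as one shared low-level trunk feeding two task-specific heads. The sketch below is an assumption-laden toy in plain Python, not the authors' R-MFE network and not their D-LKAC or MS-conv modules; every name, layer size, and the loss weight `asc_weight` is hypothetical, and the weights are random rather than trained.

```python
import math
import random

def linear(x, w, b):
    """Dense layer: x is a feature vector, w is [out][in], b is [out]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def init(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def forward(frames, params):
    """Hard parameter sharing: both tasks see only the shared trunk output."""
    shared = [relu(linear(f, *params["trunk"])) for f in frames]
    # SED head: per-frame, multi-label event probabilities.
    event_probs = [[sigmoid(a) for a in linear(h, *params["sed"])] for h in shared]
    # ASC head: one scene distribution per clip from mean-pooled shared features.
    pooled = [sum(col) / len(shared) for col in zip(*shared)]
    scene_probs = softmax(linear(pooled, *params["asc"]))
    return event_probs, scene_probs

def joint_loss(event_probs, scene_probs, event_targets, scene_target, asc_weight=0.5):
    """Binary cross-entropy for SED plus weighted cross-entropy for ASC."""
    eps = 1e-9
    bce, n = 0.0, 0
    for p_row, t_row in zip(event_probs, event_targets):
        for p, t in zip(p_row, t_row):
            bce -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
            n += 1
    ce = -math.log(scene_probs[scene_target] + eps)
    return bce / n + asc_weight * ce

# Toy sizes, all hypothetical: 8 input features, 16 shared units, 3 events, 4 scenes.
rng = random.Random(0)
params = {
    "trunk": init(16, 8, rng),
    "sed": init(3, 16, rng),
    "asc": init(4, 16, rng),
}
frames = [[rng.random() for _ in range(8)] for _ in range(5)]   # 5 time frames
event_targets = [[1, 0, 0]] * 5                                  # event 0 active throughout
event_probs, scene_probs = forward(frames, params)
loss = joint_loss(event_probs, scene_probs, event_targets, scene_target=2)
```

In this arrangement the only path by which scene and event information interact is the shared trunk, which is the limitation the abstract attributes to hard parameter-sharing.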
Date: 2025
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0322002 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 22002&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0322002
DOI: 10.1371/journal.pone.0322002
More articles in PLOS ONE from Public Library of Science