ATA-MSTF-Net: An Audio Texture-Aware MultiSpectro-Temporal Attention Fusion Network

Su, Yubo; Wang, Haolin; Xu, Zhihao; Yin, Chengxi; Chen, Fucheng; Wang, Zhaoguo

Yubo Su, Haolin Wang, Zhihao Xu, Chengxi Yin, Fucheng Chen and Zhaoguo Wang
Additional contact information
Yubo Su: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Haolin Wang: Guangdong Provincial Key Laboratory of Intelligent Measurement and Advanced Metering of Power Grid, Electric Power Research Institute of CSG, Guangzhou 510530, China
Zhihao Xu: China Industrial Control Systems Cyber Emergency Response Team, Beijing 100040, China
Chengxi Yin: Guangdong Provincial Key Laboratory of Intelligent Measurement and Advanced Metering of Power Grid, Electric Power Research Institute of CSG, Guangzhou 510530, China
Fucheng Chen: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
Zhaoguo Wang: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China

Mathematics, 2025, vol. 13, issue 17, 1-18

Abstract: Unsupervised anomalous sound detection (ASD) models the normal sounds of machinery through classification operations and identifies anomalies by quantifying deviations from that model. Most recent approaches adopt depthwise separable modules from MobileNetV2. Extensive studies demonstrate that squeeze-and-excitation (SE) modules can enhance model fitting by dynamically weighting input features to adjust output distributions. However, we observe that conventional SE modules fail to adapt to the complex spectral textures of audio data. To address this, we propose an Audio Texture Attention (ATA) module designed specifically for machine noise data, which improves model robustness. Additionally, we integrate an LSTM layer and refine the temporal feature extraction architecture to strengthen the model’s sensitivity to sequential noise patterns. Experimental results on the DCASE 2020 Challenge Task 2 dataset show that our method achieves state-of-the-art performance, with AUC, pAUC, and mAUC scores of 96.15%, 90.58%, and 90.63%, respectively.
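
For readers unfamiliar with the squeeze-and-excitation mechanism the abstract refers to, the following is a minimal sketch of a conventional SE block in plain Python. It is not the authors' ATA module, whose design the abstract does not specify. The function name `squeeze_excite`, the tensor layout `[channels][freq][time]`, the bottleneck size, and the random weights are all assumptions made for illustration.

```python
import math
import random


def squeeze_excite(features, w1, w2):
    """Conventional SE gating for a feature map laid out as [channels][freq][time].

    w1 has shape [hidden][channels]; w2 has shape [channels][hidden].
    Both are hypothetical weights for illustration, not values from the paper.
    """
    # Squeeze: global average over frequency and time, one value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand to one gate per channel, squashed by a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: every element of a channel is multiplied by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]
    return scaled, gates


# Assumed toy dimensions: 4 channels, 3 frequency bins, 5 frames, 2 hidden units.
rng = random.Random(0)
channels, freq_bins, frames, hidden_units = 4, 3, 5, 2
features = [
    [[rng.uniform(-1, 1) for _ in range(frames)] for _ in range(freq_bins)]
    for _ in range(channels)
]
w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(hidden_units)]
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(channels)]
scaled, gates = squeeze_excite(features, w1, w2)
```

In this conventional form each channel receives a single scalar gate, so every time-frequency bin within a channel is rescaled by the same factor.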

Keywords: spectro-temporal attention mechanism; anomalous sound detection; dynamic channel interaction; audio texture modeling; lightweight convolutional networks
JEL-codes: C
Date: 2025

Downloads:
https://www.mdpi.com/2227-7390/13/17/2719/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/17/2719/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:17:p:2719-:d:1731317

Mathematics is currently edited by Ms. Emma He
