Publicly available datasets analysis and spectrogram-ResNet41 based improved features extraction for audio spoof attack detection

Nidhi Chakravarty and Mohit Dua
Nidhi Chakravarty: National Institute of Technology
Mohit Dua: National Institute of Technology

International Journal of System Assurance Engineering and Management, 2024, vol. 15, issue 12, No 12, 5636 pages

Abstract: The rapid expansion of voice-based technologies across diverse applications underscores the critical need for robust security measures against audio spoofing attacks. This paper comprehensively examines publicly available datasets that have been developed to detect audio spoof attacks. The research encompasses a compilation of datasets, including the ASVspoof dataset series (2019, 2021), the Voice Spoofing Detection Corpus (VSDC), the Voice Impersonation Corpus in Hindi Language (VIHL) and the DEepfake CROss-lingual evaluation dataset (DECRO), covering various spoofing attack scenarios in English, Hindi and Chinese. In the first part of the paper, a baseline for the proposed research work is developed by comparing the performance of state-of-the-art Linear frequency cepstral coefficient (LFCC) features with four different machine learning classifiers at the backend, namely Random forest (RF), K-nearest neighbor (KNN), eXtreme gradient boosting (XGBoost), and Naïve Bayes (NB), over these four datasets. In the second part of the paper, we use the novel feature combinations Mel Spectrogram-Residual Network41 (ResNet41)-Linear discriminant analysis (LDA) and Gammatone Spectrogram-ResNet41-LDA, one at a time, with the same set of machine learning classifiers at the backend. The combination of Gammatone spectrogram-ResNet41-LDA with the XGBoost classifier achieves an Equal Error Rate (EER) of 1.7, 1.28, 0.5, 0.36, 0.03, 0.07, and 0.9% for the ASVspoof 2019 Logical Access (LA), ASVspoof 2019 Physical Access (PA), ASVspoof 2021 Deepfake, VSDC, DECRO English, DECRO Chinese, and VIHL datasets, respectively. Hence, the research work proposed in this paper achieves the objective of assessing the feasibility and utility of publicly available state-of-the-art datasets for training and testing advanced algorithms that identify manipulated audio.
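
For readers unfamiliar with the Equal Error Rate quoted in the abstract, the following is a minimal sketch of the generic definition: the operating point at which the rate of spoofed audio accepted as genuine equals the rate of genuine audio rejected. The function name, the score convention (higher score means more likely bona fide), and the toy scores are assumptions made for illustration; this record does not describe the paper's own scoring protocol.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector where higher scores mean 'bona fide'.

    Hypothetical helper for illustration only, not the authors' evaluation code.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between the two error rates, eer estimate, threshold)
    # Every observed score is a candidate decision threshold.
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many trials the rates may never be exactly equal,
            # so the midpoint at the closest crossing is reported.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely to exercise the function.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer * 100:.1f}% at threshold {threshold}")
```

An EER reported as a percentage, as in the abstract, corresponds to the returned fraction multiplied by 100. Evaluation toolkits typically interpolate between thresholds rather than taking the nearest crossing, so values computed this way can differ slightly from published figures.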

Keywords: ASV; Deepfake; Synthetic attack; Replay attack; ResNet41; Spectrogram
Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s13198-024-02550-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:ijsaem:v:15:y:2024:i:12:d:10.1007_s13198-024-02550-1

Ordering information: This journal article can be ordered from
http://www.springer.com/engineering/journal/13198

DOI: 10.1007/s13198-024-02550-1

International Journal of System Assurance Engineering and Management is currently edited by P.K. Kapur, A.K. Verma and U. Kumar

More articles in International Journal of System Assurance Engineering and Management from Springer, The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-20
Handle: RePEc:spr:ijsaem:v:15:y:2024:i:12:d:10.1007_s13198-024-02550-1