EconPapers    
Economics at your fingertips  
An Attention-Based Framework for Detecting Face Forgeries: Integrating Efficient-ViT and Wavelet Transform

Yinfei Xiao, Yanbing Zhou, Pengzhan Cheng, Leqian Ni, Xusheng Wu and Tianxiang Zheng ()
Additional contact information
Yinfei Xiao: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Yanbing Zhou: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Pengzhan Cheng: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Leqian Ni: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Xusheng Wu: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China
Tianxiang Zheng: Department of E-Commerce, Jinan University (Shenzhen Campus), Shenzhen 518053, China

Mathematics, 2025, vol. 13, issue 16, 1-30

Abstract: As face forgery techniques, particularly DeepFake methods, advance toward hyper-realistic facial manipulation, effective detection has become imperative for mitigating security threats. Current spatial-domain approaches often struggle to generalize across forgery methods and compression artifacts, while frequency-based analyses show promise in identifying subtle local cues but lack the global context needed for stronger generalization. This study introduces a hybrid architecture that integrates Efficient-ViT with a multi-level wavelet transform and fuses spatial and frequency features through a dynamic adaptive multi-branch attention (DAMA) mechanism, deepening the interaction between the two modalities. We further devise a joint loss function and a training strategy that address class imbalance and improve the training process. Experiments on FaceForensics++ and Celeb-DF (V2) validate the approach, which attains 97.07% accuracy in intra-dataset evaluation and a 74.7% AUC in cross-dataset evaluation, surpassing the Efficient-ViT baseline by 14.1% and 7.7%, respectively. Ablation studies and parameter analysis further indicate that the approach generalizes well across datasets and forgery methods, and that a novel orthogonal loss regularizing the feature space effectively reduces feature redundancy.
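The abstract mentions an orthogonal loss that regularizes the feature space to reduce redundancy between the spatial and frequency branches. The paper's exact formulation is not given here, but a common way to realize such a regularizer is to penalize the cosine similarity between paired branch features, driving the two representations toward orthogonal (non-redundant) directions. The sketch below is a minimal, hypothetical illustration in that spirit, not the authors' implementation; the function name and the (batch, dim) feature layout are assumptions.

```python
import numpy as np

def orthogonal_loss(spatial, freq, eps=1e-8):
    """Hypothetical redundancy penalty between two feature branches.

    spatial, freq: (batch, dim) feature matrices from the spatial and
    frequency branches. Returns the mean squared cosine similarity of
    paired row vectors: 0 when the branches encode orthogonal directions,
    1 when they are fully redundant (collinear).
    """
    # L2-normalize each feature vector (eps guards against zero norms)
    s = spatial / (np.linalg.norm(spatial, axis=1, keepdims=True) + eps)
    f = freq / (np.linalg.norm(freq, axis=1, keepdims=True) + eps)
    cos = np.sum(s * f, axis=1)   # per-sample cosine similarity
    return float(np.mean(cos ** 2))
```

In a joint objective such a term would typically be added to the classification loss with a small weight, so the network is rewarded for keeping the two modalities complementary rather than duplicative.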

Keywords: deepfake detection; face forgery; Efficient-ViT; wavelet transform; cross attention (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/16/2576/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/16/2576/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:16:p:2576-:d:1722772


Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-08-13
Handle: RePEc:gam:jmathe:v:13:y:2025:i:16:p:2576-:d:1722772