Advanced Tax Fraud Detection: A Soft-Voting Ensemble Based on GAN and Encoder Architecture

Masad A. Alrasheedi, Samia Ijaz, Ayed M. Alrashdi and Seung-Won Lee
Additional contact information
Masad A. Alrasheedi: Department of Management Information Systems, College of Business Administration, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia
Samia Ijaz: Department of Computer Science, HITEC University, Taxila 47080, Pakistan
Ayed M. Alrashdi: Department of Electrical Engineering, College of Engineering, University of Ha’il, Ha’il 81441, Saudi Arabia
Seung-Won Lee: Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea

Mathematics, 2025, vol. 13, issue 4, 1-29

Abstract: Authorized and fraudulent transactions coexist worldwide, which makes the two difficult to distinguish. The small percentage of fraudulent transactions, in turn, gives rise to the class imbalance problem. Hence, tax systems need a robust fraud detection mechanism to avoid collapse. Obtaining datasets, tax return datasets in particular, has become considerably harder as privacy gains importance and people grow reluctant to share personal information. We therefore synthesize our dataset from publicly available data and augment it with Correlational Generative Adversarial Networks (CGANs) and the Synthetic Minority Oversampling Technique (SMOTE). The proposed method includes a preprocessing stage that denoises the data, identifies anomalies and outliers, and reduces dimensionality. The data are then augmented using SMOTE and the proposed CGAN technique. A novel encoder design is proposed to expose the hidden patterns among legitimate and fraudulent records. This research found that anomalous deductions, income inconsistencies, recurrent transaction manipulations, and irregular filing practices distinguish fraudulent from valid tax records. These patterns are identified by encoder-based feature extraction and synthetic data augmentation. Several machine learning classifiers, along with a voting ensemble technique, were evaluated both with and without data augmentation. Experimental results show that the proposed soft-voting technique outperformed the baseline approach without an ensemble method.
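
Two of the techniques named in the abstract, SMOTE-style oversampling and soft voting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `smote_like_oversample` and `soft_vote`, the toy inputs, and the default parameters are assumptions made for illustration, and the CGAN and encoder stages are not shown.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).
    Hypothetical helper, not taken from the paper."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across classifiers and return
    (predicted class index, averaged probabilities).
    Hypothetical helper, not taken from the paper."""
    weights = weights or [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


if __name__ == "__main__":
    # Toy minority (fraud) samples in a two-feature space.
    fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smote_like_oversample(fraud, n_new=3))

    # Class probabilities [legitimate, fraud] from three hypothetical models.
    print(soft_vote([[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]]))
```

In the toy vote, two of the three models lean slightly toward "legitimate", so a majority (hard) vote would pick class 0. Averaging the probabilities instead selects class 1 (fraud), because the third model is far more confident. This is the general motivation for soft voting, not a result reported in the paper.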

Keywords: soft-voting ensemble; tax evasion; synthetic dataset; fraud detection; SMOTE; GAN; financial statement
JEL-codes: C
Date: 2025

Downloads: (external link)
https://www.mdpi.com/2227-7390/13/4/642/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/4/642/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:4:p:642-:d:1592301

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-22
Handle: RePEc:gam:jmathe:v:13:y:2025:i:4:p:642-:d:1592301