A prediction model to detect non-compliant taxpayers using a supervised machine learning approach: evidence from Tunisia
Aicha Kamoun, Rahma Boujelbane and Saoussen Boujelben
Journal of Business Analytics, 2025, vol. 8, issue 2, 116-133
Abstract:
This study aims to develop a tax non-compliance prediction model in Tunisia using supervised machine learning algorithms. A data mining analysis was conducted following the Knowledge Discovery in Databases (KDD) process, utilizing a dataset of 20,930 labeled observations from 2013 to 2017, comprising 110 attributes. We employed supervised learning algorithms, including K-Nearest Neighbors, Decision Trees, Naïve Bayes, Gradient Boosting, and Random Forest, to identify the most accurate model. Notably, Random Forest outperformed the other algorithms, achieving a prediction accuracy of 83%. Furthermore, through a combined interpretation of feature importance derived from Random Forest, SHAP value analysis, and ANOVA, our findings provide tax auditors with insights into the most influential attributes for predicting tax non-compliance. This study holds significant practical implications by enhancing the efficiency of tax audits and supporting tax authorities in their efforts to combat tax non-compliance.
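The model comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the restricted Tunisian tax data, and leaves nearly all hyperparameters at library defaults. The helper name `compare_models` and the dataset dimensions are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labeled taxpayer records (assumption: the real
# dataset is not public, so sizes and features here are illustrative only).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five algorithm families named in the abstract, with default settings.
models = {
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model and return its hold-out accuracy keyed by name."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(models, X_train, y_train, X_test, y_test)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")

# Impurity-based feature importances from the fitted Random Forest,
# one of the three interpretation tools the abstract mentions.
importances = models["Random Forest"].feature_importances_
top_features = importances.argsort()[::-1][:5]
print("Top feature indices:", top_features.tolist())
```

Accuracy on synthetic data says nothing about the 83% reported in the study. The sketch only shows the shape of the comparison: a shared train/test split, one metric applied to every model, and a ranking step before interpreting the best performer.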
Date: 2025
Downloads: (external link)
http://hdl.handle.net/10.1080/2573234X.2024.2438195 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:tjbaxx:v:8:y:2025:i:2:p:116-133
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/tjba20
DOI: 10.1080/2573234X.2024.2438195
Journal of Business Analytics is currently edited by Dursun Delen
Bibliographic data for series maintained by Chris Longhurst.