Fraud Detection in Healthcare Insurance Claims Using Machine Learning

Nabrawi, Eman; Alanazi, Abdullah

Fraud Detection in Healthcare Insurance Claims Using Machine Learning

Eman Nabrawi and Abdullah Alanazi ()
Additional contact information
Eman Nabrawi: Health Informatics Department, King Saud Ibn Abdulaziz University for Health Sciences, P.O. Box 3660, Riyadh 11481, Saudi Arabia
Abdullah Alanazi: Health Informatics Department, King Saud Ibn Abdulaziz University for Health Sciences, P.O. Box 3660, Riyadh 11481, Saudi Arabia

Risks, 2023, vol. 11, issue 9, 1-11

Abstract: Healthcare fraud is intentionally submitting false claims or producing misinterpretation of facts to obtain entitlement payments. Thus, it wastes healthcare financial resources and increases healthcare costs. Subsequently, fraud poses a substantial financial challenge. Therefore, supervised machine and deep learning analytics such as random forest, logistic regression, and artificial neural networks are successfully used to detect healthcare insurance fraud. This study aims to develop a health model that automatically detects fraud from health insurance claims in Saudi Arabia. The model indicates the greatest contributing factor to fraud with optimal accuracy. The labeled imbalanced dataset used three supervised deep and machine learning methods. The dataset was obtained from three healthcare providers in Saudi Arabia. The applied models were random forest, logistic regression, and artificial neural networks. The SMOT technique was used to balance the dataset. Boruta object feature selection was applied to exclude insignificant features. Validation metrics were accuracy, precision, recall, specificity, F1 score, and area under the curve (AUC). Random forest classifiers indicated policy type, education, and age as the most significant features with an accuracy of 98.21%, 98.08% precision, 100% recall, an F1 score of 99.03%, specificity of 80%, and an AUC of 90.00%. Logistic regression resulted in an accuracy of 80.36%, 97.62% precision, 80.39% recall, an F1 score of 88.17%, specificity of 80%, and an AUC of 80.20%. ANN revealed an accuracy of 94.64%, 98.00% precision, 96.08% recall, an F1 score of 97.03%, a specificity of 80%, and an AUC of 88.04%. This predictive analytics study applied three successful models, each of which yielded acceptable accuracy and validation metrics; however, further research on a larger dataset is advised.

Keywords: fraud; insurance claims; artificial neural networks (ANN); logistic regression (LR); random forest (RF); Saudi Arabia (search for similar items in EconPapers)
JEL-codes: C G0 G1 G2 G3 K2 M2 M4 (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
https://www.mdpi.com/2227-9091/11/9/160/pdf (application/pdf)
https://www.mdpi.com/2227-9091/11/9/160/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jrisks:v:11:y:2023:i:9:p:160-:d:1233014

Access Statistics for this article

Risks is currently edited by Mr. Claude Zhang

More articles in Risks from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().