Machine Learning Applications for Predicting High-Cost Claims Using Insurance Data

Brati, Esmeralda; Braimllari, Alma; Gjeçi, Ardit

Esmeralda Brati: Department of Statistics and Applied Informatics, Faculty of Economy, University of Tirana, 1010 Tirana, Albania
Alma Braimllari: Department of Statistics and Applied Informatics, Faculty of Economy, University of Tirana, 1010 Tirana, Albania
Ardit Gjeçi: Department of Economics and Finance, University of New York Tirana, 1000 Tirana, Albania

Data, 2025, vol. 10, issue 6, 1-22

Abstract: Insurance is essential for financial risk protection, but claim management is complex and requires accurate classification and forecasting strategies. This study aimed to empirically evaluate the performance of classification algorithms, including Logistic Regression, Decision Tree, Random Forest, XGBoost, K-Nearest Neighbors, Support Vector Machine, and Naïve Bayes, in predicting high-cost insurance claims. The research analyses the claim, vehicle, and insured-party variables that influence the classification of high-cost claims. The investigation uses a dataset of 802 bodily injury claims from the motor liability portfolio of a private insurance company in Albania, covering the period from 2018 to 2024. To evaluate and compare the models, we used classification accuracy (CA), area under the curve (AUC), the confusion matrix, and error rates as evaluation criteria. Random Forest performed best, achieving the highest classification accuracy (CA = 0.8867, AUC = 0.9437) with the lowest error rates, followed by XGBoost, while Logistic Regression performed worst. Key predictors of high-cost claims include claim type, deferred period, vehicle brand, and driver age. These findings highlight the potential of machine learning models to improve claim classification and risk assessment and to refine underwriting policy.
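The evaluation criteria named in the abstract can be illustrated with a small sketch. This is not the authors' code and does not reproduce their reported figures: it assumes a binary label where 1 marks a high-cost claim, a hypothetical 0.5 probability cut-off, and eight invented toy claims. Classification accuracy is the share of correct predictions in the confusion matrix, the error rate is its complement, and AUC is computed here as the probability that a randomly chosen high-cost claim receives a higher model score than a randomly chosen ordinary claim (ties count as one half).

```python
from itertools import product


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp); `positive` is assumed to mark a high-cost claim."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_accuracy(y_true, y_pred):
    """CA = (TP + TN) / all predictions."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tn + fp + fn + tp)


def error_rate(y_true, y_pred):
    """Misclassification rate, the complement of CA."""
    return 1.0 - classification_accuracy(y_true, y_pred)


def auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data: true labels and model scores (e.g. predicted probabilities).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
# Hypothetical cut-off: a score of 0.5 or more is classified as high-cost.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(y_true, y_pred))         # → (3, 1, 1, 3)
print(classification_accuracy(y_true, y_pred))  # → 0.75
print(auc(y_true, scores))                      # → 0.9375
```

The pairwise AUC loop grows quadratically with the number of claims, which is acceptable at the scale of a few hundred observations. Note that CA and the error rate depend on the chosen cut-off, whereas AUC uses only the ranking of the scores.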

Keywords: insurance claim; classification; machine learning algorithms; variable importance; confusion matrix
JEL-codes: C8 C80 C81 C82 C83
Date: 2025

Downloads:
https://www.mdpi.com/2306-5729/10/6/90/pdf (application/pdf)
https://www.mdpi.com/2306-5729/10/6/90/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:10:y:2025:i:6:p:90-:d:1680916

Data is currently edited by Ms. Becky Zhang

More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.