Predicting High-Cost Healthcare Utilization Using Machine Learning: A Multi-Service Risk Stratification Analysis in EU-Based Private Group Health Insurance

Seyam, Eslam Abdelhakim

Eslam Abdelhakim Seyam: Department of Insurance and Risk Management, College of Business, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia

Risks, 2025, vol. 13, issue 7, 1-19

Abstract: Rising healthcare costs and resource allocation pressures have intensified across European health systems, where a small group of patients accounts for a disproportionate share of spending. Predicting high-cost utilization is important for sustainable healthcare management and targeted intervention. This study develops and validates machine learning models for predicting high-cost healthcare utilization from detailed administrative data, comparing three algorithms on risk stratification performance. The analysis uses insurance beneficiary records compiled from collective group health funds operated by non-life insurers across EU countries, covering multiple service classes. High utilization was defined as the upper quintile of overall health expenditure, which corresponds to a moderate cost threshold. Three algorithms were applied: logistic regression with elastic net regularization, random forest, and support vector machines. Predictors included demographics, policy profiles, and service utilization patterns across multiple domains of healthcare. Model performance was evaluated with a standard train–test split and cross-validation. All three models showed very high discriminative ability, with area under the curve (AUC) values at near-perfect levels. The random forest achieved the best test performance, closely followed by logistic regression. Service diversity was the strongest predictor across all models, while dentistry services had a very high odds ratio with robust confidence intervals. High utilizers made up approximately one-fifth of the sample but showed significantly higher utilization across all service classes. The results indicate that machine learning models can identify likely high utilizers of healthcare services with nearly perfect discriminative ability, and they support the use of predictive analytics for proactive case management, resource planning, and targeted interventions among private group health insurance providers in EU countries.
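
As a rough illustration of the outcome definition and the evaluation metric described in the abstract, the sketch below labels the top quintile of total expenditure as high utilizers and computes a rank-based AUC for a single predictor. It is not the author's pipeline: the data are synthetic, and the names `label_high_utilizers`, `roc_auc`, `service_diversity`, and `total_cost` are hypothetical choices made for this example only.

```python
import random

def label_high_utilizers(costs, quantile=0.8):
    """Flag records whose total cost is at or above the quantile cutoff."""
    ranked = sorted(costs)
    # Nearest-rank cutoff: first value inside the top (1 - quantile) share.
    cutoff = ranked[int(quantile * len(ranked))]
    return [1 if c >= cutoff else 0 for c in costs], cutoff

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic beneficiaries (assumption: cost rises with the number of
# distinct service classes used, plus random noise).
rng = random.Random(0)
n = 500
service_diversity = [rng.randint(0, 6) for _ in range(n)]
total_cost = [200.0 * d + rng.expovariate(1 / 150.0) for d in service_diversity]

labels, cutoff = label_high_utilizers(total_cost)
share = sum(labels) / n                     # about one-fifth by construction
auc = roc_auc(labels, service_diversity)    # discrimination of one predictor

print(f"cutoff={cutoff:.2f} share={share:.2f} auc={auc:.3f}")
```

Because the outcome here is built directly from total cost, any predictor that tracks cost closely will score a high AUC on this toy data; the printed value says nothing about the performance reported in the article.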

Keywords: healthcare utilization prediction; machine learning; risk stratification; private health insurance analytics; predictive modeling; cost containment
JEL-codes: C G0 G1 G2 G3 K2 M2 M4
Date: 2025

Downloads:
https://www.mdpi.com/2227-9091/13/7/133/pdf (application/pdf)
https://www.mdpi.com/2227-9091/13/7/133/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jrisks:v:13:y:2025:i:7:p:133-:d:1697454

Risks is currently edited by Mr. Claude Zhang
