Machine learning versus regression modelling in predicting individual healthcare costs from a representative sample of the nationwide claims database in France

Vimont, Alexandre; Leleu, Henri; Durand-Zaleski, Isabelle

Alexandre Vimont, Henri Leleu and Isabelle Durand-Zaleski
Additional contact information
Alexandre Vimont: Public Health Expertise (PHE)
Henri Leleu: Public Health Expertise (PHE)
Isabelle Durand-Zaleski: Assistance Publique Hôpitaux de Paris, URC-ECO, CRESS-UMR1153

The European Journal of Health Economics, 2022, vol. 23, issue 2, No 5, 223 pages

Abstract:
Background: Innovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performance of a simple neural network (NN) and a random forest (RF) with that of a generalized linear model (GLM) for the prediction of medical costs at the individual level.
Methods: A 1/97 representative sample of the French National Health Data Information System was used. The predictors selected were demographic information; pre-existing conditions and the Charlson comorbidity index; and healthcare service use and costs. The predictive performance of each model was compared using individual-level metrics (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)) and distribution-level metrics, on different sets of covariates, in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design.
Results: We included 510,182 subjects alive on 31 December 2015. Mean annual costs were 1894€ (standard deviation 9326€; median 393€, interquartile range 95€ to 1480€), including zero-claim subjects. All models performed similarly after adjustment for demographics. The RF model performed better on the other sets of covariates (pre-existing conditions, resource counts and past-year costs). On the full model, RF reached an adj-R2 of 47.5%, an MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, an MAE of 1635€ and 1660€, and a HiR of 58% and 55%, respectively. The RF model outperformed GLM and NN for most conditions and for high-cost subjects.
Conclusions: RF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, GLM was well suited when used with demographics, conditions and base-year cost.
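As an illustration of the individual-level metrics named in the abstract, the following is a minimal Python sketch of MAE, adjusted R-squared and a hit ratio. It is not the authors' code. The abstract does not define the hit ratio, so the definition used here (share of subjects whose predicted cost falls within a tolerance band around the observed cost) and its parameters `rel_tol` and `abs_tol` are assumptions made for illustration only; the function names and the toy cost values are likewise hypothetical.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted costs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def adjusted_r2(actual: Sequence[float], predicted: Sequence[float],
                n_predictors: int) -> float:
    """R-squared penalised for the number of predictors in the model."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def hit_ratio(actual: Sequence[float], predicted: Sequence[float],
              rel_tol: float = 0.5, abs_tol: float = 100.0) -> float:
    """Share of predictions counted as 'hits'.

    ASSUMPTION: a hit is a prediction within max(abs_tol, rel_tol * observed)
    of the observed cost. The source abstract does not give the definition
    or the thresholds; both are placeholders here.
    """
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if abs(a - p) <= max(abs_tol, rel_tol * a)
    )
    return hits / len(actual)


# Toy annual costs in euros (made-up values, not data from the study).
observed = [0.0, 100.0, 400.0, 1500.0, 8000.0]
predicted = [50.0, 150.0, 300.0, 2500.0, 5000.0]

mae = mean_absolute_error(observed, predicted)
adj_r2 = adjusted_r2(observed, predicted, n_predictors=1)
hir = hit_ratio(observed, predicted)
print(f"MAE={mae:.1f}  adj-R2={adj_r2:.3f}  HiR={hir:.0%}")
```

The absolute floor `abs_tol` is included because the sample contains zero-claim subjects, for whom a purely relative tolerance would never register a hit; whether the study handled such subjects this way is not stated in the abstract.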

Keywords: Predictive analytics; Machine learning; Cost containment; Healthcare management; Healthcare costs; Random forest; Neural network
JEL-codes: I11 I13 I15
Date: 2022

Downloads:
http://link.springer.com/10.1007/s10198-021-01363-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:eujhec:v:23:y:2022:i:2:d:10.1007_s10198-021-01363-4

Ordering information: This journal article can be ordered from
http://www.springer. ... cs/journal/10198/PS2

DOI: 10.1007/s10198-021-01363-4

The European Journal of Health Economics is currently edited by J.-M.G.v.d. Schulenburg

Series: The European Journal of Health Economics, from Springer; Deutsche Gesellschaft für Gesundheitsökonomie (DGGÖ).
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.