Predicting High-Cost Health Insurance Members through Boosted Trees and Oversampling: An Application Using the HCCI Database
Brian Hartman, Rebecca Owen and Zoe Gibbs
North American Actuarial Journal, 2020, vol. 25, issue 1, 53-61
Abstract:
Using the Health Care Cost Institute data (approximately 47 million members over seven years), we examine how to best predict which members will be high-cost next year. We find that cost history, age, and prescription drug coverage all predict high costs, with cost history being by far the most predictive. We also compare the predictive accuracy of logistic regression to extreme gradient boosting (XGBoost) and find that the added flexibility of the extreme gradient boosting improves the predictive power. Finally, we show that with extremely unbalanced classes (because high-cost members are so rare), oversampling the minority class provides a better XGBoost predictive model than undersampling the majority class or using the training data as is. Logistic regression performance seems unaffected by the method of sampling.
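The abstract contrasts three ways of preparing the training data: oversampling the minority class, undersampling the majority class, and using the data as is. The following is a minimal sketch of the two resampling schemes in plain Python, not the authors' implementation. The function names, the toy features (prior-year cost and age), and the 95/5 class split are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def _class_sizes(labels):
    """Return (majority_label, minority_label) for a two-class label list."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, _), (minority, _) = counts.most_common()
    return majority, minority


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    majority, minority = _class_sizes(labels)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_extra = labels.count(majority) - len(minority_idx)
    # Sample with replacement from the minority class.
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def undersample_majority(rows, labels, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    majority, minority = _class_sizes(labels)
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # Sample without replacement from the majority class.
    idx = rng.sample(majority_idx, len(minority_idx)) + minority_idx
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy training set: (prior_year_cost, age), label 1 = high-cost member.
data_rng = random.Random(42)
rows = [(data_rng.uniform(0, 5_000), data_rng.randint(18, 64)) for _ in range(95)]
rows += [(data_rng.uniform(50_000, 200_000), data_rng.randint(18, 64)) for _ in range(5)]
labels = [0] * 95 + [1] * 5

X_over, y_over = oversample_minority(rows, labels)
X_under, y_under = undersample_majority(rows, labels)
print(Counter(labels), Counter(y_over), Counter(y_under))
```

In a sketch like this, only the training rows would be resampled; any held-out evaluation set should keep its original class balance so that the reported accuracy reflects how rare high-cost members actually are.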
Date: 2020
Downloads: (external link)
http://hdl.handle.net/10.1080/10920277.2020.1754242 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:uaajxx:v:25:y:2020:i:1:p:53-61
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/uaaj20
DOI: 10.1080/10920277.2020.1754242
North American Actuarial Journal is currently edited by Kathryn Baker