Predicting High-Cost Health Insurance Members through Boosted Trees and Oversampling: An Application Using the HCCI Database

Hartman, Brian; Owen, Rebecca; Gibbs, Zoe

North American Actuarial Journal, 2020, vol. 25, issue 1, 53-61

Abstract: Using the Health Care Cost Institute (HCCI) data (approximately 47 million members over seven years), we examine how best to predict which members will be high-cost in the following year. We find that cost history, age, and prescription drug coverage all predict high costs, with cost history being by far the most predictive. We also compare the predictive accuracy of logistic regression with that of extreme gradient boosting (XGBoost) and find that the added flexibility of XGBoost improves predictive power. Finally, we show that with extremely unbalanced classes (because high-cost members are so rare), oversampling the minority class yields a better XGBoost model than undersampling the majority class or using the training data as is. Logistic regression performance appears unaffected by the sampling method.
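The abstract contrasts three ways of handling the rare high-cost class in the training data: oversampling the minority class, undersampling the majority class, or leaving the data as is. As a rough illustration only, the Python sketch below shows plain random oversampling (duplicating minority rows with replacement) and random undersampling (discarding majority rows). The function names `oversample_minority` and `undersample_majority`, the 1:1 target class ratio, and the fixed seed are assumptions for illustration; the paper's exact resampling scheme is not described here.

```python
import random
from collections import Counter


def _split_classes(labels):
    """Return (minority_label, majority_label) for a binary label list.

    Hypothetical helper for this sketch; assumes exactly two classes.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority, majority = sorted(counts, key=counts.get)
    return minority, majority


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows (with replacement) until
    both classes have the same count. Original rows are all kept."""
    rng = random.Random(seed)
    minority, majority = _split_classes(labels)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_majority = sum(1 for y in labels if y == majority)
    n_extra = n_majority - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)  # mix duplicated rows into the training order
    return [rows[i] for i in idx], [labels[i] for i in idx]


def undersample_majority(rows, labels, seed=0):
    """Keep every minority row and a random subset (without replacement)
    of majority rows of the same size."""
    rng = random.Random(seed)
    minority, majority = _split_classes(labels)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = rng.sample(majority_idx, len(minority_idx))
    idx = minority_idx + kept
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


if __name__ == "__main__":
    # Toy data: 100 members, 5 of them labelled high-cost (1).
    rows = [[i] for i in range(100)]
    labels = [1 if i < 5 else 0 for i in range(100)]
    _, y_over = oversample_minority(rows, labels)
    _, y_under = undersample_majority(rows, labels)
    print(sorted(Counter(y_over).items()))   # [(0, 95), (1, 95)]
    print(sorted(Counter(y_under).items()))  # [(0, 5), (1, 5)]
```

Either resampled set would then be passed to the classifier (XGBoost or logistic regression in the paper's comparison). Resampling is applied to the training data only, so the held-out data keeps the true, highly unbalanced class mix when predictive accuracy is assessed.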

Date: 2020

Full text: http://hdl.handle.net/10.1080/10920277.2020.1754242 (access restricted to subscribers)

Persistent link: https://EconPapers.repec.org/RePEc:taf:uaajxx:v:25:y:2020:i:1:p:53-61

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/uaaj20

DOI: 10.1080/10920277.2020.1754242

North American Actuarial Journal is currently edited by Kathryn Baker

Publisher: Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.