Modelling unbalanced catastrophic health expenditure data by using machine‐learning methods
Songul Cinaroglu
Intelligent Systems in Accounting, Finance and Management, 2020, vol. 27, issue 4, 168-181
Abstract:
This study aims to compare the performances of logistic regression and random forest classifiers in a balanced oversampling procedure for the prediction of households that will face catastrophic out‐of‐pocket (OOP) health expenditure. Data were derived from the nationally representative household budget survey collected by the Turkish Statistical Institute for the year 2012. A total of 9,987 households returned valid surveys. The data set was highly imbalanced, and the percentage of households facing catastrophic OOP health expenditure was 0.14. Balanced oversampling was performed, and 30 artificial data sets were generated with sizes of 5% and 98% of the original data size. The balanced oversampled data set provided accurate predictions, and random forest exhibited superior performance in identifying households facing catastrophic OOP health expenditure (area under the receiver operating characteristic curve, AUC = 0.8765; classification accuracy, CA = 0.7936; sensitivity = 0.7765; specificity = 0.8552; F1 = 0.7797).
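The balancing step and the reported evaluation measures can be illustrated with a minimal sketch. It assumes plain random oversampling (duplicating minority-class rows until both classes are the same size), which may differ from the exact procedure used in the study. The function names `oversample_balanced` and `binary_metrics` and the toy data are hypothetical, and no classifier is trained here.

```python
import random
from collections import Counter


def oversample_balanced(rows, labels, seed=0):
    """Randomly duplicate rows of the smaller classes until all classes match the largest one.

    Assumption: simple random oversampling with replacement; the study's exact
    balancing procedure is not specified in the abstract.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and F1 from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Toy data (made up for illustration): 2 "catastrophic" households out of 20.
toy_rows = [[i] for i in range(20)]
toy_labels = [1] * 2 + [0] * 18
bal_rows, bal_labels = oversample_balanced(toy_rows, toy_labels, seed=42)

# Metrics on a small hand-made set of predictions.
toy_metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In practice, oversampling is applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.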
Date: 2020
Download: https://doi.org/10.1002/isaf.1483
Persistent link: https://EconPapers.repec.org/RePEc:wly:isacfm:v:27:y:2020:i:4:p:168-181
Publisher: John Wiley & Sons, Ltd.