Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series

Liangyuan Hu and Lihua Li
Additional contact information
Liangyuan Hu: Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ 08854, USA
Lihua Li: Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

IJERPH, 2022, vol. 19, issue 23, 1-13

Abstract: Tree-based machine learning methods have gained traction in the statistical and data science fields, and they have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the methodological fundamentals of three key tree-based machine learning methods: random forests, extreme gradient boosting and Bayesian additive regression trees. We then conduct a series of case studies to illustrate how these methods can be properly used to solve important health research problems in four domains: variable selection, estimation of causal effects, propensity score weighting and missing data. We show that the central idea behind using ensemble tree methods for these research questions is accurate prediction via flexible modeling. First, we apply ensemble tree methods to select important predictors of the presence of postoperative respiratory complications among early-stage lung cancer patients with resectable tumors. Second, we demonstrate how to use these methods to estimate the causal effects of popular surgical approaches on postoperative respiratory complications among lung cancer patients. Third, using the same data, we implement the methods to accurately estimate the inverse probability weights for a propensity score analysis of the comparative effectiveness of the surgical approaches. Finally, we demonstrate how random forests can be used to impute missing data, using the Study of Women’s Health Across the Nation data set. We conclude that tree-based methods are a flexible tool and should be properly used for health investigations.

Keywords: causal inference; variable selection; missing data; sensitivity analysis; ensemble methods
JEL-codes: I I1 I3 Q Q5
Date: 2022

Downloads: (external link)
https://www.mdpi.com/1660-4601/19/23/16080/pdf (application/pdf)
https://www.mdpi.com/1660-4601/19/23/16080/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:19:y:2022:i:23:p:16080-:d:990266

IJERPH is currently edited by Ms. Jenna Liu

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jijerp:v:19:y:2022:i:23:p:16080-:d:990266