Open Application of Statistical and Machine Learning Models to Explore the Impact of Environmental Exposures on Health and Disease: An Asthma Use Case
Bo Lan,
Perry Haaland,
Ashok Krishnamurthy,
David B. Peden,
Patrick L. Schmitt,
Priya Sharma,
Meghamala Sinha,
Hao Xu and
Karamarie Fecho
Additional contact information
Bo Lan: UNC Highway Safety Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Perry Haaland: Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Ashok Krishnamurthy: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
David B. Peden: Division of Allergy, Immunology and Rheumatology, Center for Environmental Medicine, Asthma & Lung Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Patrick L. Schmitt: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
Priya Sharma: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
Meghamala Sinha: Oregon State University, Corvallis, OR 97331, USA
Hao Xu: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
Karamarie Fecho: Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
IJERPH, 2021, vol. 18, issue 21, 1-14
Abstract:
ICEES (Integrated Clinical and Environmental Exposures Service) provides a disease-agnostic, regulatory-compliant approach for openly exposing and analyzing clinical data that have been integrated at the patient level with environmental exposures data. ICEES is equipped with basic features to support exploratory analysis using statistical approaches, such as bivariate chi-square tests. We recently developed a method for using ICEES to generate multivariate tables for subsequent application of machine learning and statistical models. The objective of the present study was to use this approach to identify predictors of asthma exacerbations through the application of three multivariate methods: conditional random forest, conditional tree, and generalized linear model. Among seven potential predictor variables, we found five to be of significant importance using both conditional random forest and conditional tree: prednisone, race, airborne particulate exposure, obesity, and sex. The conditional tree method additionally identified several significant two-way and three-way interactions among the same variables. When we applied a generalized linear model, we identified four significant predictor variables, namely prednisone, race, airborne particulate exposure, and obesity. When ranked in order by effect size, the results were in agreement with the results from the conditional random forest and conditional tree methods as well as the published literature. Our results suggest that the open multivariate analytic capabilities provided by ICEES are valid in the context of an asthma use case and likely will have broad value in advancing open research in environmental and public health.
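The bivariate chi-square test mentioned above as a basic exploratory feature can be illustrated with a minimal sketch. This is not the ICEES implementation: the function name `chi_square_2x2` and the counts are hypothetical and chosen only for illustration, and the sketch assumes a 2x2 table with no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value) for 1 degree of freedom, without a
    continuity correction. Hypothetical helper, not part of ICEES.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts (NOT data from the study):
# rows = particulate exposure (high, low), columns = exacerbation (yes, no).
hypothetical_table = [[30, 70], [15, 85]]
statistic, p_value = chi_square_2x2(hypothetical_table)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")
```

A bivariate test of this kind examines one predictor at a time, which is why the study moves on to multivariate methods (conditional random forest, conditional tree, and generalized linear model) to consider the seven candidate predictors jointly.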
Keywords: open data; open science; machine learning; conditional random forest; conditional tree; biostatistics; generalized linear model; asthma; epidemiology; public health
JEL-codes: I I1 I3 Q Q5
Date: 2021
Downloads:
https://www.mdpi.com/1660-4601/18/21/11398/pdf (application/pdf)
https://www.mdpi.com/1660-4601/18/21/11398/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:18:y:2021:i:21:p:11398-:d:668141
IJERPH is currently edited by Ms. Jenna Liu