A machine learning-based approach for estimating and testing associations with multivariate outcomes
Benkeser David,
Mertens Andrew,
Colford John M.,
Hubbard Alan,
Arnold Benjamin F.,
Stein Aryeh and
J. van der Laan Mark
Additional contact information
Benkeser David: Emory University, School of Public Health, Atlanta, 30322, USA
Mertens Andrew: Department of Epidemiology, University of California, Berkeley, Berkeley, USA
Colford John M.: Department of Epidemiology, University of California, Berkeley, Berkeley, USA
Hubbard Alan: Department of Biostatistics, University of California, Berkeley, Berkeley, USA
Arnold Benjamin F.: Francis I. Proctor Foundation, University of California, San Francisco, USA
Stein Aryeh: Hubert Department of Global Health, Emory University Rollins School of Public Health, Atlanta, USA
J. van der Laan Mark: Department of Biostatistics, University of California, Berkeley, Berkeley, USA
The International Journal of Biostatistics, 2021, vol. 17, issue 1, 7-21
Abstract:
We propose a method for summarizing the strength of association between a set of variables and a multivariate outcome. Classical summary measures are appropriate when linear relationships exist between covariates and outcomes, while our approach provides an alternative that is useful in situations where complex relationships may be present. We utilize machine learning to detect nonlinear relationships and covariate interactions and propose a measure of association that captures these relationships. A hypothesis test about the proposed associative measure can be used to test the strong null hypothesis of no association between a set of variables and a multivariate outcome. Simulations demonstrate that this hypothesis test has greater power than existing methods against alternatives where covariates have nonlinear relationships with outcomes. We additionally propose measures of variable importance for groups of variables, which summarize each group’s association with the outcome. We demonstrate our methodology using data from a birth cohort study on childhood health and nutrition in the Philippines.
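The abstract does not define the proposed measure, so the following is only a minimal sketch of the general idea, under stated assumptions: association is summarized by a cross-validated R² for predicting a fixed weighted combination of the outcomes, and a polynomial least-squares fit stands in for a flexible machine learning algorithm. The function name `cv_r2_weighted`, the weights, and the simulated data are hypothetical and are not taken from the article. The stand-in learner captures per-covariate nonlinearity only, not covariate interactions.

```python
import numpy as np


def cv_r2_weighted(X, Y, weights, n_folds=5, degree=2, seed=0):
    """Cross-validated R^2 for predicting the combined outcome Y @ weights from X.

    Assumption: least squares on powers of each covariate (up to `degree`)
    stands in for a flexible learner; degree=1 is an ordinary linear fit.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    y = Y @ weights                          # single weighted outcome
    folds = rng.permutation(n) % n_folds     # random fold assignment
    preds = np.empty(n)

    def features(A):
        # Intercept plus element-wise powers of every covariate column.
        cols = [np.ones(A.shape[0])]
        for d in range(1, degree + 1):
            cols.extend((A ** d).T)
        return np.column_stack(cols)

    for k in range(n_folds):
        test = folds == k
        coef, *_ = np.linalg.lstsq(features(X[~test]), y[~test], rcond=None)
        preds[test] = features(X[test]) @ coef   # out-of-fold predictions

    mse = np.mean((y - preds) ** 2)
    return 1.0 - mse / np.var(y)


# Simulated data (hypothetical): the first outcome depends on the first
# covariate only through its square, so the linear association is near zero.
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 2))
Y = np.column_stack([X[:, 0] ** 2 + rng.normal(size=n), rng.normal(size=n)])
weights = np.array([1.0, 0.0])               # fixed weights, chosen for illustration

r2_linear = cv_r2_weighted(X, Y, weights, degree=1)
r2_flexible = cv_r2_weighted(X, Y, weights, degree=2)
print(f"linear fit CV R^2:   {r2_linear:.3f}")
print(f"flexible fit CV R^2: {r2_flexible:.3f}")
```

In this simulated setting a linear summary would miss the relationship that the more flexible fit picks up, which is the kind of alternative the abstract describes. Choosing the outcome weights from the data and testing the null hypothesis of no association are further steps that this sketch leaves out.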
Keywords: canonical correlation; epidemiology; machine learning; multivariate outcomes; variable importance
Date: 2021
Downloads: (external link)
https://doi.org/10.1515/ijb-2019-0061 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.
Persistent link: https://EconPapers.repec.org/RePEc:bpj:ijbist:v:17:y:2021:i:1:p:7-21:n:7
Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/ijb/html
DOI: 10.1515/ijb-2019-0061
The International Journal of Biostatistics is currently edited by Antoine Chambaz, Alan E. Hubbard and Mark J. van der Laan
More articles in The International Journal of Biostatistics from De Gruyter
Bibliographic data for series maintained by Peter Golla.