Tailoring Bayesian Additive Regression Trees (BART) for environmental mixture studies
Kaizong Ye, Zhen Chen and Shanshan Zhao
PLOS ONE, 2026, vol. 21, issue 5, 1-23
Abstract:
Background: Various methods have been developed to investigate the complex and collective effects of environmental mixtures on human health. In the statistical literature, tree ensemble methods such as Bayesian Additive Regression Trees (BART) are known for their stability and accuracy in variable selection and outcome prediction for high-dimensional correlated data, but their use has not been well studied for environmental mixtures.
Methods: We tailored the original BART model for environmental mixtures analysis to achieve both robust identification of toxic agents and accurate prediction of health outcomes. Our modified BART approach allowed for a smooth response surface and incorporated covariate adjustment for both continuous and binary outcomes. It supported both component-wise variable selection and hierarchical variable selection to accommodate scientifically meaningful groupings of chemicals. To facilitate interpretation, we used a Generalized Additive Model (GAM) approximation to quantify the marginal contributions of individual chemicals. The performance of the modified BART was evaluated through simulations and a case study using National Health and Nutrition Examination Survey (NHANES) 2001–2002 data to examine the effects of persistent organic pollutants (POPs) on leukocyte telomere length. All results were compared with Bayesian Kernel Machine Regression (BKMR), a widely used method in mixtures analysis.
Results: Our simulation studies demonstrated that the modified BART produced results comparable or superior to BKMR in recovering the true exposure-response surface for both continuous and binary outcomes, with R² consistently above 0.7. Specifically, when chemical groups were considered, modified BART with hierarchical variable selection achieved higher R² on independent test datasets (0.82–0.99 for continuous outcomes and 0.73–0.95 for binary outcomes) than BKMR (0.59–0.67 and 0.47–0.59, respectively). Modified BART also reduced computational time by 70% to 99.8% compared to BKMR. Both methods effectively identified relevant chemical groups under hierarchical variable selection, but modified BART more effectively distinguished important components within groups. In the NHANES case study, modified BART identified three chemicals (2,3,4,7,8-pncdf, PCB126 and PCB169) as having near-linear positive effects on leukocyte telomere length, based on GAM approximation plots.
Conclusions: Modified BART is a robust and scalable response surface model alternative to BKMR for analyzing environmental mixtures data. It is particularly advantageous for large datasets, binary outcomes, and grouped chemicals. GAM approximation provides practical insight into the individual chemical effects estimated from complex response surface models.
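The R² figures quoted in the abstract measure how closely a fitted exposure-response surface tracks the true one on held-out data. The abstract does not say which definition of R² the authors used, so the following is only a minimal sketch that assumes the conventional coefficient of determination, 1 − SS_res / SS_tot. The function name `r_squared` and the numbers are hypothetical and are not taken from the paper.

```python
def r_squared(truth, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: this is the conventional definition; the paper's
    exact R^2 definition is not given in the abstract.
    """
    if len(truth) != len(predicted) or len(truth) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_truth = sum(truth) / len(truth)
    # Residual sum of squares: squared error of the fitted surface
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, predicted))
    # Total sum of squares: variation of the true surface around its mean
    ss_tot = sum((t - mean_truth) ** 2 for t in truth)
    if ss_tot == 0:
        raise ValueError("truth is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical true and estimated surface values on a held-out test set
true_surface = [0.0, 1.0, 2.0, 3.0, 4.0]
model_fit = [0.5, 1.0, 2.0, 3.0, 3.5]
score = r_squared(true_surface, model_fit)
print(round(score, 2))
```

Under this definition a value near 1 means the estimated surface reproduces almost all of the variation in the true surface, while a value near 0 means it does no better than predicting the mean.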
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0348002 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 48002&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0348002
DOI: 10.1371/journal.pone.0348002