Variable selection-combined causal mediation analysis for continuous treatments with application to large-dimensional biomedical data
Yajing Zhou,
Kecheng Wei,
Yahang Liu,
Zhaoyang Li,
Chen Huang,
Guoyou Qin and
Yongfu Yu
PLOS Computational Biology, 2026, vol. 22, issue 6, 1-31
Abstract:
Substantial progress has been made in the area of causal inference utilizing large-scale data, among which the estimation of causal mediation effects has attracted a lot of attention. However, existing large-dimensional causal inference primarily focuses on total effects or typical causal mediation effects under binary variable settings, placing less emphasis on large-scale covariate selection with continuous treatment and mediator. To address this, we propose a weighted semiparametric estimation framework that integrates the generalized outcome-adaptive LASSO method into generalized propensity score modeling to achieve estimation of causal mediation effects under continuous variable settings. Simulation results show that our proposed method outperforms other regularization-based methods in selection accuracy and estimation efficiency, which is achieved by incorporating outcome-related key variables and excluding noise covariates. From the perspective of achieving a stable balance between efficiency and bias, as well as high-dimensional information filtering, our method may serve as a compelling alternative that balances estimation efficiency with model interpretability and inferential robustness. We further conduct a real-world application based on the UK Biobank database, quantifying the causal mediation effects of apolipoprotein B levels within the association between potential diabetes risk and cancer incidence using large-scale healthcare and medical data.
Author summary:
Disease development and progress are well recognized to be influenced by multiple factors, and exploring the causal mediation effects of the mediator in the exposure-outcome association can help reveal the etiological mechanisms. Due to the widespread application of large-scale biology and health data, it is challenging to precisely select all important variables based on prior knowledge to obtain accurate estimates.
In this study, we propose a generalized outcome-adaptive LASSO (GOAL)-combined weighted semiparametric approach to estimate the natural direct and indirect effects of continuous treatment and mediator in large-scale covariate settings. Our method extends previous work by allowing for accurate causal mediation estimates for continuous treatment and mediator with large-dimensional covariates, and also improves estimation efficiency by precisely incorporating outcome-related variables. We apply the proposed method to investigate the mediating role of apolipoprotein B in the association between potential diabetes risk and cancer incidence under extensive candidate covariates from biomedical and healthcare data.
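The idea of outcome-adaptive penalization mentioned above can be illustrated with a small simulation. The sketch below is not the authors' GOAL estimator: it assumes a plain linear model for a continuous treatment, uses a hand-written weighted LASSO, and every name in it (`weighted_lasso`, `lam`, `gamma`, the simulated covariates) is hypothetical. It only shows how penalty weights taken from an outcome regression tend to keep a confounder and drop a covariate that predicts the treatment alone.

```python
import numpy as np

def soft_threshold(z, t):
    # Scalar soft-thresholding operator used in the lasso coordinate update.
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_lasso(X, y, lam, w, n_iter=200):
    # Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w[j] * |b[j]|.
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()           # full residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]          # partial residual without coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * w[j]) / col_sq[j]
            r -= X[:, j] * b[j]          # restore the full residual
    return b

# Simulated data (an assumption for illustration only):
# X0 is a confounder, X1 affects only the treatment, X2 affects only the outcome,
# and the remaining columns are noise.
rng = np.random.default_rng(0)
n, p = 2000, 8
X = rng.standard_normal((n, p))
T = X[:, 0] + X[:, 1] + rng.standard_normal(n)
Y = T + X[:, 0] + X[:, 2] + rng.standard_normal(n)

# Step 1: outcome regression of Y on the treatment and all covariates.
design = np.column_stack([np.ones(n), T, X])
coef = np.linalg.lstsq(design, Y, rcond=None)[0]
beta_x = coef[2:]                        # covariate coefficients only

# Step 2: adaptive penalty weights; covariates weakly related to the outcome
# receive a large penalty in the treatment model.
gamma = 2.0
w = np.abs(beta_x) ** (-gamma)

# Step 3: weighted lasso for the (centered) continuous treatment.
Xc = X - X.mean(axis=0)
Tc = T - T.mean()
alpha = weighted_lasso(Xc, Tc, lam=0.05, w=w)
selected = np.flatnonzero(alpha != 0)
print("covariates kept in the treatment model:", selected)
```

In this toy setting the confounder X0 should be retained and the treatment-only covariate X1 excluded. The values of `lam` and `gamma` are arbitrary here and would need to be tuned in any real analysis.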
Date: 2026
Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1014436 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 14436&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1014436
DOI: 10.1371/journal.pcbi.1014436
More articles in PLOS Computational Biology from Public Library of Science