Regularized regression for two phase failure time studies
David Soave and Jerald F. Lawless
Computational Statistics & Data Analysis, 2023, vol. 182, issue C
Abstract:
Two-phase study designs are ideal for focused sub-studies based on large prospective cohorts when the outcome of interest is an event that is rare in the full cohort, and additional covariates are expensive or difficult to measure. Researchers often wish to examine large numbers of covariates for association with outcomes of interest. In the context of cancer, hundreds to millions of genetic markers may be considered, along with environmental exposures. A computationally efficient variable selection method is proposed for two-phase failure time studies with stratified sampling under the Cox proportional hazards model. The penalized estimator is obtained from a penalized (weighted) Cox log partial likelihood using a pathwise cyclical coordinate descent algorithm, which is scalable for high-dimensional datasets where the number of features is much larger than the sample size (p≫n). A detailed simulation study to examine the performance of the proposed methodology is described. The variable selection and estimation procedure is then used to obtain a model for predicting acute myeloid leukaemia using somatic stem cell mutation profiles derived from blood samples, based on a two-phase sample from the European Prospective Investigation into Cancer and Nutrition (EPIC) study.
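To make the penalized (weighted) Cox log partial likelihood in the abstract concrete, the following is a minimal sketch of such an objective, not the authors' implementation. It assumes a lasso (L1) penalty, weights equal to inverse phase-two sampling probabilities, and no special handling of tied event times; the abstract specifies none of these. The function names `penalized_weighted_cox_objective` and `soft_threshold` are hypothetical, and the sketch only evaluates the objective and the soft-thresholding step used in coordinate-wise lasso updates; it does not implement the full pathwise algorithm.

```python
import math


def penalized_weighted_cox_objective(beta, X, time, event, weight, lam):
    """Weighted Cox log partial likelihood minus an L1 penalty.

    Assumptions (not taken from the paper): lasso penalty, weights are
    inverse sampling probabilities, ties are not treated specially.
    """
    # Linear predictor x_i' beta for every sampled subject
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    n = len(time)
    loglik = 0.0
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        # Weighted risk set: subjects still under observation at time[i]
        risk = sum(weight[j] * math.exp(eta[j])
                   for j in range(n) if time[j] >= time[i])
        loglik += weight[i] * (eta[i] - math.log(risk))
    penalty = lam * sum(abs(b) for b in beta)
    return loglik - penalty


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return math.copysign(max(abs(z) - gamma, 0.0), z)
```

With unit weights and all coefficients at zero, the objective reduces to minus the sum of the log risk-set sizes, which gives a quick sanity check. A coordinate descent solver would repeatedly apply `soft_threshold` to one coefficient at a time while holding the others fixed, over a decreasing sequence of `lam` values.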
Keywords: Two-phase studies; Case-cohort; Penalized regression; Cox models
Date: 2023
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947323000142
Full text for ScienceDirect subscribers only.
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:182:y:2023:i:c:s0167947323000142
DOI: 10.1016/j.csda.2023.107703
Computational Statistics & Data Analysis is currently edited by S.P. Azen