Semi-supervised approach to event time annotation using longitudinal electronic health records
Liang Liang,
Jue Hou,
Hajime Uno,
Kelly Cho,
Yanyuan Ma and
Tianxi Cai
Additional contact information
Liang Liang: Harvard T. H. Chan School of Public Health
Jue Hou: Harvard T. H. Chan School of Public Health
Hajime Uno: Dana-Farber Cancer Institute
Kelly Cho: Massachusetts Veterans Epidemiology Research and Information Center, US Department of Veterans Affairs
Yanyuan Ma: Penn State University
Tianxi Cai: Harvard T. H. Chan School of Public Health
Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, 2022, vol. 28, issue 3, No 5, 428-491
Abstract:
Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models from real-world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes, such as time to cancer progression, is not readily available in these databases. The true clinical event times typically cannot be approximated well from simple extracts of billing or procedure codes, while annotating event times manually is prohibitively expensive in time and resources. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes in the labeled data, using features derived in step I and approximating the non-parametric baseline function with B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.
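As a reading aid, the step II model can be written in the textbook form of a proportional odds model. This is only a sketch under assumed notation, not the paper's exact specification: the symbols T (event time), Z (feature vector from step I), alpha (baseline function), beta (feature effect vector), and the B-spline basis B_1, ..., B_K with coefficients gamma_k are all introduced here for illustration, and the sign convention may differ from the one used in the article.

```latex
% Assumed notation (not taken from the abstract):
%   T        event time
%   Z        features derived in step I
%   \alpha   non-parametric baseline function
%   \beta    feature effect vector
%   B_k      B-spline basis functions, \gamma_k their coefficients
\[
  \operatorname{logit} P(T \le t \mid Z)
    = \log \frac{P(T \le t \mid Z)}{1 - P(T \le t \mid Z)}
    = \alpha(t) + \beta^{\top} Z,
  \qquad
  \alpha(t) \approx \sum_{k=1}^{K} \gamma_k B_k(t).
\]
```

Under this form, a one-unit increase in a feature multiplies the odds of the event having occurred by time t by the same factor at every t, which is the "proportional odds" property. The penalty placed on the coefficients is not specified in the abstract and is therefore left out of the sketch.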
Keywords: Censoring; Electronic health records; Functional principal component analysis; Point process; Proportional odds model; Semi-supervised learning
Date: 2022
Downloads:
http://link.springer.com/10.1007/s10985-022-09557-5 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:lifeda:v:28:y:2022:i:3:d:10.1007_s10985-022-09557-5
Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10985
DOI: 10.1007/s10985-022-09557-5
Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data is currently edited by Mei-Ling Ting Lee
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.