Optimal sampling for positive only electronic health record data

Seong‐H. Lee, Yanyuan Ma, Ying Wei and Jinbo Chen

Biometrics, 2023, vol. 79, issue 4, 2974-2986

Abstract: Identifying a patient's disease or health status from electronic medical records is a frequently encountered task in electronic health record (EHR) research, and estimating a classification model often requires benchmark training data with patients' known phenotype statuses. However, assessing a patient's phenotype is costly and labor intensive, so careful selection of EHR records for the training set is desirable. We propose a procedure to tailor the best training subsample with limited sample size for a classification model, minimizing its mean‐squared phenotyping/classification error (MSE). Our approach incorporates “positive only” information, an approximation of the true disease status without false alarms, when it is available. In addition, our sampling procedure is applicable to training a chosen classification model that may be misspecified. We provide theoretical justification of its optimality in terms of MSE. The performance gain from our method is illustrated through simulation and a real‐data example, and is often found satisfactory under criteria beyond MSE.

Date: 2023

Downloads: (external link)
https://doi.org/10.1111/biom.13824

Persistent link: https://EconPapers.repec.org/RePEc:bla:biomet:v:79:y:2023:i:4:p:2974-2986

Ordering information: This journal article can be ordered from
http://www.blackwell ... bs.asp?ref=0006-341X

More articles in Biometrics from The International Biometric Society
Bibliographic data for series maintained by Wiley Content Delivery.

 
Page updated 2025-03-19
Handle: RePEc:bla:biomet:v:79:y:2023:i:4:p:2974-2986