Evaluation of Imputation Methods in Ovarian Tumor Diagnostic Models Using Generalized Linear Models and Support Vector Machines
Ioannis Dimou, Ben Van Calster, Sabine Van Huffel, Dirk Timmerman and Michalis Zervakis
Additional contact information
Ioannis Dimou: Department of Electronics and Computer Engineering, Technical University of Crete, Chania, Greece, jdimou@gmail.com
Ben Van Calster: Department of Electrical Engineering (ESAT-SISTA), Katholieke Universiteit Leuven, Leuven, Belgium
Sabine Van Huffel: Department of Electrical Engineering (ESAT-SISTA), Katholieke Universiteit Leuven, Leuven, Belgium
Dirk Timmerman: Department of Obstetrics and Gynaecology, University Hospitals K.U. Leuven, Leuven, Belgium
Michalis Zervakis: Department of Electronics and Computer Engineering, Technical University of Crete, Chania, Greece
Medical Decision Making, 2010, vol. 30, issue 1, 123-131
Abstract:
Neglecting missing values in diagnostic models can result in unreliable and suboptimal performance on new data. In this study, the authors imputed missing values for the CA-125 tumor marker in a large data set of ovarian tumors that was used to develop models for predicting malignancy. Four imputation techniques were applied: regression imputation, expectation-maximization, data augmentation, and hot-deck imputation. Models using the imputed data sets were compared with two alternatives: models without CA-125, to address the clinically important question of whether CA-125 information is necessary for diagnostic models, and models using only complete cases, to investigate differences between imputation and complete case strategies for handling missing values. The models are based on Bayesian generalized linear models (GLMs) and Bayesian least squares support vector machines. Results indicate that including CA-125 produced small, clinically nonsignificant increases in the area under the receiver operating characteristic curve (AUC) of the diagnostic models. Differences in AUC between imputation methods were minor, as were differences between imputation and complete case analysis (CCA). However, GLM parameter estimates of predictor variables often differed between CCA and models based on imputation. The authors conclude that CA-125 is not indispensable in diagnostic models for ovarian tumors and that missing value imputation is preferred over CCA.
Keywords: ovarian tumors; CA-125; imputation; complete case analysis; least squares support vector machines; Bayesian generalized linear model; AUC
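
To make the contrast between complete case analysis and imputation concrete, the following is a minimal sketch in standard-library Python. It is not the authors' method: the data are synthetic, missingness is assumed to be completely at random, the `age` covariate and all numeric settings are hypothetical, and CA-125 is scored as a single marker rather than inside a Bayesian GLM or least squares support vector machine. Only two of the four techniques named in the abstract (hot-deck and regression imputation) are illustrated; expectation-maximization and data augmentation are omitted.

```python
import random
import statistics

def complete_cases(rows):
    """Complete case analysis: keep only rows where CA-125 was measured."""
    return [r for r in rows if r["ca125"] is not None]

def hot_deck_impute(rows, rng):
    """Hot-deck imputation: fill each missing CA-125 with a randomly drawn observed value."""
    donors = [r["ca125"] for r in rows if r["ca125"] is not None]
    return [dict(r, ca125=r["ca125"] if r["ca125"] is not None else rng.choice(donors))
            for r in rows]

def regression_impute(rows):
    """Regression imputation: predict missing CA-125 from age (hypothetical covariate)
    using a least-squares line fitted on the complete cases."""
    obs = complete_cases(rows)
    xs = [r["age"] for r in obs]
    ys = [r["ca125"] for r in obs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [dict(r, ca125=r["ca125"] if r["ca125"] is not None else intercept + slope * r["age"])
            for r in rows]

def auc(scores, labels):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic cohort (assumption): log-scale CA-125 is higher on average in malignant
# cases, and roughly 30% of CA-125 values are missing completely at random.
rng = random.Random(0)
rows = []
for _ in range(200):
    malignant = int(rng.random() < 0.3)
    age = rng.gauss(45 + 10 * malignant, 12)
    log_ca125 = rng.gauss(3.0 + 1.5 * malignant + 0.01 * age, 1.0)
    observed = rng.random() < 0.7
    rows.append({"malignant": malignant, "age": age,
                 "ca125": log_ca125 if observed else None})

strategies = [
    ("complete cases", complete_cases(rows)),
    ("hot deck", hot_deck_impute(rows, rng)),
    ("regression", regression_impute(rows)),
]
for name, data in strategies:
    value = auc([r["ca125"] for r in data], [r["malignant"] for r in data])
    print(f"{name}: n={len(data)}, AUC={value:.3f}")
```

The point of the sketch is structural: complete case analysis discards rows, whereas both imputation strategies retain the full sample, so any model fitted afterwards sees a different data set in each case. The printed AUC values depend entirely on the synthetic settings and say nothing about the study's results.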
Date: 2010
Downloads: (external link)
https://journals.sagepub.com/doi/10.1177/0272989X09340579 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:medema:v:30:y:2010:i:1:p:123-131
DOI: 10.1177/0272989X09340579