The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes
Azra Ramezankhani,
Omid Pournik,
Jamal Shahrabi,
Fereidoun Azizi,
Farzad Hadaegh and
Davood Khalili
Medical Decision Making, 2016, vol. 36, issue 1, 137-144
Abstract:
Objective. To evaluate the impact of the synthetic minority oversampling technique (SMOTE) on the performance of probabilistic neural network (PNN), naïve Bayes (NB), and decision tree (DT) classifiers for predicting diabetes in a prospective cohort of the Tehran Lipid and Glucose Study (TLGS).

Methods. Data from 6647 nondiabetic participants aged 20 years or older, with more than 10 years of follow-up, were used to develop prediction models based on 21 common risk factors. The minority class in the training dataset was oversampled using SMOTE at 100%, 200%, 300%, 400%, 500%, 600%, and 700% of its original size. The original and the oversampled training datasets were used to build the classification models. Accuracy, sensitivity, specificity, precision, F-measure, and Youden’s index were used to evaluate the performance of the classifiers on the test dataset. The 3 classification models were compared using the ROC convex hull (ROCCH).

Results. Oversampling the minority class at 700% (completely balanced) increased the sensitivity of the PNN, DT, and NB by 64%, 51%, and 5%, respectively, but decreased the accuracy and specificity of all 3 classification methods. NB had the best Youden’s index both before and after oversampling. The ROCCH showed that PNN is suboptimal under all class and cost conditions.

Conclusions. When building classifiers with machine learning algorithms such as PNN and DT, class skew in the data should be considered. NB and DT were the optimal classifiers for this prediction task in an imbalanced medical database.
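To make the two main ingredients of the Methods concrete, here is a minimal sketch, not the authors' implementation, of classic SMOTE oversampling and of the confusion-matrix metrics listed above (accuracy, sensitivity, specificity, precision, F-measure, Youden's index). It assumes NumPy, Euclidean distance, k = 5 nearest neighbours, and the usual SMOTE convention in which "700%" means 7 synthetic samples per original minority sample; the abstract does not state these settings. The function names `smote`, `binary_metrics`, and `_ratio` are hypothetical.

```python
import numpy as np


def smote(minority: np.ndarray, percent: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate synthetic minority samples in the style of classic SMOTE.

    minority: array of shape (n_samples, n_features), training data only.
    percent:  oversampling amount; assumed convention: 700 -> 7 new samples
              per original sample (a positive multiple of 100 is required).
    k:        number of nearest minority neighbours (assumed default of 5).
    """
    n = minority.shape[0]
    if percent <= 0 or percent % 100 != 0:
        raise ValueError("percent must be a positive multiple of 100")
    if n < 2:
        raise ValueError("need at least 2 minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)  # cannot use more neighbours than other samples exist

    # Pairwise Euclidean distances within the minority class only.
    diffs = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = []
    for i in range(n):
        for _ in range(percent // 100):
            j = neighbours[i, rng.integers(k)]  # random one of the k neighbours
            gap = rng.random()  # uniform in [0, 1)
            # New point on the segment between sample i and neighbour j.
            synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred) -> dict:
    """Confusion-matrix metrics for a binary task (1 = incident diabetes)."""
    t = np.asarray(y_true, dtype=bool)
    p = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    sensitivity = _ratio(tp, tp + fn)  # recall / true positive rate
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)    # positive predictive value
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "youden": sensitivity + specificity - 1,
    }


if __name__ == "__main__":
    # Toy data only; not the TLGS cohort.
    minority = np.random.default_rng(1).normal(size=(10, 3))
    synthetic = smote(minority, percent=700)
    print(minority.shape[0] + synthetic.shape[0])  # 80 = 10 originals + 70 synthetic
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

With 10 minority samples, `percent=700` adds 70 synthetic points, so the minority class grows eightfold. The toy call to `binary_metrics` gives a sensitivity of 0.5 and a specificity of 2/3, hence a Youden's index of about 0.17; because that index weighs both rates equally, it captures the sensitivity-versus-specificity trade-off reported in the Results. Oversampling is applied only to the training split, so test-set metrics still reflect the original class distribution.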
Keywords: classification; diabetes; data mining; SMOTE
Date: 2016
Downloads: https://journals.sagepub.com/doi/10.1177/0272989X14560647 (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:sae:medema:v:36:y:2016:i:1:p:137-144
DOI: 10.1177/0272989X14560647
Bibliographic data for series maintained by SAGE Publications.