EconPapers    
 

Supervised learning approaches and feature selection - a case study in diabetes

Yugowati Praharsi, Shaou-Gang Miaou and Hui-Ming Wee

International Journal of Data Analysis Techniques and Strategies, 2013, vol. 5, issue 3, 323-337

Abstract: Data description and classification are important tasks in supervised learning. In this study, three supervised learning methods, namely k-nearest neighbour (k-NN), support vector data description (SVDD) and support vector machine (SVM), are considered because they do not suffer from the problem of introducing a new class. The data set used is the Pima Indians Diabetes data set. The results show that feature selection based on the mean information gain and a standard deviation threshold can serve as a substitute for forward selection. This indicates that data variation, assessed through information gain, is an important factor to consider when selecting a feature subset. Finally, among the eight candidate features, glucose level is the most prominent feature for diabetes detection across all the classifiers and feature selection methods considered. Measuring relevancy by information gain ranks the features from the most important to the least significant, which can be very useful in medical applications such as prioritising features for symptom recognition.
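The abstract does not spell out how the information-gain threshold is defined, so the following is only a minimal sketch of the general idea under stated assumptions: each continuous feature is discretised into equal-width bins (4 bins, an assumed choice), its information gain with respect to the class label is computed, features are ranked by gain, and a feature is kept if its gain is at least the mean gain plus k times the standard deviation of the gains. The threshold rule, the parameter k (default 0), the helper names and the toy data are all hypothetical and are not taken from the paper or from the Pima Indians Diabetes data set.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(values, bins=4):
    """Equal-width binning of a continuous feature (an assumed discretisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # min(..., bins - 1) places the maximum value in the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(feature, labels, bins=4):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v), after binning X."""
    binned = discretize(feature, bins)
    n = len(labels)
    groups = {}
    for b, y in zip(binned, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, k=0.0, bins=4):
    """Rank features by IG; keep those with IG >= mean + k * stdev (hypothetical rule)."""
    gains = {name: information_gain(vals, labels, bins) for name, vals in columns.items()}
    values = list(gains.values())
    threshold = statistics.mean(values) + k * statistics.pstdev(values)
    ranked = sorted(gains, key=gains.get, reverse=True)
    selected = [name for name in ranked if gains[name] >= threshold]
    return ranked, selected, gains


# Invented toy data for illustration only (not the Pima Indians Diabetes data)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [85, 90, 95, 100, 150, 160, 170, 180],  # separates the classes
    "noise": [1, 5, 2, 6, 1, 5, 2, 6],  # same distribution in both classes
}

ranked, selected, gains = select_features(columns, labels)
print(ranked, selected)  # ['informative', 'noise'] ['informative']
```

Raising k makes the threshold stricter and keeps fewer features; unlike forward selection, this ranking needs no classifier to be retrained for each candidate subset, which is one plausible reason such a filter could stand in for it.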

Keywords: supervised learning; k-nearest neighbour; k-NN; support vector data description; SVDD; support vector machines; SVM; classification; feature selection; glucose level; diabetes detection; feature prioritisation; symptom recognition.
Date: 2013

Downloads:
http://www.inderscience.com/link.php?id=55346 (text/html)
Access to full text is restricted to subscribers.



Persistent link: https://EconPapers.repec.org/RePEc:ids:injdan:v:5:y:2013:i:3:p:323-337


More articles in International Journal of Data Analysis Techniques and Strategies from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.

 
Page updated 2025-03-19
Handle: RePEc:ids:injdan:v:5:y:2013:i:3:p:323-337