A hybrid under-sampling approach for mining unbalanced datasets: applications to banking and insurance

Madireddi Vasu and Vadlamani Ravi

International Journal of Data Mining, Modelling and Management, 2011, vol. 3, issue 1, 75-105

Abstract: In unbalanced classification problems, machine learning algorithms are overwhelmed by the majority class and consequently misclassify minority class observations. Here, we propose a hybrid under-sampling approach to improve the performance of classifiers. The proposed approach first employs the k-reverse nearest neighbour (kRNN) method to detect outliers in the majority class. After the outliers are removed, K clusters are selected using K-means clustering to further reduce the influence of the majority class. We then employ support vector machine (SVM), logistic regression (LR), multilayer perceptron (MLP), radial basis function network (RBF), group method of data handling (GMDH), genetic programming (GP) and decision tree (J48) for classification. The effectiveness of the proposed approach is demonstrated on datasets taken from insurance fraud detection and credit card churn prediction in the banking domain. Ten-fold cross-validation was used in the study. It is observed that the proposed approach improved the performance of the classifiers.
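The two under-sampling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the full text is not available here, so several details are assumptions. In particular, the outlier rule (drop a majority point when fewer than `min_count` other points list it among their k nearest neighbours), the use of cluster centroids as the reduced majority set, and all function and parameter names (`krnn_counts`, `remove_outliers`, `kmeans_centroids`, `hybrid_undersample`, `min_count`) are hypothetical choices made for illustration.

```python
import math
import random


def krnn_counts(points, k):
    """For each point, count how many other points have it among their k nearest neighbours."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        # Point i "votes" for each of its k nearest neighbours.
        for j in others[:k]:
            counts[j] += 1
    return counts


def remove_outliers(points, k, min_count=1):
    """Assumed rule: keep a point only if at least min_count points vote for it."""
    counts = krnn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


def kmeans_centroids(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns the final centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(
                range(n_clusters), key=lambda c: math.dist(p, centroids[c])
            )
            groups[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def hybrid_undersample(majority, k, n_clusters, min_count=1, seed=0):
    """Step 1: drop kRNN outliers. Step 2 (assumed): represent what is left by centroids."""
    kept = remove_outliers(majority, k, min_count)
    return kmeans_centroids(kept, n_clusters, seed=seed)
```

Under these assumptions, the reduced majority set returned by `hybrid_undersample` would be combined with the untouched minority class before training any of the classifiers listed in the abstract.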

Keywords: insurance fraud detection; credit card churn prediction; data mining; unbalanced datasets; machine learning; banking; classifiers; classifier performance; k-means clustering; support vector machines; SVM; logistic regression; multilayer perceptron; radial basis function networks; RBF neural networks; GMDH; genetic programming; decision trees.
Date: 2011

Downloads: (external link)
http://www.inderscience.com/link.php?id=38812 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:3:y:2011:i:1:p:75-105

More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.

 
Page updated 2025-03-19
Handle: RePEc:ids:ijdmmm:v:3:y:2011:i:1:p:75-105