Classification of Imbalanced Data with Random Sets and Mean-Variance Filtering

Vladimir Nikulin
Vladimir Nikulin: Suncorp, Australia

International Journal of Data Warehousing and Mining (IJDWM), 2008, vol. 4, issue 2, 63-78

Abstract: Imbalanced data pose a significant problem because a classifier trained on them tends to ignore patterns that are underrepresented in the training set. We propose to consider a large number of balanced training subsets in which representatives of the larger class are selected at random. As a result, the system produces a matrix of linear regression coefficients whose rows correspond to random subsets and whose columns correspond to features. Based on this matrix, we assess the stability of each feature's influence and propose to keep in the model only the features whose influence is stable. The final model is an average of the single models, which are not necessarily linear regressions. The model proved efficient and competitive in the PAKDD-2007 Data Mining Competition.
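The procedure described in the abstract can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the author's implementation: the function names, the use of ordinary least squares (with a tiny ridge term for numerical stability) on each balanced subset, the stability criterion |mean| > threshold * std of each feature's coefficients across subsets, and refitting linear models on the kept features are all assumptions. The paper notes that the averaged single models need not be linear regressions.

```python
import random
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows, ys, ridge=1e-6):
    """Least-squares fit with intercept via normal equations; returns [b0, b1, ..., bp].
    The tiny ridge term (an assumption) keeps X^T X invertible."""
    xs = [[1.0] + list(r) for r in rows]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def balanced_subsets(X, y, n_subsets, seed=0):
    """Yield balanced (rows, labels): all minority rows plus an equal-size
    random sample (without replacement) of majority rows."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        yield [X[i] for i in idx], [y[i] for i in idx]


def coefficient_matrix(X, y, n_subsets, seed=0):
    """Rows = random subsets, columns = feature coefficients (intercept dropped)."""
    return [fit_linear(rows, ys)[1:]
            for rows, ys in balanced_subsets(X, y, n_subsets, seed)]


def mean_variance_filter(B, threshold):
    """Keep feature j if |mean_j| > threshold * std_j across subsets (assumed criterion)."""
    keep = []
    for j in range(len(B[0])):
        col = [row[j] for row in B]
        mu, sd = statistics.fmean(col), statistics.stdev(col)
        if abs(mu) > threshold * sd:
            keep.append(j)
    return keep


def ensemble(X, y, keep, n_subsets, seed=0):
    """Refit one linear model per balanced subset on the kept features and
    return a scorer that averages their predictions."""
    models = [fit_linear([[r[j] for j in keep] for r in rows], ys)
              for rows, ys in balanced_subsets(X, y, n_subsets, seed)]

    def predict(x):
        z = [1.0] + [x[j] for j in keep]
        return statistics.fmean([sum(b * v for b, v in zip(m, z)) for m in models])

    return predict


if __name__ == "__main__":
    rng = random.Random(42)
    X, y = [], []
    for _ in range(1000):
        label = 1 if rng.random() < 0.1 else 0  # roughly 10% minority class
        # feature 0 depends on the label; feature 1 is pure noise
        X.append([label + rng.gauss(0, 0.5), rng.gauss(0, 1)])
        y.append(label)
    B = coefficient_matrix(X, y, n_subsets=50)
    keep = mean_variance_filter(B, threshold=3.0)
    score = ensemble(X, y, keep, n_subsets=50)
    print("kept features:", keep)
    print("score for x=[1, 0]:", round(score([1.0, 0.0]), 3))
```

Because each single model here is linear, averaging their predictions is equivalent to averaging their coefficients. Keeping the per-subset models separate simply leaves room to swap in non-linear base learners, which the abstract allows.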

Date: 2008

Downloads: (external link)
http://services.igi-global.com/resolvedoi/resolve. ... 4018/jdwm.2008040108 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:igg:jdwm00:v:4:y:2008:i:2:p:63-78

International Journal of Data Warehousing and Mining (IJDWM) is currently edited by Eric Pardede

More articles in International Journal of Data Warehousing and Mining (IJDWM) from IGI Global

Handle: RePEc:igg:jdwm00:v:4:y:2008:i:2:p:63-78