An efficient randomised sphere cover classifier
Reda Younsi and Anthony Bagnall
International Journal of Data Mining, Modelling and Management, 2012, vol. 4, issue 2, 156-171
Abstract:
This paper describes an efficient randomised sphere cover classifier (αRSC) that reduces the training data set size without loss of accuracy when compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire for a non-deterministic, fast, instance-based classifier that performs well in isolation but is also ideal for use with ensembles. We use 24 benchmark datasets from the UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrates the basic benefits of sphere covering. The second set of experiments demonstrates that when the α parameter is set through cross validation, the resulting αRSC algorithm outperforms several well-known classifiers when compared using the Friedman rank sum test. Thirdly, we test the usefulness of αRSC when combined with three feature filters on the six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decomposition.
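The abstract does not spell out how a sphere cover is built, so the following is only a minimal sketch of one way a randomised sphere cover with α pruning could work, not the authors' published αRSC. It assumes Euclidean distance; a sphere centred on a randomly chosen uncovered training case whose radius is the distance to the nearest case of a different class; α read as the minimum number of cases a sphere must cover to survive pruning; and classification by the closest containing sphere centre, falling back to the nearest sphere boundary. The names `SphereCover`, `euclidean`, `fit`, `predict_one` and `predict` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class SphereCover:
    """Hypothetical randomised sphere cover classifier with alpha pruning.

    Illustrative sketch only; not the authors' published alphaRSC.
    """

    def __init__(self, alpha=1, seed=None):
        self.alpha = alpha              # assumed: minimum cases a sphere must cover to be kept
        self.rng = random.Random(seed)  # source of the randomised centre order
        self.spheres = []               # retained spheres as (centre, radius, label)

    def fit(self, X, y):
        self.spheres = []
        uncovered = list(range(len(X)))
        while uncovered:
            # Pick a random, not-yet-covered training case as the centre.
            c = self.rng.choice(uncovered)
            # Radius: distance to the nearest case of a different class.
            enemy = [euclidean(X[c], X[j]) for j in range(len(X)) if y[j] != y[c]]
            radius = min(enemy) if enemy else math.inf
            # Same-class uncovered cases strictly inside the sphere are covered;
            # the centre is always covered, so the loop always makes progress.
            covered = {j for j in uncovered
                       if y[j] == y[c] and euclidean(X[c], X[j]) < radius}
            covered.add(c)
            uncovered = [j for j in uncovered if j not in covered]
            # Pruning (assumed meaning of alpha): drop spheres covering < alpha cases.
            if len(covered) >= self.alpha:
                self.spheres.append((X[c], radius, y[c]))
        if not self.spheres:
            raise ValueError("alpha too large: every sphere was pruned")
        return self

    def predict_one(self, x):
        dists = [(euclidean(x, centre), radius, label)
                 for centre, radius, label in self.spheres]
        inside = [(d, label) for d, radius, label in dists if d < radius]
        if inside:
            # Inside one or more spheres: use the sphere with the closest centre.
            return min(inside, key=lambda t: t[0])[1]
        # Inside no sphere: use the sphere whose boundary is closest.
        return min(dists, key=lambda t: t[0] - t[1])[2]

    def predict(self, X):
        return [self.predict_one(x) for x in X]
```

For two well-separated clusters, `SphereCover(alpha=1, seed=0).fit(X, y)` keeps one sphere per cluster, and raising `alpha` discards spheres around isolated cases, so the retained spheres are far fewer than the training cases a nearest neighbour classifier would store. Because centres are drawn in random order, different seeds can yield different covers, which is the kind of variation an ensemble could exploit.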
Keywords: sphere covers; randomised classifiers; randomisation; bias decomposition; variance decomposition; gene expression datasets; training data; set sizes; accuracy; nearest neighbour classifiers; algorithms; non-deterministic classifiers; fast classifiers; instance-based classifiers; isolation; ensembles; UCI Machine Learning Repository; University of California; UC Irvine; universities; higher education; USA; United States; benchmark datasets; sphere covering; cross validation; rank sum tests; non-parametric tests; statistical tests; Milton Friedman; feature filtering; filters; pruning; data mining; data modelling; data management; intelligent data analysis.
Date: 2012
Downloads: (external link)
http://www.inderscience.com/link.php?id=46808 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijdmmm:v:4:y:2012:i:2:p:156-171
More articles in International Journal of Data Mining, Modelling and Management from Inderscience Enterprises Ltd
Bibliographic data for series maintained by Sarah Parker.