Winsorize tree algorithm for handling outlier in classification problem
Chee Keong Ch'ng and Nor Idayu Mahat
International Journal of Operational Research, 2020, vol. 38, issue 2, 278-293
Abstract:
Classification and regression trees (CART) are widely used to support classification and prediction. However, outliers in a dataset are inevitable and can affect both the size and the accuracy of a tree. Neglecting outliers can distort the splitting points, which yields a biased and inaccurate tree. In this paper, we propose a winsorize tree algorithm that detects and handles outliers before the Gini index is calculated at each non-terminal node. As a result, the constructed tree can grow without needing to be pruned. For evaluation, the proposed approach was compared with a classical tree and a pruned tree. Results from seven real datasets indicate that the proposed winsorize tree performs as well as, or even better than, the other trees investigated.
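The idea in the abstract, handling outliers in a node before its Gini index is computed, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the full text is restricted, so the quantile-based clamping rule, the default cut-offs of 0.05 and 0.95, and the function names `winsorize`, `gini`, `best_split` and `winsorized_split` are all assumptions made for illustration.

```python
from collections import Counter


def winsorize(values, lower_q=0.05, upper_q=0.95):
    """Clamp values to the empirical [lower_q, upper_q] quantile range.

    Assumption: a simple order-statistic quantile rule; the paper may
    detect and replace outliers differently.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower_q * (n - 1))]
    hi = ordered[int(upper_q * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split x <= threshold.

    Returns (None, parent_gini) when no split improves on the parent node.
    """
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def winsorized_split(values, labels, lower_q=0.05, upper_q=0.95):
    """Winsorize one feature within a node, then search for its best split."""
    return best_split(winsorize(values, lower_q, upper_q), labels)
```

In this sketch, clamping collapses extreme values onto the cut-off values, so no candidate threshold can fall between an extreme observation and the rest of the data. A full tree would apply `winsorized_split` to every feature in every non-terminal node and recurse on the two child nodes.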
Keywords: winsorize tree algorithm; gini index; error rate; classification; outlier; classification and regression tree; winsorized tree
Date: 2020
Downloads: (external link)
http://www.inderscience.com/link.php?id=107073 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:ids:ijores:v:38:y:2020:i:2:p:278-293
More articles in International Journal of Operational Research from Inderscience Enterprises Ltd