EconPapers    
Economics at your fingertips  
 

Enhancing techniques for learning decision trees from imbalanced data

Ikram Chaabane (), Radhouane Guermazi () and Mohamed Hammami ()
Additional contact information
Ikram Chaabane: MIRACL-FSEG, Sfax University
Radhouane Guermazi: Saudi Electronic University
Mohamed Hammami: MIRACL-FSS, Sfax University

Advances in Data Analysis and Classification, 2020, vol. 14, issue 3, No 9, 677-745

Abstract: Abstract Several machine learning techniques assume that the number of objects in considered classes is approximately similar. Nevertheless, in real-world applications, the class of interest to be studied is generally scarce. The data imbalance status may allow high global accuracy through most standard learning algorithms, but it poses a real challenge when considering the minority class accuracy. To deal with this issue, we introduce in this paper a novel adaptation of the decision tree algorithm to imbalanced data situations. A new asymmetric entropy measure is proposed. It adjusts the most uncertain class distribution to the a priori class distribution and involves it in the node splitting-process. Unlike most competitive split criteria, which include only the maximum uncertainty vector in their formula, the proposed entropy is customizable with an adjustable concavity to better comply with the system expectations. The experimental results across thirty-five differently class-imbalanced data-sets show significant improvements over various split criteria adapted for imbalanced situations. Furthermore, being combined with sampling strategies and based-ensemble methods, our entropy proves significant enhancements on the minority class prediction, along with a good handling of the data difficulties related to the class imbalance problem.

Keywords: Asymmetric decision trees; Imbalanced data; Entropy measures; Classification problem; Index of balanced accuracy (search for similar items in EconPapers)
Date: 2020
References: View complete reference list from CitEc
Citations: View citations in EconPapers (1)

Downloads: (external link)
http://link.springer.com/10.1007/s11634-019-00354-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:advdac:v:14:y:2020:i:3:d:10.1007_s11634-019-00354-x

Ordering information: This journal article can be ordered from
http://www.springer. ... ds/journal/11634/PS2

DOI: 10.1007/s11634-019-00354-x

Access Statistics for this article

Advances in Data Analysis and Classification is currently edited by H.-H. Bock, W. Gaul, A. Okada, M. Vichi and C. Weihs

More articles in Advances in Data Analysis and Classification from Springer, German Classification Society - Gesellschaft für Klassifikation (GfKl), Japanese Classification Society (JCS), Classification and Data Analysis Group of the Italian Statistical Society (CLADAG), International Federation of Classification Societies (IFCS)
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-05-18
Handle: RePEc:spr:advdac:v:14:y:2020:i:3:d:10.1007_s11634-019-00354-x