On impurity functions in decision trees

Guoping Zeng

Communications in Statistics - Theory and Methods, 2025, vol. 54, issue 3, 701-719

Abstract: Impurity functions are crucial in decision trees. These functions help determine the impurity level of a node in a decision tree, guiding the splitting criteria. However, two primary ambiguities have surrounded impurity functions: (1) the question of their nonnegativity and (2) the debate over their concavity. In this paper, we address these uncertainties by delving into the characteristics of impurity functions. We establish that the nonnegativity of an impurity function is inconsequential. Through counterexamples, we disprove the equivalence between an impurity function and a concave function: we identify an impurity function that is not concave and a concave function that is not an impurity function. Interestingly, we find an impurity function that results in a negative impurity reduction. Furthermore, we validate several significant properties of impurity functions. For example, we demonstrate that when an impurity function is concave, the impurity reduction remains nonnegative for multiway divisions. We also discuss sufficient conditions for a concave function to be an impurity function. Our numerical results further indicate that a positive linear combination of the two most popular impurity functions, namely the Gini index and entropy, may surpass the individual performance of each when applied to the well-known German credit dataset.
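
As a rough illustration of the quantities the abstract refers to, the sketch below computes the Gini index, entropy, a positive linear combination of the two, and the impurity reduction of a three-way split. It uses the standard textbook definitions, not code from the paper. The function names, the weights a = b = 0.5, the natural logarithm, and the class counts are all assumptions made for this example; the paper's own weights and its German credit experiments are not reproduced here.

```python
import math

def gini(p):
    """Gini index of a class-probability vector p."""
    return 1.0 - sum(q * q for q in p)

def entropy(p):
    """Shannon entropy of p (natural log assumed), with 0*log(0) taken as 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

def combined(p, a=0.5, b=0.5):
    """Positive linear combination a*Gini + b*Entropy.
    The weights a and b are hypothetical, chosen only for illustration."""
    return a * gini(p) + b * entropy(p)

def proportions(counts):
    """Turn class counts into class proportions."""
    total = sum(counts)
    return [c / total for c in counts]

def impurity_reduction(parent_counts, child_counts, impurity):
    """Parent impurity minus the size-weighted average of child impurities."""
    n = sum(parent_counts)
    weighted = sum(
        sum(c) / n * impurity(proportions(c)) for c in child_counts
    )
    return impurity(proportions(parent_counts)) - weighted

# Made-up two-class node split three ways (a multiway division).
parent = [50, 50]
children = [[30, 10], [15, 15], [5, 25]]

for name, f in [("gini", gini), ("entropy", entropy), ("combined", combined)]:
    print(name, impurity_reduction(parent, children, f))
```

All three functions here are concave, so each printed reduction is nonnegative, which is consistent with the multiway property stated in the abstract. Because the reduction is linear in the impurity function, the combined reduction is the same weighted sum of the Gini and entropy reductions.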

Date: 2025

Downloads: (external link)
http://hdl.handle.net/10.1080/03610926.2024.2317359 (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:54:y:2025:i:3:p:701-719

Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/lsta20

DOI: 10.1080/03610926.2024.2317359

Communications in Statistics - Theory and Methods is currently edited by Debbie Iscoe

More articles in Communications in Statistics - Theory and Methods from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.

 
Page updated 2025-03-20
Handle: RePEc:taf:lstaxx:v:54:y:2025:i:3:p:701-719