EconPapers    
Economics at your fingertips  
 

Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families

Kweku-Muata Osei-Bryson () and Kendall Giles ()
Additional contact information
Kweku-Muata Osei-Bryson: Virginia Commonwealth University
Kendall Giles: Virginia Commonwealth University

Information Systems Frontiers, 2006, vol. 8, issue 3, No 4, 195-209

Abstract: Abstract Decision tree (DT) induction is among the more popular of the data mining techniques. An important component of DT induction algorithms is the splitting method, with the most commonly used method being based on the Conditional Entropy (CE) family. However, it is well known that there is no single splitting method that will give the best performance for all problem instances. In this paper we explore the relative performance of the Conditional Entropy family and another family that is based on the Class-Attribute Mutual Information (CAMI) measure. Our results suggest that while some datasets are insensitive to the choice of splitting methods, other datasets are very sensitive to the choice of splitting methods. For example, some of the CAMI family methods may be more appropriate than the popular Gain Ratio (GR) method for datasets which have nominal predictor attributes, and are competitive with the GR method for those datasets where all predictor attributes are numeric. Given that it is never known beforehand which splitting method will lead to the best DT for a given dataset, and given the relatively good performance of the CAMI methods, it seems appropriate to suggest that splitting methods from the CAMI family should be included in data mining toolsets.

Keywords: Decision trees; Entropy; Splitting methods; Classification; Machine learning (search for similar items in EconPapers)
Date: 2006
References: View complete reference list from CitEc
Citations: View citations in EconPapers (5)

Downloads: (external link)
http://link.springer.com/10.1007/s10796-006-8779-8 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:spr:infosf:v:8:y:2006:i:3:d:10.1007_s10796-006-8779-8

Ordering information: This journal article can be ordered from
http://www.springer.com/journal/10796

DOI: 10.1007/s10796-006-8779-8

Access Statistics for this article

Information Systems Frontiers is currently edited by Ram Ramesh and Raghav Rao

More articles in Information Systems Frontiers from Springer
Bibliographic data for series maintained by Sonal Shukla () and Springer Nature Abstracting and Indexing ().

 
Page updated 2025-03-20
Handle: RePEc:spr:infosf:v:8:y:2006:i:3:d:10.1007_s10796-006-8779-8