 

Analytical Split Value Calculation for Numerical Attributes in Hoeffding Trees with Misclassification-Based Impurity

Mehran Mirkhan, Maryam Amir Haeri and Mohammad Reza Meybodi
Mehran Mirkhan: Amirkabir University of Technology
Maryam Amir Haeri: Amirkabir University of Technology
Mohammad Reza Meybodi: Amirkabir University of Technology

Annals of Data Science, 2021, vol. 8, issue 3, No 10, 645-665

Abstract: The Hoeffding tree is a method for building decision trees incrementally. A common approach to handling numerical attributes in Hoeffding trees is to represent their sufficient statistics as Gaussian distributions. Our contribution in this paper is a proof that, when Gaussian distributions are used as sufficient statistics and misclassification error is used as the impurity measure, the best split values can be calculated exactly by an analytical method. Three different approaches for using this theorem are proposed, and all three are tested on both synthetic and real datasets. The experiments suggest that this approach can create smaller trees, learn faster, and achieve higher accuracy in most problems.
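The abstract does not spell out the calculation, so the following is only a minimal sketch of why an exact split value can exist in the simplest case. It assumes two classes A and B, each summarized by a count n, mean μ and standard deviation σ (kept incrementally with Welford's algorithm), and a single binary test `value <= x` whose children each predict their majority class. Under these assumptions the expected misclassification count is min(n_A Φ_A(x), n_B Φ_B(x)) + min(n_A(1 - Φ_A(x)), n_B(1 - Φ_B(x))), and any threshold that beats the unsplit leaf lies where n_A f_A(x) = n_B f_B(x); taking logarithms turns that condition into a quadratic in x, so the candidate thresholds are its roots. The names `GaussianStats`, `candidate_splits`, `split_error` and `best_split` are hypothetical; this is not the authors' code or any of their three proposed approaches, which need not be limited to two classes.

```python
import math

# Hypothetical sketch: two classes, Gaussian sufficient statistics,
# misclassification error with majority-class labels in each child.


class GaussianStats:
    """Running count, mean and sample std of one class's values (Welford)."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def summary(self):
        """Return (n, mean, std); assumes n >= 2 and non-constant values."""
        return self.n, self.mean, math.sqrt(self._m2 / (self.n - 1))


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def split_error(x, a, b):
    """Expected misclassifications of the test 'value <= x' when each child
    predicts its majority class; a and b are (n, mean, std) tuples."""
    left_a = a[0] * normal_cdf(x, a[1], a[2])
    left_b = b[0] * normal_cdf(x, b[1], b[2])
    return min(left_a, left_b) + min(a[0] - left_a, b[0] - left_b)


def candidate_splits(a, b):
    """Roots of n_a*pdf_a(x) = n_b*pdf_b(x), i.e. q2*x^2 + q1*x + q0 = 0."""
    (n_a, m_a, s_a), (n_b, m_b, s_b) = a, b
    q2 = 1.0 / (2 * s_b**2) - 1.0 / (2 * s_a**2)
    q1 = m_a / s_a**2 - m_b / s_b**2
    q0 = (m_b**2 / (2 * s_b**2) - m_a**2 / (2 * s_a**2)
          + math.log((n_a * s_b) / (n_b * s_a)))
    if abs(q2) < 1e-12:  # equal variances: the equation is linear in x
        return [] if q1 == 0.0 else [-q0 / q1]
    disc = q1**2 - 4 * q2 * q0
    if disc < 0:
        return []  # the weighted densities never cross
    r = math.sqrt(disc)
    t = -0.5 * (q1 + math.copysign(r, q1))  # stable form, avoids cancellation
    return [t / q2] if t == 0.0 else [t / q2, q0 / t]


def best_split(a, b):
    """(threshold, expected errors) of the best split, or None if no
    threshold beats predicting the majority class at the leaf."""
    best = None
    for x in candidate_splits(a, b):
        err = split_error(x, a, b)
        if err < min(a[0], b[0]) and (best is None or err < best[1]):
            best = (x, err)
    return best
```

For example, feeding the values -1, 0, 1 to class A and 3, 4, 5 to class B gives equal variances, the quadratic degenerates to a linear equation, and `best_split` returns the midpoint 2.0. A real Hoeffding tree would still compare the resulting impurity reduction against the Hoeffding bound before committing to the split; that step is omitted here.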

Keywords: Hoeffding tree; Gaussian distribution; Misclassification error; Massive and streaming data
Date: 2021

Downloads:
http://link.springer.com/10.1007/s40745-019-00225-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.



Persistent link: https://EconPapers.repec.org/RePEc:spr:aodasc:v:8:y:2021:i:3:d:10.1007_s40745-019-00225-4

Ordering information: This journal article can be ordered from
https://www.springer ... gement/journal/40745

DOI: 10.1007/s40745-019-00225-4


Annals of Data Science is currently edited by Yong Shi

More articles in Annals of Data Science from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Handle: RePEc:spr:aodasc:v:8:y:2021:i:3:d:10.1007_s40745-019-00225-4