Classification trees with soft splits optimized for ranking
Jakub Dvořák
Additional contact information
Jakub Dvořák: Academy of Sciences of the Czech Republic
Computational Statistics, 2019, vol. 34, issue 2, No 16, 763-786
Abstract:
We consider softening the splits in classification trees generated from multivariate numerical data. This methodology improves the quality of the ranking of test cases as measured by the AUC. Several ways to determine the softening parameters are introduced and compared, including the softening algorithm present in the standard methods C4.5 and C5.0. In the first part of the paper, a few softening settings determined only from the ranges of the training data in the tree branches are explored. The trees softened with these settings are used to study the effect of using the Laplace correction together with soft splits. In a later part, we introduce methods that maximize the classifier’s performance on the training set over the domain of the softening parameters. The non-linear optimization algorithm Nelder–Mead is used, and various target functions are considered. The target function evaluating the AUC on the training set is compared with functions that sum some transformation of the score error over the training cases. Several data sets from the UCI repository are used in the experiments.
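To make the ideas in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single-split tree (a stump), a piecewise-linear softening of the split with hypothetical parameters `lower` and `upper`, Laplace-corrected leaf estimates for a two-class problem, and the AUC computed by pairwise comparison. All function names and the toy data are illustrative assumptions; the abstract does not specify the exact form of the softening.

```python
def right_weight(x, threshold, lower, upper):
    """Fraction of a case sent to the right branch (assumed piecewise-linear form).

    With lower == upper == 0 this reduces to an ordinary hard split.
    """
    if x <= threshold - lower:
        return 0.0
    if x >= threshold + upper:
        return 1.0
    if x < threshold:
        return 0.5 * (x - (threshold - lower)) / lower
    return 0.5 + 0.5 * (x - threshold) / upper


def laplace(positives, total):
    """Laplace-corrected estimate of the positive-class probability (two classes)."""
    return (positives + 1) / (total + 2)


def fit_stump(xs, ys, threshold):
    """Leaf estimates of a one-split tree, computed from the hard split."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return laplace(sum(left), len(left)), laplace(sum(right), len(right))


def soft_scores(xs, threshold, lower, upper, p_left, p_right):
    """Score each case as a weighted mix of the two leaf estimates."""
    scores = []
    for x in xs:
        w = right_weight(x, threshold, lower, upper)
        scores.append((1.0 - w) * p_left + w * p_right)
    return scores


def auc(scores, labels):
    """AUC as the fraction of correctly ordered (positive, negative) pairs; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data: one numerical attribute and binary labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
threshold = 3.5

p_left, p_right = fit_stump(xs, ys, threshold)
hard = soft_scores(xs, threshold, 0.0, 0.0, p_left, p_right)
soft = soft_scores(xs, threshold, 2.5, 2.5, p_left, p_right)
print("AUC with hard split:", auc(hard, ys))
print("AUC with soft split:", auc(soft, ys))
```

On this toy data the hard split yields only two distinct scores, so many positive-negative pairs are tied, while the softened split orders cases within each branch and gives a higher training AUC. In the same spirit, `lower` and `upper` could be treated as free parameters and tuned by a derivative-free optimizer such as Nelder–Mead against a training-set target function, which is the kind of procedure the abstract describes.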
Keywords: Supervised learning; Decision trees; Scoring classifier
Date: 2019
Downloads:
http://link.springer.com/10.1007/s00180-019-00867-1 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:compst:v:34:y:2019:i:2:d:10.1007_s00180-019-00867-1
Ordering information: This journal article can be ordered from
http://www.springer.com/statistics/journal/180/PS2
DOI: 10.1007/s00180-019-00867-1
Computational Statistics is currently edited by Wataru Sakamoto, Ricardo Cao and Jürgen Symanzik
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.