Noise peeling methods to improve boosting algorithms
Waldyn Martinez and
J. Brian Gray
Computational Statistics & Data Analysis, 2016, vol. 93, issue C, 483-497
Abstract:
Boosting refers to a family of methods that combine sequences of individual classifiers into highly accurate ensemble models through weighted voting. AdaBoost, short for “Adaptive Boosting”, is the most well-known boosting algorithm. AdaBoost has many strengths. Among them, there is sufficient empirical evidence pointing to its performance being generally superior to that of individual classifiers. In addition, even when combining a large number of weak learners, AdaBoost can be very robust to overfitting usually with lower generalization error than other competing ensemble methodologies, such as bagging and random forests. However, AdaBoost, as most hard margin classifiers, tends to be sensitive to outliers and noisy data, since it assigns observations that have been misclassified a higher weight in subsequent iterations. It has recently been proven that for any booster with a potential convex loss function, and any nonzero random classification noise rate, there is a data set, which can be efficiently learnable by the booster if there is no noise, but cannot be learned with accuracy better than 1/2 with random classification noise present. Several techniques to identify and potentially delete (peel) noisy samples in binary classification are proposed in order to improve the performance of AdaBoost. It is found that peeling methods generally perform better than AdaBoost and other noise resistant boosters, especially when high levels of noise are present in the data.
Keywords: AdaBoost; Ensembles; Generalization error; Noise detection; Noise filtering; Outliers (search for similar items in EconPapers)
Date: 2016
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://www.sciencedirect.com/science/article/pii/S0167947315001486
Full text for ScienceDirect subscribers only.
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:93:y:2016:i:c:p:483-497
DOI: 10.1016/j.csda.2015.06.010
Access Statistics for this article
Computational Statistics & Data Analysis is currently edited by S.P. Azen
More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu ().