Gradient boosting for high-dimensional prediction of rare events

Rok Blagus and Lara Lusa

Computational Statistics & Data Analysis, 2017, vol. 113, issue C, 19-37

Abstract: In clinical research the goal is often to correctly estimate the probability of an event. For this purpose, several characteristics of the patients are measured and used to develop a prediction model, which can then be used to predict the class membership of future patients. Ensemble classifiers combine many different classifiers and can be useful because combining a set of classifiers can yield more accurate predictions. Gradient boosting is an ensemble classifier that has been shown to perform well when the number of variables exceeds the number of samples (high-dimensional data); however, it has not been evaluated for the prediction of rare events. It is demonstrated that gradient boosting suffers from severe rare events bias, correctly classifying only a small proportion of samples from the rare class. The bias can be removed by using subsampling in combination with an appropriate amount of shrinkage, but only for a specific number of boosting iterations and for the binomial loss function. It is shown that the number of boosting iterations at which the rare events bias is removed cannot be estimated efficiently from the training data when the sample size is small. Therefore, several corrections for the rare events bias of gradient boosting are proposed and evaluated using simulated and real high-dimensional data. It is demonstrated that the proposed corrections successfully remove the rare events bias and outperform the other ensemble classifiers that were considered. The large flexibility and high interpretability of the proposed methods are also illustrated.
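
The ingredients named in the abstract (binomial loss, shrinkage, subsampling, and the number of boosting iterations) can be illustrated with a minimal sketch of stochastic gradient boosting. This is not the authors' implementation and does not include their proposed corrections. The decision-stump base learner, the plain mean-residual leaf values, the simulated data, and all names (`boost`, `fit_stump`, `make_data`, and the parameter values) are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, target, rows):
    """Least-squares regression stump fitted to `target` on the sampled rows."""
    mean = sum(target[i] for i in rows) / len(rows)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: constant fit
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        for thr in values[:-1]:  # split "x <= thr" keeps both sides non-empty
            left = [target[i] for i in rows if X[i][j] <= thr]
            right = [target[i] for i in rows if X[i][j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if sse < best[0]:
                best = (sse, j, thr, ml, mr)
    return best[1:]


def stump_predict(stump, x):
    j, thr, ml, mr = stump
    return ml if x[j] <= thr else mr


def boost(X, y, n_iter=30, shrinkage=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for binomial loss (assumes both classes occur in y)."""
    rng = random.Random(seed)
    n = len(y)
    prior = sum(y) / n
    f0 = math.log(prior / (1.0 - prior))  # initial score: log-odds of the event
    F = [f0] * n
    stumps = []
    k = max(2, int(subsample * n))
    for _ in range(n_iter):
        # Negative gradient of the binomial deviance: y - p.
        residuals = [y[i] - sigmoid(F[i]) for i in range(n)]
        rows = rng.sample(range(n), k)  # subsampling without replacement
        stump = fit_stump(X, residuals, rows)
        stumps.append(stump)
        for i in range(n):
            F[i] += shrinkage * stump_predict(stump, X[i])  # shrunken update
    return f0, shrinkage, stumps


def predict_proba(model, x):
    f0, shrinkage, stumps = model
    return sigmoid(f0 + shrinkage * sum(stump_predict(s, x) for s in stumps))


def make_data(n, p, n_rare, seed):
    """Simulated data: the rare class (y = 1) is shifted in the first 5 variables."""
    rng = random.Random(seed)
    y = [1] * n_rare + [0] * (n - n_rare)
    X = [[rng.gauss(0.0, 1.0) + (1.0 if t == 1 and j < 5 else 0.0) for j in range(p)]
         for t in y]
    return X, y


# High-dimensional toy setting: 60 variables, 40 samples, 4 rare events.
X, y = make_data(n=40, p=60, n_rare=4, seed=1)
for m in (5, 30):
    model = boost(X, y, n_iter=m)
    pred = [predict_proba(model, x) > 0.5 for x in X]
    sens = sum(p for p, t in zip(pred, y) if t == 1) / sum(y)
    spec = sum(1 for p, t in zip(pred, y) if t == 0 and not p) / (len(y) - sum(y))
    print(f"iterations={m:2d} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Printing the class-specific accuracies for different values of `n_iter` gives a rough way to see how the share of correctly classified rare-class samples depends on the number of boosting iterations. The values are computed on the training data here, so they say nothing about predictive performance.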

Keywords: Gradient boosting; Rare events bias; Regularization through shrinkage and subsampling; Ensemble classifiers; High-dimensional class-prediction
Date: 2017
Downloads:
http://www.sciencedirect.com/science/article/pii/S0167947316301803
Full text for ScienceDirect subscribers only.

Persistent link: https://EconPapers.repec.org/RePEc:eee:csdana:v:113:y:2017:i:c:p:19-37

DOI: 10.1016/j.csda.2016.07.016

Computational Statistics & Data Analysis is currently edited by S.P. Azen

More articles in Computational Statistics & Data Analysis from Elsevier
Bibliographic data for series maintained by Catherine Liu.

 
Handle: RePEc:eee:csdana:v:113:y:2017:i:c:p:19-37