Balanced Gradient Boosting from Imbalanced Data for Clinical Outcome Prediction

Teramoto Reiji
Additional contact information
Teramoto Reiji: Bio-IT Center, NEC Corporation

Statistical Applications in Genetics and Molecular Biology, 2009, vol. 8, issue 1, 21

Abstract: In clinical outcome prediction, such as disease diagnosis and prognosis, it is often assumed that the classes, e.g., disease and control, are equally distributed. In practice, however, biological and clinical data often have a highly skewed class distribution. Because standard supervised learning algorithms aim to maximize overall prediction accuracy, a model trained on such imbalanced data tends to be strongly biased toward the majority class. The class distribution should therefore be incorporated appropriately when learning from imbalanced data. To address this practically important problem, we propose balanced gradient boosting (BalaBoost), which reformulates gradient boosting to avoid overfitting to the majority class and to remain sensitive to the minority class by using an equal class distribution instead of the empirical class distribution. We applied BalaBoost to three tasks with highly skewed class distributions: cancer tissue diagnosis based on miRNA expression data, premature death prediction for diabetes patients based on biochemical and clinical variables, and tumor grade prediction for renal cell carcinoma based on tumor marker expression. Experimental results showed that BalaBoost outperformed representative supervised learning algorithms, namely gradient boosting, Random Forests, and Support Vector Machines. We conclude that BalaBoost is promising for clinical outcome prediction from imbalanced data.
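
The abstract does not give BalaBoost's formulation, so the following is only a generic sketch of the underlying idea of replacing the empirical class distribution with an equal one. It assumes binary labels and a logistic-loss booster whose constant starting score is the log-odds of the (weighted) positive rate; the function names balanced_weights and initial_log_odds are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def balanced_weights(labels):
    """Per-sample weights giving every class the same total weight.

    Hypothetical helper for illustration; not the paper's method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]

def initial_log_odds(labels, weights=None):
    """Constant starting score of a logistic-loss booster.

    This is the log-odds of the weighted fraction of positive samples.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    positive = sum(w for y, w in zip(labels, weights) if y == 1)
    p = positive / sum(weights)
    return math.log(p / (1.0 - p))

# Toy imbalanced data set: 10 minority (1) and 90 majority (0) samples.
labels = [1] * 10 + [0] * 90
weights = balanced_weights(labels)

# With the empirical distribution the booster starts biased toward class 0.
empirical_f0 = initial_log_odds(labels)

# With balanced weights both classes carry equal mass, so the start is neutral.
balanced_f0 = initial_log_odds(labels, weights)

print(f"empirical start: {empirical_f0:.3f}")
print(f"balanced start:  {balanced_f0:.3f}")
```

In this toy setting the empirical prior starts the model well below zero, i.e., leaning toward the majority class, while the balanced weights start it at approximately zero. The same weights could in principle be applied to the per-sample gradients in later boosting rounds, though whether BalaBoost does so is not stated in the abstract.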

Keywords: clinical outcome; diagnosis; cancer; diabetes; renal cell carcinoma; ensemble learning; boosting; cost-sensitive learning; imbalanced data
Date: 2009

Downloads: (external link)
https://doi.org/10.2202/1544-6115.1422 (text/html)
For access to full text, subscription to the journal or payment for the individual article is required.


Persistent link: https://EconPapers.repec.org/RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:20

Ordering information: This journal article can be ordered from
https://www.degruyter.com/journal/key/sagmb/html

DOI: 10.2202/1544-6115.1422


Statistical Applications in Genetics and Molecular Biology is currently edited by Michael P. H. Stumpf

More articles in Statistical Applications in Genetics and Molecular Biology from De Gruyter
Bibliographic data for series maintained by Peter Golla.

Handle: RePEc:bpj:sagmbi:v:8:y:2009:i:1:n:20