The data sampling effect on financial distress prediction by single and ensemble learning techniques

Kuen-Liang Sue, Chih-Fong Tsai and Andy Chiu

Communications in Statistics - Theory and Methods, 2023, vol. 52, issue 12, 4344-4355

Abstract: Datasets in the financial distress domain are usually class-imbalanced. In the literature, data sampling is one of the most widely used solutions to the class imbalance problem. This article examines the effect of data sampling on financial distress prediction models built with single and ensemble learning techniques. The experiments use three bankruptcy prediction and credit scoring datasets, on which twelve different single classifiers and classifier ensembles are constructed. We find that although some prediction models trained on the original class-imbalanced datasets provide reasonable AUC, their type II errors are too high for practical use. However, when data sampling is applied to the datasets, all of the prediction models slightly increase their AUC and greatly reduce their type II errors. In particular, decision tree ensembles built by bagging and boosting are the better choices for financial distress prediction.
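
To make the two central notions of the abstract concrete, here is a minimal Python sketch of data sampling and type II error. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say which sampling method was used, so plain random oversampling stands in for it, and type II error is assumed to mean the share of truly distressed cases that a model predicts as non-distressed (definitions vary in the literature). The function names `random_oversample` and `type_ii_error` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes
    match the largest one. Assumption: a stand-in for "data sampling"; the
    abstract does not name the method actually used."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


def type_ii_error(y_true, y_pred, distressed=1):
    """Share of truly distressed cases predicted as non-distressed.
    Assumption: this is the definition of type II error intended here."""
    preds_for_distressed = [p for t, p in zip(y_true, y_pred) if t == distressed]
    if not preds_for_distressed:
        raise ValueError("no distressed cases in y_true")
    missed = sum(1 for p in preds_for_distressed if p != distressed)
    return missed / len(preds_for_distressed)


# Toy data (made up): 8 healthy firms (label 0) and 2 distressed firms (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_counts = Counter(balanced_labels)

# A trivial "always healthy" predictor is right on 80% of the toy firms
# yet misses every distressed one.
always_healthy_error = type_ii_error(labels, [0] * 10)
```

On the toy data, the "always healthy" predictor shows how a model can look acceptable on an imbalanced dataset while missing the minority class entirely, which is the kind of failure that balancing the training data is meant to reduce.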

Date: 2023

Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:52:y:2023:i:12:p:4344-4355

DOI: 10.1080/03610926.2021.1992439

Handle: RePEc:taf:lstaxx:v:52:y:2023:i:12:p:4344-4355