Analysis of imbalanced data using cost-sensitive learning
Sojin Kim and Jongwoo Song
Communications in Statistics - Theory and Methods, 2025, vol. 54, issue 22, 7286-7300
Abstract:
Typically, classification algorithms strive to maximize the accuracy. However, when dealing with significantly imbalanced data, accuracy may not be the most suitable metric. We believe that the most effective approach for handling imbalanced cases is to minimize the total costs. Unfortunately, precise costs for misclassification are often unavailable in real-world scenarios. To address this problem, we offer a simple and efficient search algorithm for cost-sensitive learning. We also introduce a new performance metric, imbalanced data classification performance (IDCP), which combines the F-score and the area under the curve (AUC). By utilizing the imbalance ratio (IR) as a crucial factor, we use IDCP to determine optimal weights in cost-sensitive learning. Through extensive experiments, we show that our method can find the optimal decision boundary in imbalanced datasets. Our code is available at https://github.com/sssojin/Imbalanced_Classification
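The abstract does not give the IDCP formula or the search procedure, so the following is only a rough sketch of the general idea: scan candidate minority-class weights and keep the one that scores best on a metric mixing F-score and AUC. Everything specific here is an assumption rather than the authors' method: the function names, the plain average of F1 and AUC used as a stand-in for IDCP, the weight grid running from 1 up to the imbalance ratio, and the use of the standard cost-sensitive threshold 1 / (1 + w) on fixed probability estimates instead of refitting a weighted model for each weight. The authors' repository linked above is the authoritative reference.

```python
def imbalance_ratio(y):
    """Majority count divided by minority count (one common definition of IR)."""
    pos = sum(y)
    neg = len(y) - pos
    return max(pos, neg) / min(pos, neg)


def confusion(y, pred):
    """Return (tp, fp, fn, tn) for 0/1 labels, with 1 as the minority class."""
    tp = sum(1 for t, p in zip(y, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y, pred):
    tp, fp, fn, _ = confusion(y, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hard_auc(y, pred):
    """AUC of a hard 0/1 predictor, which equals (TPR + TNR) / 2."""
    tp, fp, fn, tn = confusion(y, pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (tpr + tnr)


def combined_score(y, pred):
    # ASSUMPTION: a plain average of F1 and AUC, used as a placeholder.
    # This is NOT the paper's IDCP definition, which the abstract omits.
    return 0.5 * (f1_score(y, pred) + hard_auc(y, pred))


def search_weight(y, prob, weights):
    """Return (weight, score) for the best-scoring minority-class weight.

    With cost w for a missed minority case and cost 1 for a false alarm,
    predicting the minority class is cheaper when w * p > 1 - p,
    that is, when p > 1 / (1 + w).
    """
    best = None
    for w in weights:
        threshold = 1.0 / (1.0 + w)
        pred = [1 if p > threshold else 0 for p in prob]
        score = combined_score(y, pred)
        if best is None or score > best[1]:
            best = (w, score)
    return best


# Toy data (made up for illustration): 8 majority cases, 2 minority cases.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
prob = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.30, 0.45, 0.40, 0.70]

ir = imbalance_ratio(y)
# ASSUMPTION: integer weights from 1 (no reweighting) up to the IR.
grid = list(range(1, int(ir) + 1))
best_w, best_score = search_weight(y, prob, grid)
print(f"IR = {ir}, selected weight = {best_w}, score = {best_score:.4f}")
```

In this toy run the default weight of 1 misses one of the two minority cases, while larger weights lower the threshold enough to add false alarms, so the search settles on an intermediate weight. In practice the score would be computed on held-out data, and the model would typically be refit with each candidate weight.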
Date: 2025
Full text: http://hdl.handle.net/10.1080/03610926.2025.2472792 (text/html; access restricted to subscribers)
Persistent link: https://EconPapers.repec.org/RePEc:taf:lstaxx:v:54:y:2025:i:22:p:7286-7300
DOI: 10.1080/03610926.2025.2472792
Communications in Statistics - Theory and Methods is currently edited by Debbie Iscoe