An ensemble model for addressing class imbalance and class overlap in software defect prediction
Abdul Waheed Dar and
Sheikh Umar Farooq
Additional contact information
Abdul Waheed Dar: University of Kashmir
Sheikh Umar Farooq: University of Kashmir
International Journal of System Assurance Engineering and Management, 2024, vol. 15, issue 12, No 10, 5584-5603
Abstract:
Software defect prediction (SDP) is an important activity and an emerging challenge in software development, used to improve software quality. SDP identifies the modules of the software that are expected to contain defects, thereby helping to allocate limited testing resources cost-efficiently so that the overall development cost is reduced. Various machine learning techniques have been utilised for developing SDP models. However, a major challenge for SDP models in identifying defective software modules is the class imbalance problem of SDP datasets. Moreover, existing literature shows that class overlap in imbalanced SDP datasets has a strongly negative impact on the prediction capability of SDP models. In this paper, we propose an effective ensemble SDP model that employs a four-stage pipeline approach to address both the problems of class overlap and class imbalance simultaneously. Our approach integrates a class overlap reduction technique and an under-sampling technique with the extreme gradient boosting classifier (XGBoost). Through this integrated approach, our model effectively handles both class overlap and class imbalance issues, providing an enhanced solution for SDP tasks. We assess the effectiveness of our proposed SDP model by comparing its performance against ten state-of-the-art SDP models using sixteen imbalanced software defect datasets. The experimental results, coupled with statistical analysis, indicate that our proposed SDP model exhibits superior predictive performance, surpassing the other ten benchmark models across various metrics such as recall, G-mean, F-measure, and AUC.
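The abstract reports recall, G-mean, and F-measure, which are commonly preferred over plain accuracy on imbalanced defect data. The following is a minimal sketch of how these three metrics are usually computed from a confusion matrix, assuming the standard textbook definitions (G-mean as the geometric mean of recall and specificity, F-measure as the harmonic mean of precision and recall). The function names and the label convention (1 = defective, 0 = clean) are illustrative assumptions and are not taken from the paper; AUC is omitted because it requires ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Recall, G-mean, and F-measure under the standard definitions.

    Assumption: `positive` marks the minority (defective) class.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "g_mean": g_mean, "f_measure": f_measure}


if __name__ == "__main__":
    # Hypothetical toy data: 4 defective modules among 20.
    y_true = [1] * 4 + [0] * 16
    always_clean = [0] * 20  # a trivial predictor that never flags a defect
    print(imbalance_metrics(y_true, always_clean))
```

In this toy setup, a predictor that labels every module as clean is correct on 16 of 20 modules, yet all three metrics come out as zero, which illustrates why such metrics are reported for imbalanced datasets.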
Keywords: Class imbalance problem; Class overlap problem; Machine learning; Over-sampling; Under-sampling; Software Defect Prediction
Date: 2024
Downloads:
http://link.springer.com/10.1007/s13198-024-02538-x Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:ijsaem:v:15:y:2024:i:12:d:10.1007_s13198-024-02538-x
Ordering information: This journal article can be ordered from
http://www.springer.com/engineering/journal/13198
DOI: 10.1007/s13198-024-02538-x
International Journal of System Assurance Engineering and Management is currently edited by P.K. Kapur, A.K. Verma and U. Kumar
More articles in International Journal of System Assurance Engineering and Management from Springer, The Society for Reliability, Engineering Quality and Operations Management (SREQOM), India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.