An Improved Intelligent Model for Software Defect Prediction

Ahmed, Mahmoud Abdelmohsen; Azab, Shahira Shaaban; Hefny, Hesham Ahmed

Mahmoud Abdelmohsen Ahmed, Shahira Shaaban Azab and Hesham Ahmed Hefny
Additional contact information
Mahmoud Abdelmohsen Ahmed: Department of Computer Science, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
Shahira Shaaban Azab: Department of Computer Science, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
Hesham Ahmed Hefny: Department of Computer Science, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt

Scientific Review, 2026, vol. 12, issue 1, 6-22

Abstract: Software defect prediction is an important activity in every software firm because software defects can have severe consequences, especially in mission-critical systems developed by organizations like NASA. Effective techniques for early detection and prediction of defective software modules are crucial for ensuring reliability and mitigating risk. This study investigates the application of machine learning models for predicting software defects using 10 datasets from NASA's Metrics Data Program. Four classification models (Support Vector Machines, Random Forests, Logistic Regression, and an ensemble model) were evaluated on their ability to classify software modules as defective or non-defective based on software metrics. The datasets exhibited significant class imbalance, with defective modules being the minority class. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) was employed to generate synthetic examples of the minority class, which improved performance across all models. In addition, two feature selection procedures, Recursive Feature Elimination with Cross-Validation (RFECV) and Information Gain, were applied and compared. RFECV generally resulted in higher accuracy and precision, while the results for recall and F1-score were mixed. Among the assessed models, the Random Forest model demonstrated the highest overall accuracy after applying SMOTE and feature selection. The research highlights the potential of machine learning, particularly ensemble methods like Random Forests, for automating software defect prediction in critical systems. By addressing challenges such as class imbalance and feature selection, the performance of these models can be significantly enhanced. This study contributes to the growing field of machine learning applications in software engineering, providing insights and methods for improving the reliability and quality of software systems developed by NASA and other organizations working on mission-critical applications.
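
Illustrative note: the abstract mentions that SMOTE generates synthetic examples of the minority class. The following is a minimal pure-Python sketch of that interpolation idea, not the authors' implementation. The function name `smote`, its parameters, and the toy metric values (lines of code, cyclomatic complexity) are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank all other minority samples by Euclidean distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical defective modules described by (lines of code, cyclomatic complexity).
defective = [[120.0, 14.0], [200.0, 22.0], [150.0, 18.0]]
new_points = smote(defective, n_synthetic=4, k=2)
```

Each synthetic point lies on the line segment between two existing minority samples, so it stays within the range of the observed defective modules. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would usually be preferred, and oversampling is normally applied to the training split only so that evaluation data remain untouched.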

Keywords: Software Defect Prediction (SDP); Ensemble Model; Feature Engineering; Recursive Feature Elimination (RFE); Support Vector Machines (SVMs); Information Gain (IG)
Date: 2026

Downloads:
https://www.arpgweb.com/pdf-files/sr12(1)6-22.pdf (application/pdf)
https://www.arpgweb.com/journal/10/archive/03-2026/1/12 (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:arp:srarsr:2026:p:6-22

DOI: 10.32861/sr.121.6.22

Scientific Review is currently edited by Dr. Abdelazim Mohamed Abdelhamid Negm

More articles in Scientific Review from Academic Research Publishing Group Rahim Yar Khan 64200, Punjab, Pakistan.