Exploring Metaheuristic Optimized Machine Learning for Software Defect Detection on Natural Language and Classical Datasets

Petrovic, Aleksandar; Jovanovic, Luka; Bacanin, Nebojsa; Antonijevic, Milos; Savanovic, Nikola; Zivkovic, Miodrag; Milovanovic, Marina; Gajic, Vuk

Aleksandar Petrovic, Luka Jovanovic, Nebojsa Bacanin, Milos Antonijevic, Nikola Savanovic, Miodrag Zivkovic, Marina Milovanovic and Vuk Gajic
Additional contact information
Aleksandar Petrovic: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Luka Jovanovic: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Nebojsa Bacanin: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Milos Antonijevic: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Nikola Savanovic: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Miodrag Zivkovic: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Marina Milovanovic: Faculty of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
Vuk Gajic: Department of Environment and Sustainable Development, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia

Mathematics, 2024, vol. 12, issue 18, 1-46

Abstract: Software is increasingly vital, with automated systems regulating critical functions. As development demands grow, manual code review becomes more challenging, often making testing more time-consuming than development. A promising approach to improving defect detection at the source code level is the use of artificial intelligence combined with natural language processing (NLP). Source code analysis, leveraging machine-readable instructions, is an effective method for enhancing defect detection and error prevention. This work explores source code analysis through NLP and machine learning, comparing classical and emerging error detection methods. To optimize classifier performance, metaheuristic optimizers are used, and algorithm modifications are introduced to meet the study’s specific needs. The proposed two-tier framework uses a convolutional neural network (CNN) in the first layer to handle large feature spaces, with AdaBoost and XGBoost classifiers in the second layer to improve error identification. Additional experiments using term frequency–inverse document frequency (TF-IDF) encoding in the second layer demonstrate the framework’s versatility. Across five experiments with public datasets, the accuracy of the CNN was 0.768799. The second layer, using AdaBoost and XGBoost, further improved these results to 0.772166 and 0.771044, respectively. Applying NLP techniques yielded exceptional accuracies of 0.979781 and 0.983893 from the metaheuristic-optimized AdaBoost and XGBoost classifiers, respectively.

Keywords: natural language processing; software error detection; metaheuristic; optimization; XGBoost; AdaBoost; convolutional neural networks; explainable artificial intelligence
JEL-codes: C
Date: 2024

Downloads:
https://www.mdpi.com/2227-7390/12/18/2918/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/18/2918/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:18:p:2918-:d:1481402

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.