Are Source Code Metrics “Good Enough” in Predicting Security Vulnerabilities?

Ganesh, Sundarakrishnan; Palma, Francis; Olsson, Tobias

Sundarakrishnan Ganesh, Francis Palma and Tobias Olsson
Additional contact information
Sundarakrishnan Ganesh: Department of Computer Science and Media Technology, Linnaeus University, 351 95 Växjö, Sweden
Francis Palma: Department of Computer Science and Media Technology, Linnaeus University, 351 95 Växjö, Sweden
Tobias Olsson: Department of Computer Science and Media Technology, Linnaeus University, 351 95 Växjö, Sweden

Data, 2022, vol. 7, issue 9, 1-38

Abstract: Modern systems produce and handle a large volume of sensitive enterprise data. Security vulnerabilities in software systems must therefore be identified and resolved early to prevent security breaches and failures. Predicting security vulnerabilities is an alternative to identifying them as developers write code. In this study, we examined the ability of several machine learning algorithms to predict security vulnerabilities. We created two datasets containing security vulnerability information from two open-source systems, Apache Tomcat and Apache Struts (versions 4.x and five 2.5.x minor versions). We also computed source code metrics for these versions of both systems. We examined four classifiers (Naive Bayes, Decision Tree, XGBoost Classifier, and Logistic Regression) to assess their ability to predict security vulnerabilities. Moreover, we introduced an ensemble learner using a stacking classifier to see whether the prediction performance could be improved. We performed cross-version and cross-project predictions to assess the effectiveness of the best-performing model. Our results showed that the XGBoost classifier performed best among the learners, with an average accuracy of 97% on both datasets. The stacking classifier achieved an average accuracy of 92% in Struts and 71% in Tomcat. Our best-performing model, XGBoost, achieved an average accuracy of 87% in Tomcat and 99% in Struts in a cross-version setup.
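
Illustration: the cross-version setup mentioned in the abstract (train on one version's source code metrics, predict on another version) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the data rows are invented toy values, the two metrics (lines of code, cyclomatic complexity) and the version labels `v1` and `v2` are hypothetical, and a small hand-written Gaussian Naive Bayes stands in for the classifiers the study evaluated.

```python
import math
from collections import defaultdict

# Hypothetical toy data, NOT taken from the paper. Each row is
# (version, [lines_of_code, cyclomatic_complexity], is_vulnerable).
FILES = [
    ("v1", [120.0, 4.0], 0),
    ("v1", [150.0, 5.0], 0),
    ("v1", [900.0, 30.0], 1),
    ("v1", [1100.0, 35.0], 1),
    ("v2", [130.0, 6.0], 0),
    ("v2", [1000.0, 28.0], 1),
    ("v2", [160.0, 3.0], 0),
    ("v2", [950.0, 33.0], 1),
]


def fit_gaussian_nb(rows):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in rows:
        by_class[label].append(features)
    model = {}
    total = len(rows)
    for label, samples in by_class.items():
        n = len(samples)
        stats = []
        for column in zip(*samples):
            mean = sum(column) / n
            # A small floor keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / total, stats)
    return model


def predict(model, features):
    """Return the class with the highest log-posterior score."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_version_accuracy(files, train_version, test_version):
    """Train on one version, evaluate on another, and return the accuracy."""
    train = [(f, y) for v, f, y in files if v == train_version]
    test = [(f, y) for v, f, y in files if v == test_version]
    model = fit_gaussian_nb(train)
    correct = sum(predict(model, f) == y for f, y in test)
    return correct / len(test)


accuracy = cross_version_accuracy(FILES, "v1", "v2")
print(f"cross-version accuracy: {accuracy:.2f}")
```

The toy rows are deliberately well separated, so the score printed here says nothing about the accuracies reported in the abstract. The point is only the evaluation shape: the model never sees any file from the version it is tested on.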

Keywords: prediction; security vulnerabilities; machine learning; source code; software metrics
JEL-codes: C8 C80 C81 C82 C83
Date: 2022

Downloads:
https://www.mdpi.com/2306-5729/7/9/127/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/9/127/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:9:p:127-:d:908972

Data is currently edited by Ms. Becky Zhang

Bibliographic data for series maintained by MDPI Indexing Manager.