Transformer-based models application for bug detection in source code
Illia Vokhranov and Bogdan Bulakh
Additional contact information
Illia Vokhranov: National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»
Bogdan Bulakh: National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»
Technology audit and production reserves, 2024, vol. 5, issue 2(79), 6-15
Abstract:
This paper explores the use of transformer-based models for bug detection in source code, aiming to better understand the capacity of these models to learn complex patterns and relationships within the code. Traditional static analysis tools are highly limited in their ability to detect semantic errors, resulting in numerous defects passing through to the code execution stage. This research represents a step towards enhancing static code analysis using neural networks.
The experiments were designed as binary classification tasks to detect buggy code snippets, each targeting a specific defect type: NameError, TypeError, IndexError, AttributeError, ValueError, EOFError, SyntaxError, or ModuleNotFoundError. Using the «RunBugRun» dataset, which relies on code execution results, four models (BERT, CodeBERT, GPT-2, and CodeT5) were fine-tuned and compared under identical conditions and hyperparameters. Performance was evaluated using F1-Score, Precision, and Recall.
The results indicated that transformer-based models, especially CodeT5 and CodeBERT, were effective in identifying various defects, demonstrating their ability to learn complex code patterns. However, performance varied by defect type, with some defects, such as IndexError and TypeError, being more challenging to detect. The outcomes underscore the importance of high-quality, diverse training data and highlight the potential of transformer-based models to achieve more accurate early defect detection.
Future research should further explore advanced transformer architectures for detecting complex defects and investigate the integration of additional contextual information into the detection process. This study highlights the potential of modern machine learning architectures to advance software engineering practices, leading to more efficient and reliable software development.
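The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes one binary task per defect type, with label 1 meaning "buggy" and label 0 meaning "not buggy". The function name `binary_prf` and the example labels are hypothetical and are not taken from the paper or from the RunBugRun dataset.

```python
from typing import Sequence, Tuple


def binary_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for the positive class (label 1 = buggy)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for a single defect type (illustrative data only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = binary_prf(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting all three values matters for this kind of task because a detector that flags every snippet as buggy reaches perfect recall with poor precision, and F1 penalizes that imbalance.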
Keywords: transformers; large language models; bug detection; defect detection; static code analysis; neural networks
Date: 2024
Downloads: https://journals.uran.ua/tarp/article/download/310822/302262 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:baq:taprar:v:5:y:2024:i:2:p:6-15
DOI: 10.15587/2706-5448.2024.310822
Publisher: PC TECHNOLOGY CENTER