Complex Cases of Source Code Authorship Identification Using a Hybrid Deep Neural Network
Anna Kurtukova,
Aleksandr Romanov,
Alexander Shelupanov and
Anastasia Fedotova
Additional contact information
Anna Kurtukova: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Aleksandr Romanov: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Alexander Shelupanov: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Anastasia Fedotova: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Future Internet, 2022, vol. 14, issue 10, 1-20
Abstract:
This paper continues our previous work on source code authorship identification. Analyzing heterogeneous source code is relevant to copyright protection in commercial software development, where the specifics of the development process and the use of collaborative development tools (version control systems) result in source code written to different programming standards by teams of programmers with different skill levels. Another application field is information security, in particular identifying the authors of computer viruses. We apply our technique, based on a hybrid of the Inception-v1 and Bidirectional Gated Recurrent Unit architectures, to heterogeneous source code and consider the complex cases most common in commercial development that negatively affect authorship identification. The paper examines the possibilities and limitations of the authors' technique in these cases. When a programmer was proficient in two programming languages, the average accuracy was 87%; with proficiency in three or more languages, it was 76%. For artificially generated source code, the average accuracy was 81.5%. Finally, the average accuracy for source code generated from commits was 84%. A comparison with state-of-the-art approaches showed that the proposed method has no fully functional analogs covering real practical cases.
Keywords: authorship; source code; commits; generation; neural network; deep neural network
JEL-codes: O3
Date: 2022
Downloads:
https://www.mdpi.com/1999-5903/14/10/287/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/10/287/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:10:p:287-:d:930862
Future Internet is currently edited by Ms. Grace You
Bibliographic data for series maintained by MDPI Indexing Manager.