Complex Cases of Source Code Authorship Identification Using a Hybrid Deep Neural Network

Kurtukova, Anna; Romanov, Aleksandr; Shelupanov, Alexander; Fedotova, Anastasia

Anna Kurtukova: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Aleksandr Romanov: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Alexander Shelupanov: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia
Anastasia Fedotova: Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia

Future Internet, 2022, vol. 14, issue 10, 1-20

Abstract: This paper continues our previous work on source code authorship identification. The analysis of heterogeneous source code is relevant to copyright protection in commercial software development. This stems from the specifics of development processes and the use of collaborative development tools (version control systems). As a result, a single codebase may be written according to different programming standards by a team of programmers with different skill levels. Another application field is information security, in particular identifying the authors of computer viruses. We apply our technique, based on a hybrid of the Inception-v1 and Bidirectional Gated Recurrent Unit (BiGRU) architectures, to heterogeneous source code and consider the complex cases most common in commercial development that negatively affect authorship identification. The paper examines the capabilities and limitations of the authors' technique in these cases. When a programmer was proficient in two programming languages, the average accuracy was 87%; for proficiency in three or more languages, it was 76%. For artificially generated source code, the average accuracy was 81.5%. Finally, for source code generated from commits, the average accuracy was 84%. A comparison with state-of-the-art approaches showed that no existing method matches the full functionality of the proposed one across these practical cases.
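The abstract names the architecture only at a high level: an Inception-v1 component combined with bidirectional GRUs. The sketch below illustrates how such a hybrid can be wired as an untrained NumPy forward pass. Everything concrete in it is an assumption rather than a detail from the paper: character-level (ASCII) input, a single 1D Inception-style block with 1-, 3- and 5-wide convolutions plus a pooling branch, concatenation of the final forward and backward GRU states, a softmax over candidate authors, all layer sizes, and all names (`HybridAuthorshipModel`, `InceptionBlock1D`, `conv1d_same`, etc.), which are hypothetical.

```python
import numpy as np

# Hypothetical, untrained sketch: random weights, NumPy forward pass only.
rng = np.random.default_rng(0)

def init(*shape):
    """Small random weights (illustrative; a real model would be trained)."""
    return rng.normal(0.0, 0.1, shape)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_same(x, w):
    """1D convolution, stride 1, zero 'same' padding (odd kernel sizes only).
    x: (T, C_in), w: (k, C_in, C_out) -> (T, C_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])  # (T, k, C_in)
    return np.einsum("tkc,kco->to", windows, w)

def maxpool1d_same(x, k=3):
    """Max pooling, stride 1, padded with -inf so the length is preserved."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), constant_values=-np.inf)
    return np.stack([xp[t:t + k].max(axis=0) for t in range(x.shape[0])])

class InceptionBlock1D:
    """Inception-v1-style block adapted to 1D sequences: parallel 1-, 3- and
    5-wide convolutions (with 1-wide reductions) plus a pooling branch,
    concatenated along the channel axis."""
    def __init__(self, c_in, c_branch):
        self.b1 = init(1, c_in, c_branch)
        self.b3_reduce, self.b3 = init(1, c_in, c_branch), init(3, c_branch, c_branch)
        self.b5_reduce, self.b5 = init(1, c_in, c_branch), init(5, c_branch, c_branch)
        self.bpool = init(1, c_in, c_branch)
        self.c_out = 4 * c_branch

    def __call__(self, x):
        y1 = relu(conv1d_same(x, self.b1))
        y3 = relu(conv1d_same(relu(conv1d_same(x, self.b3_reduce)), self.b3))
        y5 = relu(conv1d_same(relu(conv1d_same(x, self.b5_reduce)), self.b5))
        yp = relu(conv1d_same(maxpool1d_same(x), self.bpool))
        return np.concatenate([y1, y3, y5, yp], axis=1)

class GRU:
    """Single-layer GRU over a sequence; returns the final hidden state."""
    def __init__(self, d_in, d_h):
        self.W, self.U = init(d_in, 3 * d_h), init(d_h, 3 * d_h)
        self.b, self.d_h = np.zeros(3 * d_h), d_h

    def run(self, xs):
        H = self.d_h
        h = np.zeros(H)
        for x in xs:
            gx = x @ self.W + self.b
            gh = h @ self.U[:, :2 * H]
            z = sigmoid(gx[:H] + gh[:H])                           # update gate
            r = sigmoid(gx[H:2 * H] + gh[H:])                      # reset gate
            n = np.tanh(gx[2 * H:] + (r * h) @ self.U[:, 2 * H:])  # candidate state
            h = (1.0 - z) * n + z * h
        return h

class HybridAuthorshipModel:
    """Character embeddings -> Inception block -> BiGRU -> softmax over authors."""
    def __init__(self, vocab_size, n_authors, emb=16, c_branch=8, d_h=16):
        self.E = init(vocab_size, emb)
        self.inception = InceptionBlock1D(emb, c_branch)
        self.fwd = GRU(self.inception.c_out, d_h)
        self.bwd = GRU(self.inception.c_out, d_h)
        self.W_out, self.b_out = init(2 * d_h, n_authors), np.zeros(n_authors)

    def predict_proba(self, token_ids):
        feats = self.inception(self.E[token_ids])  # (T, 4 * c_branch)
        # Bidirectional: read the features left-to-right and right-to-left.
        h = np.concatenate([self.fwd.run(feats), self.bwd.run(feats[::-1])])
        logits = h @ self.W_out + self.b_out
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()

# Usage: encode a snippet as ASCII codes (an assumed tokenization) and score it.
snippet = "for (int i = 0; i < n; ++i) { sum += a[i]; }"
ids = np.array([min(ord(ch), 127) for ch in snippet])
model = HybridAuthorshipModel(vocab_size=128, n_authors=5)
print(model.predict_proba(ids))
```

Because the weights are random, the printed probabilities carry no information about authorship; in practice such a model would be trained on labeled code samples from each candidate author. The convolutional block captures local token patterns at several widths, while the BiGRU aggregates them over the whole sequence, which is the general motivation for combining the two.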

Keywords: authorship; source code; commits; generation; neural network; deep neural network
JEL-codes: O3
Date: 2022

Downloads:
https://www.mdpi.com/1999-5903/14/10/287/pdf (application/pdf)
https://www.mdpi.com/1999-5903/14/10/287/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jftint:v:14:y:2022:i:10:p:287-:d:930862

Future Internet is currently edited by Ms. Grace You

More articles in Future Internet from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.