Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant

Demidova, Liliya A.; Andrianova, Elena G.; Sovietov, Peter N.; Gorchakov, Artyom V.

Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant

Liliya A. Demidova (), Elena G. Andrianova (), Peter N. Sovietov () and Artyom V. Gorchakov ()
Additional contact information
Liliya A. Demidova: Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia
Elena G. Andrianova: Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia
Peter N. Sovietov: Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia
Artyom V. Gorchakov: Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA—Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia

Data, 2023, vol. 8, issue 6, 1-16

Abstract: This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). Source codes of the small programs grouped by the type of the solved task can be used for benchmarking source code classification and clustering algorithms. Moreover, the data can be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and more applications are yet to be discovered. We describe the architecture of the DTA system, aiming to provide detailed insight regarding how and why the dataset was collected. In addition, we describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences of programs, and apply hierarchical clustering algorithms in order to automatically discover high-level concepts used by students while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify approaches implemented by students.

Keywords: autograding; programming exercise generation; python; online education; program text analysis; source code analysis; abstract syntax trees; clustering algorithms (search for similar items in EconPapers)
JEL-codes: C8 C80 C81 C82 C83 (search for similar items in EconPapers)
Date: 2023
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (2)

Downloads: (external link)
https://www.mdpi.com/2306-5729/8/6/109/pdf (application/pdf)
https://www.mdpi.com/2306-5729/8/6/109/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:8:y:2023:i:6:p:109-:d:1171226

Access Statistics for this article

Data is currently edited by Ms. Becky Zhang

More articles in Data from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().