Unsupervised Voting for Detecting the Algorithmic Solving Strategy in Competitive Programming Solutions
Alexandru Stefan Stoica (),
Daniel Babiceanu,
Marian Cristian Mihaescu and
Traian Rebedea ()
Additional contact information
Alexandru Stefan Stoica: Computer Science & Engineering Department, National University of Science and Technology POLITEHNICA Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
Daniel Babiceanu: Department of Computer Science and Information Technologies, University of Craiova, 13 A.I. Cuza, 200585 Craiova, Romania
Marian Cristian Mihaescu: Department of Computer Science and Information Technologies, University of Craiova, 13 A.I. Cuza, 200585 Craiova, Romania
Traian Rebedea: Computer Science & Engineering Department, National University of Science and Technology POLITEHNICA Bucharest, 313 Splaiul Independentei, 060042 Bucharest, Romania
Mathematics, 2025, vol. 13, issue 22, 1-19
Abstract:
The problem of source-code analysis using machine-learning techniques has gained much attention recently, as several powerful code-embedding methods have been created. Having different embedding methods available for source code has opened the way to tackling many practical problems in source-code analysis. This paper addresses the problem of determining the number of distinct algorithmic strategies that may be found in a set of correct solutions to a competitive programming problem. To achieve this, we employ a novel unsupervised algorithm that uses a multiview interpretation of data based on different embedding and clustering methods, a multidimensional assignment problem (MAP) to determine a subset of a higher probability of correctness, and a self-training method based on voting to determine the correct clusters of the remaining set. We investigate the following two aspects: (1) whether the proposed unsupervised approach outperforms existing methods when the number K of distinct algorithmic strategies is known and (2) Whether the approach can also be applied to determine the optimal value of K . We have addressed these using seven embedding methods with three clustering strategies in a data-analysis pipeline that tackles the previously described issues on a newly created dataset consisting of 15 algorithmic problems. According to the results, for the first aspect, the proposed unsupervised voting algorithm significantly improves the baseline clustering approach for a known K . This improvement was observed across all problems in the dataset, except one. In the case of the second one, we prove that the proposed method has a negative impact on determining the optimal number K . Scale-up of the data-analysis pipeline to datasets of thousands of problems may yield the ability to profoundly understand and learn about the innovative process of correctly designing and writing code in the context of competitive programming or even industry code.
Keywords: deep learning; code embeddings; competitive programming; transformers; unsupervised learning; multi-assignment problem (search for similar items in EconPapers)
JEL-codes: C (search for similar items in EconPapers)
Date: 2025
References: Add references at CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/22/3589/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/22/3589/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:22:p:3589-:d:1790343
Access Statistics for this article
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().