Assessing ChatGPT's ability to detect and correct programming errors in Stata do-files
Ricardo Mora
UC3M Working papers. Economics from Universidad Carlos III de Madrid. Departamento de Economía
Abstract:
This paper evaluates the performance of successive ChatGPT models in debugging code for the proprietary software Stata, using a fully controlled experimental design implemented through the API. The API setting ensures independence across runs and eliminates interface effects. The experiment comprises 75 do-files distributed across three difficulty levels, containing three types of errors, evaluated with three model generations and two information conditions (closed-book and open-book), yielding a structured panel of model responses and a database of 7,425 observations. Performance is assessed using six primary metrics (solution validity, hallucination, diagnosis, explanation, illusory expertise, and speculative reasoning) and an additional derived indicator capturing the absence of overconfidence. The results show substantial improvements across model generations, with clear diminishing returns. The evidence indicates that progress is mainly driven by broad gains in general reasoning ability rather than by improved self-awareness or enhanced use of contextual information.
Keywords: LLMs; Programming; Econometrics software; Proprietary software
Date: 2025-02-13
Downloads: (external link)
https://e-archivo.uc3m.es/rest/api/core/bitstreams ... 4d5ed1d248b6/content (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:cte:werepe:45949