Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models

Tim Reason, William Rawlinson, Julia Langham, Andy Gimblett, Bill Malcolm and Sven Klijn
Additional contact information
Tim Reason: Estima Scientific
William Rawlinson: Estima Scientific
Julia Langham: Estima Scientific
Andy Gimblett: Estima Scientific
Bill Malcolm: Bristol Myers Squibb
Sven Klijn: Bristol Myers Squibb

PharmacoEconomics - Open, 2024, vol. 8, issue 2, No 3, 203 pages

Abstract:
Background: Current generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks, including the generation of computer code from textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses.
Methods: The two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared to the published values from the original, human-programmed models. The models were replicated 15 times to capture variability in GPT-4's output.
Results: GPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify one element of the model design (one of the model's fifteen input calculations) because it used too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%.
Conclusion: This study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced development costs. Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.
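
For readers unfamiliar with the model type, the sketch below illustrates in R (the language the study asked GPT-4 to generate) how a simple three-state partitioned survival model derives state occupancy from progression-free and overall survival curves and produces an incremental cost-effectiveness ratio. All function names, parameter values, costs and utilities are hypothetical placeholders and are not taken from the published NSCLC or RCC models.

# Minimal, illustrative partitioned survival model sketch (base R only).
# Exponential survival curves and all inputs are assumed for illustration.
run_psm <- function(pfs_rate, os_rate, drug_cost_year,
                    pf_cost_year = 24000, pd_cost_year = 36000,
                    u_pf = 0.75, u_pd = 0.55,
                    horizon = 20, cycle = 1 / 12, disc = 0.035) {
  t     <- seq(0, horizon, by = cycle)   # time grid in years
  s_pfs <- exp(-pfs_rate * t)            # progression-free survival curve
  s_os  <- exp(-os_rate * t)             # overall survival curve
  pf    <- pmin(s_pfs, s_os)             # progression-free state occupancy
  pd    <- pmax(s_os - s_pfs, 0)         # post-progression state occupancy
  dw    <- (1 + disc)^(-t)               # discount weights per cycle
  qalys <- sum((pf * u_pf + pd * u_pd) * dw) * cycle
  costs <- sum((pf * (pf_cost_year + drug_cost_year) + pd * pd_cost_year) * dw) * cycle
  c(cost = costs, qaly = qalys)
}

new_tx <- run_psm(pfs_rate = 0.40, os_rate = 0.25, drug_cost_year = 90000)
comp   <- run_psm(pfs_rate = 0.70, os_rate = 0.35, drug_cost_year = 15000)
icer   <- unname((new_tx["cost"] - comp["cost"]) / (new_tx["qaly"] - comp["qaly"]))
icer   # incremental cost per QALY gained

In the study, GPT-4 was prompted to write scripts of this general form from textual descriptions of each model's methods, assumptions and parameter values, and the resulting ICERs were compared against the published, human-programmed values.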

Date: 2024

Downloads: (external link)
http://link.springer.com/10.1007/s41669-024-00477-8 Abstract (text/html)



Persistent link: https://EconPapers.repec.org/RePEc:spr:pharmo:v:8:y:2024:i:2:d:10.1007_s41669-024-00477-8

Ordering information: This journal article can be ordered from
http://www.springer.com/adis/journal/41669

DOI: 10.1007/s41669-024-00477-8


PharmacoEconomics - Open is currently edited by Timothy Wrightson and Christopher Carswell

More articles in PharmacoEconomics - Open from Springer.
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

Handle: RePEc:spr:pharmo:v:8:y:2024:i:2:d:10.1007_s41669-024-00477-8