Quantify unmet medical need across the disease landscape – A large language model-based methodology
Elliott W Sharp,
Nicholas Fragola,
Charlotte Blewitt,
Matthew Goddeeris,
Lee Lancashire,
Charlie Hempstead and
David C Fajgenbaum
PLOS Medicine, 2026, vol. 23, issue 3, 1-17
Abstract:
Background: Although the ultimate goal of medical researchers and funders is to maximize patient benefit, there is no systematic process for quantifying unmet medical need across diseases. A relative unmet medical need scoring system would be valuable for prioritizing medical research, but systematically scoring all 22,701 human diseases is technically challenging, time-consuming, and expensive. Using a large language model (LLM)-based architecture, we built a scalable method that demonstrates the feasibility of quantifying “unmet medical need” criteria across all diseases, combining those criteria into a single weighted score, and extending the method to new criteria or diseases in the future. We aimed to quantitatively determine which diseases have the greatest unmet medical need and, therefore, which diseases are priority targets for new repurposed treatments.
Methods and findings: We defined 11 scoring criteria across three categories of unmet medical need. For each criterion, we tested LLMs and refined prompts to generate a score per criterion for each disease, and then defined a weighting for each criterion's contribution to a final score. A 30-disease development set was used to iterate on the prompting, and a 10-disease evaluation set was held out and used to evaluate the performance of the final prompt. All 22,701 human diseases in the MONDO disease ontology were quantitatively scored for unmet medical need across all 11 weighted criteria. The resulting scores allowed relative comparison of unmet medical need between diseases. Inter-expert agreement was strong, with 95% of ratings within a 1-point difference, indicating reliability of the scoring framework. Across multiple LLMs, gpt-4o was most closely aligned with expert rankings, achieving low mean and standard deviation differences relative to human scores. Furthermore, LLM-generated scores demonstrated strong Spearman’s rho correlations with expert assessments across key clinical criteria, such as mortality (ρ = 0.845) and quality-adjusted life years lost (ρ = 0.822), supporting their suitability for prioritizing unmet medical need. All data were generated in ~1 hour with no missing data, at a total compute cost of $120 USD, and the results of the Unmet Medical Need Index are publicly available. The main limitation of this study is the combined size of the development and evaluation sets, which is 40 diseases.
Conclusions: This accessible, scalable methodology enables funders and researchers across governments, universities, healthcare organizations, and disease groups to tailor prioritization efforts according to unmet medical need in the context of their organizational objectives by selecting appropriate criteria and weighting those criteria. This method creates a pragmatic and transparent tool to streamline research prioritization. Future research should consider expanding the disease set used to create scores.
Elliott W Sharp and colleagues develop a large language model-based methodology to quantify unmet needs among human diseases, considering factors related to patient suffering, standard of care, and accessibility of care.
Date: 2026
Downloads: (external link)
https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1004798 (text/html)
https://journals.plos.org/plosmedicine/article/fil ... 04798&type=printable (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:plo:pmed00:1004798
DOI: 10.1371/journal.pmed.1004798
More articles in PLOS Medicine from Public Library of Science