EconPapers    

Moving beyond the empty cell: The threat of decontextualized healthcare data

Aya El Mir, Eric Bezerra de Sousa, Ignacio Mesina-Estarrón, Leo Anthony Celi, Moad Hani, Mohammed Benjelloun, Neha Nageswaran, Saïd Mahmoudi, Shaheen Siddiqui, Sreeram Sadasivam and William Greig Mitchell

PLOS Digital Health, 2026, vol. 5, issue 1, 1-9

Abstract: Missing, inaccurate, or poorly documented data in healthcare is often treated as a technical problem to be statistically resolved via imputation, deletion, or modeling assumptions about randomness. However, such inaccuracies stem from far more complex socioeconomic and geopolitical issues; they are not mere “errors of data entry” to be ameliorated with statistical modeling techniques. We argue that what is really missing or inaccurate is the context in which the data are collected, and that only by understanding this context can we begin to prevent artificial intelligence (AI) from amplifying misleading, decontextualized data. We critically examine how traditional modeling methods fail to account for the factors that influence what data gets recorded, and for whom. We show how AI systems trained on decontextualized data reinforce health inequities at scale. Finally, we review recent literature on context-aware approaches to understanding data that incorporate metadata, social determinants of health, fairness constraints, and participatory governance to build more ethical and representative systems. Our analysis urges the AI and healthcare communities to move beyond the traditional emphasis on statistical convenience toward socially grounded and interdisciplinary strategies for handling decontextualized data.

Author summary: Healthcare data that is missing, incomplete, or inaccurately documented is often treated as a technical problem to be solved with statistical methods. We emphasize that this perspective overlooks the real issue: the data has been stripped of its context. Missing, incomplete, or inaccurate data (collectively termed decontextualized data) is not random; it is shaped by human decisions, social barriers, and systemic inequalities. Decontextualized healthcare data becomes increasingly dangerous as the use of AI in healthcare proliferates. Models trained on decontextualized data learn existing distortions as if they were objective truths. Consequently, their predictions risk reinforcing the very inequities that produced the flawed data in the first place and exacerbating health disparities at scale. We argue for a paradigm shift towards understanding why data becomes decontextualized. This requires a concerted effort between machine learning communities and domain experts who understand data context. Only through this partnership can we begin to build models that account for the complex realities embedded in decontextualized healthcare data, realities that cannot be addressed by sophisticated modeling techniques alone.

Date: 2026

Downloads: (external link)
https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0001194 (text/html)
https://journals.plos.org/digitalhealth/article/fi ... 01194&type=printable (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:plo:pdig00:0001194

DOI: 10.1371/journal.pdig.0001194


More articles in PLOS Digital Health from Public Library of Science
Bibliographic data for series maintained by digitalhealth.

 
Page updated 2026-01-18
Handle: RePEc:plo:pdig00:0001194