Nowcasting reported COVID-19 hospitalizations using de-identified, aggregated medical insurance claims data

Xueda Shen, Aaron Rumack, Bryan Wilder and Ryan J Tibshirani

PLOS Computational Biology, 2025, vol. 21, issue 2, 1-26

Abstract: We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds under a hypothetical scenario in which, during the Delta wave, states only report data on the first day of each month, and on this day, report COVID-19 hospitalization counts for each day in the previous month. In this hypothetical scenario (just as in reality), medical insurance claims data continues to be available daily. At the beginning of each month, we train a regression model, using all data available thus far, to predict hospitalization counts from medical insurance claims. We then use this model to nowcast the (unseen) values of COVID-19 hospitalization counts from medical insurance claims, at each day in the following month. Our analysis uses properly-versioned data, which would have been available in real-time at the time predictions are produced (instead of using data that would have only been available in hindsight). In spite of the difficulties inherent to real-time estimation (e.g., latency and backfill) and the complex dynamics behind COVID-19 hospitalizations themselves, we find altogether that medical insurance claims can be an accurate predictor of hospitalization reports, with mean absolute errors typically around 0.4 hospitalizations per 100,000 people, i.e., proportion of variance explained around 75%. Perhaps more importantly, we find that nowcasts made using medical insurance claims are able to qualitatively capture the dynamics (upswings and downswings) of hospitalization waves, which are key features that inform public health decision-making.

Author summary: Daily reported COVID-19 hospitalizations have been a topline indicator throughout the pandemic in the US, and an up-to-date awareness of the load on the hospital system has been a key factor in public health decision-making. However, collecting and maintaining this indicator comes at a high price, as frequent reporting of hospitalizations is itself burdensome on the health system. This is especially true at times when it is needed the most: staff shortages in hospitals tended to coincide with surges in hospitalizations, making reporting even more challenging in peak times. In this paper, we explore the use of auxiliary indicators based on de-identified, aggregated medical insurance claims data, and build relatively simple statistical models to track hospitalizations using these auxiliary indicators, so that reporting may be (hypothetically) reduced in frequency, thereby reducing the burden on hospitals. We find that these models can track reported hospitalizations closely, even in critical times (surges), suggesting that our approach and similar ones may be good candidates for reducing reporting frequency in future public health crises.
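
The monthly retrain-then-nowcast loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says "a regression model", so the single-feature ordinary least squares fit, the function names (`fit_linear`, `monthly_nowcast`), and the synthetic data are all assumptions made for illustration. The sketch also ignores data versioning (latency and backfill), which the paper treats carefully.

```python
import numpy as np

def fit_linear(x, y):
    """Ordinary least squares fit of y ~ a + b * x; returns (a, b).
    Assumption: a one-feature linear model stands in for the paper's regression."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def monthly_nowcast(claims, hosp, month_starts):
    """At each reporting day t, fit on days [0, t) and nowcast days
    [t, next reporting day) from the claims signal alone."""
    claims = np.asarray(claims, dtype=float)
    hosp = np.asarray(hosp, dtype=float)
    nowcast = np.full(len(claims), np.nan)  # no nowcast before the first report
    bounds = list(month_starts) + [len(claims)]
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Hospitalizations up to (not including) `start` have just been reported.
        a, b = fit_linear(claims[:start], hosp[:start])
        # Claims keep arriving daily, so they are available for the unseen month.
        nowcast[start:end] = a + b * claims[start:end]
    return nowcast

# Synthetic, made-up series (per 100,000 people); not data from the paper.
rng = np.random.default_rng(0)
days = 120
claims = 5 + 3 * np.sin(np.arange(days) / 15) + rng.normal(0, 0.2, days)
hosp = 0.5 + 0.8 * claims + rng.normal(0, 0.3, days)
month_starts = [30, 60, 90]  # hypothetical reporting days

nowcast = monthly_nowcast(claims, hosp, month_starts)
mask = ~np.isnan(nowcast)
mae = np.mean(np.abs(nowcast[mask] - hosp[mask]))  # mean absolute error
print(f"MAE on nowcast days: {mae:.2f}")
```

The key design point is that each fit only uses hospitalization counts from before the reporting day, while the claims signal for the nowcast period is used as it arrives.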

Date: 2025

Downloads: (external link)
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012717 (text/html)
https://journals.plos.org/ploscompbiol/article/fil ... 12717&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pcbi00:1012717

DOI: 10.1371/journal.pcbi.1012717

More articles in PLOS Computational Biology from Public Library of Science

 
Page updated 2025-05-05
Handle: RePEc:plo:pcbi00:1012717