Reinforcement learning for policymaking in epidemic control: A scoping review

Bolshov, Oleksandr; Chumachenko, Dmytro

PLOS ONE, 2026, vol. 21, issue 6, 1-21

Abstract: Background: Managing an epidemic demands policies that respond at the pace of the outbreak. Conventional rule-based interventions struggle to keep up, prompting interest in reinforcement learning (RL) for designing non-pharmaceutical interventions (NPIs). However, current evidence is fragmented across diverse models and reporting styles. Objectives: To systematically map how RL is applied to epidemic NPI design; describe modeling choices, algorithm architectures, and evaluation practices; and identify trends and research gaps. Methods: Peer-reviewed studies (2014–2025, English) that applied deep RL to select NPIs were retrieved from IEEE Xplore, ACM Digital Library, ScienceDirect, and Scopus, searched on December 23, 2025. Reference-list scanning supplemented database results. Predefined data items (bibliographic details, epidemic and RL model characteristics, experiments, validation methods, outcomes) were charted and summarized descriptively. Results: Of 512 retrieved records, 10 met the inclusion criteria, and three additional studies were identified via reference-list scanning, yielding 13. Five employed value-based methods, four policy-gradient methods, and four hybrid methods; one study additionally incorporated model-based planning. Six simulations relied on compartmental models, six on agent-based models, and one on a hybrid model. Action spaces were predominantly discrete restriction levels. Five studies used sequence-modeling techniques to incorporate temporal context into the state space. Eleven studies designed reward functions as a trade-off between pandemic severity and socio-economic cost. According to the reviewed studies, RL policies across various settings outperform heuristic, rule-based, and historical baselines in reducing infections, deaths, or lockdown duration while limiting economic loss. Conclusions: RL shows promise for adaptive epidemic control. Comparison across studies is hampered by simplified economic costs, inconsistent calibration rigor, varied evaluation metrics, and limited analysis of uncertainty and policy robustness. Future work should establish common benchmark environments and reporting standards, incorporate empirically grounded economic and behavioral models, adopt uncertainty-aware and probabilistic RL, develop more sophisticated control spaces, investigate more advanced algorithms, and validate learned policies prospectively to enable real-world deployment.

Date: 2026

Downloads: (external link)
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0351176 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 51176&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0351176

DOI: 10.1371/journal.pone.0351176
