Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes

Andrew Bennett and Nathan Kallus
Additional contact information
Andrew Bennett: Cornell Tech, Cornell University, New York, New York 10044
Nathan Kallus: Cornell Tech, Cornell University, New York, New York 10044

Operations Research, 2024, vol. 72, issue 3, 1071-1086

Abstract: In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we tackle this by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, we consider estimating the value of a given target policy in an unknown POMDP, given observed trajectories with only partial state observations, generated by a different and unknown policy that may depend on the unobserved state. We address two questions: what conditions allow us to identify the target policy value from the observed data, and, given identification, how best to estimate it. To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible by the existence of so-called bridge functions. We term the resulting framework proximal reinforcement learning (PRL). We then show how to construct estimators in these settings and prove they are semiparametrically efficient. We demonstrate the benefits of PRL in an extensive simulation study and on the problem of sepsis management.
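For intuition, here is a minimal sketch of the bridge-function idea from static (single-step) proximal causal inference, which the paper extends recursively over POMDP time steps; the notation below is the standard proximal setup, not necessarily the paper's own. With action A, outcome Y, an unobserved confounder, and observed proxies Z (action side) and W (outcome side), an outcome bridge function h is any solution of

    \mathbb{E}[Y \mid A, Z] = \mathbb{E}[h(W, A) \mid A, Z],

and, under suitable completeness conditions, the counterfactual mean is identified as

    \mathbb{E}[Y(a)] = \mathbb{E}[h(W, a)].

In the sequential setting of the paper, an analogous bridge equation must hold at each time step, with, roughly, parts of the observed history serving as the proxies.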

Keywords: Machine Learning and Data Science; offline reinforcement learning; unmeasured confounding; semiparametric efficiency
Date: 2024

Downloads: http://dx.doi.org/10.1287/opre.2021.0781


Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:72:y:2024:i:3:p:1071-1086


More articles in Operations Research from INFORMS.
Bibliographic data for series maintained by Chris Asher.

 
Handle: RePEc:inm:oropre:v:72:y:2024:i:3:p:1071-1086