Minimax weight learning for absorbing MDPs
Fengying Li,
Yuqiang Li and
Xianyi Wu
Additional contact information
Fengying Li: East China Normal University
Yuqiang Li: East China Normal University
Xianyi Wu: East China Normal University
Statistical Papers, 2024, vol. 65, issue 6, No 8, 3545-3582
Abstract:
Reinforcement learning policy evaluation problems are often modeled as finite or discounted/averaged infinite-horizon Markov Decision Processes (MDPs). In this paper, we study undiscounted off-policy evaluation for absorbing MDPs. Given a dataset consisting of i.i.d. episodes under a given truncation level, we propose an algorithm (referred to as MWLA in the text) to directly estimate the expected return via the importance ratio of the state-action occupancy measure. The Mean Square Error (MSE) bound of the MWLA method is provided, and the dependence of the statistical errors on the data size and the truncation level is analyzed. The performance of the algorithm is illustrated by means of computational experiments under an episodic taxi environment.
Keywords: Absorbing MDP; Off-policy; Minimax weight learning; Policy evaluation; Occupancy measure
Date: 2024
Downloads: (external link)
http://link.springer.com/10.1007/s00362-023-01491-4 Abstract (text/html)
Access to the full text of the articles in this series is restricted.
Persistent link: https://EconPapers.repec.org/RePEc:spr:stpapr:v:65:y:2024:i:6:d:10.1007_s00362-023-01491-4
Ordering information: This journal article can be ordered from
http://www.springer. ... business/journal/362
DOI: 10.1007/s00362-023-01491-4
Statistical Papers is currently edited by C. Müller, W. Krämer and W.G. Müller
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.