Inverse Markov decision processes with unknown transition probabilities
Zahra Ghatrani and Archis Ghate
IISE Transactions, 2023, vol. 55, issue 6, 588-601
Abstract:
Inverse optimization involves recovering parameters of a mathematical model using observed values of decision variables. In Markov Decision Processes (MDPs), it has been applied to estimate rewards that render observed policies optimal. A counterpart is not available for transition probabilities. We study two variants of this problem. In the first variant, the decision-maker asks whether there exist a policy and transition probabilities that attain given target values of expected total discounted rewards over an infinite horizon. We derive necessary and sufficient existence conditions, and formulate a feasibility linear program whose solution yields the requisite policy and transition probabilities. We extend these results to the case where the decision-maker wants to render the target values optimal. In the second variant, the decision-maker wishes to find transition probabilities that make a given policy optimal. The resulting problem is nonconvex bilinear, and we propose tailored versions of two heuristics called Convex-Concave Procedure and Sequential Linear Programming (SLP). Their performance is compared against an exact method via numerical experiments. Computational experiments on randomly generated MDPs reveal that SLP outperforms the other two in both runtime and objective values. Further insights into SLP’s performance are derived via numerical experiments on inverse inventory control, equipment replacement, and multi-armed bandit problems.
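To make the second variant concrete, the sketch below checks whether a given policy is optimal under a given set of transition probabilities, using the standard Bellman optimality condition for discounted MDPs. It is not the authors' method (the abstract describes a bilinear program solved with heuristics); it only illustrates that the same policy can be suboptimal under one transition model and optimal under another. All names (`evaluate_policy`, `policy_is_optimal`, `P`, `P_alt`, `r`) and the two-state numbers are assumptions made for illustration.

```python
# Illustrative sketch only: a toy two-state, two-action MDP with made-up data.
# P[s][a][t] is the probability of moving from state s to state t under action a.
# r[s][a] is the immediate reward for taking action a in state s.

def evaluate_policy(P, r, policy, gamma, tol=1e-10, max_iter=100000):
    """Expected total discounted reward of a fixed policy, by successive approximation."""
    n = len(P)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            r[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(new_v[s] - v[s]) for s in range(n)) < tol:
            return new_v
        v = new_v
    return v


def policy_is_optimal(P, r, policy, gamma, eps=1e-6):
    """True if no single action improves on the policy's value in any state."""
    n = len(P)
    v = evaluate_policy(P, r, policy, gamma)
    for s in range(n):
        for a in range(len(P[s])):
            q = r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n))
            if q > v[s] + eps:
                return False
    return True


gamma = 0.9
r = [[1.0, 0.0], [0.0, 2.0]]

# Model 1: action 1 in state 0 moves to the more rewarding state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

# Model 2 (hypothetical alternative): action 1 in state 0 now stays in state 0.
P_alt = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]

policy = [0, 1]  # take action 0 in state 0 and action 1 in state 1
print(policy_is_optimal(P, r, policy, gamma))
print(policy_is_optimal(P_alt, r, policy, gamma))
```

The check only verifies optimality for candidate transition probabilities; searching over those probabilities is the nonconvex part that the abstract addresses with Convex-Concave Procedure and SLP.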
Date: 2023
Downloads: (external link)
http://hdl.handle.net/10.1080/24725854.2022.2103755 (text/html)
Access to full text is restricted to subscribers.
Persistent link: https://EconPapers.repec.org/RePEc:taf:uiiexx:v:55:y:2023:i:6:p:588-601
Ordering information: This journal article can be ordered from
http://www.tandfonline.com/pricing/journal/uiie20
DOI: 10.1080/24725854.2022.2103755
IISE Transactions is currently edited by Jianjun Shi
More articles in IISE Transactions from Taylor & Francis Journals
Bibliographic data for series maintained by Chris Longhurst.