An ϵ -Greedy Multiarmed Bandit Approach to Markov Decision Processes
Isa Muqattash and
Jiaqiao Hu ()
Additional contact information
Isa Muqattash: Independent Researcher, Stony Brook, NY 11794-3600, USA
Jiaqiao Hu: Department of Applied Mathematics and Statistics, The State University of New York at Stony Brook, Stony Brook, NY 11794-3600, USA
Stats, 2023, vol. 6, issue 1, 1-14
Abstract:
We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ -greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2 10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ -greedy algorithm that commonly appears in the literature.
Keywords: multiarmed bandits; epsilon-greedy method; Markov decision process (MDP); sampling; optimization under uncertainties (search for similar items in EconPapers)
JEL-codes: C1 C10 C11 C14 C15 C16 (search for similar items in EconPapers)
Date: 2023
References: View complete reference list from CitEc
Citations:
Downloads: (external link)
https://www.mdpi.com/2571-905X/6/1/6/pdf (application/pdf)
https://www.mdpi.com/2571-905X/6/1/6/ (text/html)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:6:y:2023:i:1:p:6-112:d:1022218
Access Statistics for this article
Stats is currently edited by Mrs. Minnie Li
More articles in Stats from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().