EconPapers    
Economics at your fingertips  
 

An ϵ -Greedy Multiarmed Bandit Approach to Markov Decision Processes

Isa Muqattash and Jiaqiao Hu ()
Additional contact information
Isa Muqattash: Independent Researcher, Stony Brook, NY 11794-3600, USA
Jiaqiao Hu: Department of Applied Mathematics and Statistics, The State University of New York at Stony Brook, Stony Brook, NY 11794-3600, USA

Stats, 2023, vol. 6, issue 1, 1-14

Abstract: We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ -greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2 10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ -greedy algorithm that commonly appears in the literature.

Keywords: multiarmed bandits; epsilon-greedy method; Markov decision process (MDP); sampling; optimization under uncertainties (search for similar items in EconPapers)
JEL-codes: C1 C10 C11 C14 C15 C16 (search for similar items in EconPapers)
Date: 2023
References: View complete reference list from CitEc
Citations:

Downloads: (external link)
https://www.mdpi.com/2571-905X/6/1/6/pdf (application/pdf)
https://www.mdpi.com/2571-905X/6/1/6/ (text/html)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:6:y:2023:i:1:p:6-112:d:1022218

Access Statistics for this article

Stats is currently edited by Mrs. Minnie Li

More articles in Stats from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager ().

 
Page updated 2025-03-19
Handle: RePEc:gam:jstats:v:6:y:2023:i:1:p:6-112:d:1022218