
An Adaptive Sampling Algorithm for Solving Markov Decision Processes

Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu and Steven I. Marcus
Additional contact information
Hyeong Soo Chang: Department of Computer Science and Engineering, Sogang University, Seoul, Korea
Michael C. Fu: Robert H. Smith School of Business, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742
Jiaqiao Hu: Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742
Steven I. Marcus: Department of Electrical and Computer Engineering, and Institute for Systems Research, University of Maryland, College Park, Maryland 20742

Operations Research, 2005, vol. 53, issue 1, 126-139

Abstract: Based on recent results for multiarmed bandit problems, we propose an adaptive sampling algorithm that approximates the optimal value of a finite-horizon Markov decision process (MDP) with finite state and action spaces. The algorithm adaptively chooses which action to sample as the sampling process proceeds and generates an asymptotically unbiased estimator, whose bias is bounded by a quantity that converges to zero at rate (ln N)/N, where N is the total number of samples used per sampled state in each stage. The worst-case running-time complexity of the algorithm is O((|A|N)^H), independent of the size of the state space, where |A| is the size of the action space and H is the horizon length. The algorithm can be used to construct an approximate receding-horizon control for solving infinite-horizon MDPs. To illustrate the algorithm, computational results are reported on simple examples from inventory control.
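To make the idea in the abstract concrete, the sketch below (in Python, not taken from the paper) shows one way to organize a UCB-style adaptive multi-stage sampling recursion of the kind described: each state/stage pair receives a budget of N samples, actions are chosen by an upper-confidence index, and the value estimate is a visit-count-weighted average of per-action estimates. The interfaces sample_next_state(state, action, stage) and reward(state, action, stage), and all names and parameters, are illustrative assumptions; the exact index and estimator in the article may differ.

```python
import math

def ams_estimate(state, stage, H, N, actions, sample_next_state, reward):
    """Hedged sketch of an adaptive multi-stage sampling value estimator.

    Recursively estimates the optimal value-to-go at `state` for stage
    `stage` of an H-stage MDP, using N simulation samples at this state.
    Assumes N >= len(actions) and that `sample_next_state` and `reward`
    are user-supplied simulation hooks (illustrative names).
    """
    if stage == H:                       # end of horizon: terminal value taken as 0
        return 0.0

    q_sum = {a: 0.0 for a in actions}    # running sums of sampled Q-values per action
    counts = {a: 0 for a in actions}     # number of times each action has been sampled

    def sample_action(a):
        next_state = sample_next_state(state, a, stage)   # simulate one transition
        r = reward(state, a, stage)                       # one-step reward
        q_sum[a] += r + ams_estimate(next_state, stage + 1, H, N,
                                     actions, sample_next_state, reward)
        counts[a] += 1

    # Initialization: sample every action once.
    for a in actions:
        sample_action(a)

    # Remaining budget: sample the action with the largest UCB-style index.
    for n in range(len(actions), N):
        a_best = max(actions,
                     key=lambda a: q_sum[a] / counts[a]
                     + math.sqrt(2.0 * math.log(n) / counts[a]))
        sample_action(a_best)

    # Value estimate: visit-count-weighted average of the per-action estimates.
    return sum((counts[a] / N) * (q_sum[a] / counts[a]) for a in actions)
```

Using the count-weighted average rather than a plain maximum over action estimates is one way to keep the estimator's bias under control as N grows; the recursion visits at most N next states per stage, which matches the O((|A|N)^H) worst-case running time quoted above.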

Keywords: dynamic programming/optimal control: Markov, finite state
Date: 2005
Citations: View citations in EconPapers (10)

Downloads: http://dx.doi.org/10.1287/opre.1040.0145 (application/pdf)



Persistent link: https://EconPapers.repec.org/RePEc:inm:oropre:v:53:y:2005:i:1:p:126-139


More articles in Operations Research from INFORMS.

Handle: RePEc:inm:oropre:v:53:y:2005:i:1:p:126-139