Analysis of Performance Measure in Q Learning with UCB Exploration
Weicheng Ye and Dangxing Chen
Additional contact information
Weicheng Ye: Credit Suisse Securities, New York, NY 10010-3698, USA
Dangxing Chen: Zu Chongzhi Center for Mathematics and Computational Sciences, Duke Kunshan University, Kunshan 215316, China
Mathematics, 2022, vol. 10, issue 4, 1-16
Abstract:
Compared to model-based Reinforcement Learning (RL) approaches, model-free RL algorithms, such as Q-learning, require less space and are more expressive, since specifying value functions or policies is more flexible than specifying a model of the environment. This makes model-free algorithms prevalent in modern deep RL. However, model-based methods extract information from the available data more efficiently. The Upper Confidence Bound (UCB) bandit strategy provides improved exploration bonuses and hence increases data efficiency within the Q-learning framework. The cumulative regret of the Q-learning algorithm with a UCB exploration policy in the episodic Markov Decision Process has recently been analyzed for environments with a finite state-action space. In this paper, we study the regret bound of the Q-learning algorithm with UCB exploration in the setting of a compact state-action metric space. We present an algorithm that adaptively discretizes the continuous state-action space and iteratively updates Q-values. The algorithm efficiently optimizes rewards and minimizes cumulative regret.
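As a rough illustration of the tabular setting that the paper builds on, the sketch below shows episodic Q-learning with a Hoeffding-style UCB exploration bonus on a finite state-action space. The environment interface (`env.reset`, `env.step`), the bonus constant `c`, and the confidence level `delta` are illustrative assumptions, not the authors' implementation; in particular, the paper's adaptive discretization of a compact metric space is not shown here.

```python
# Minimal sketch (assumed interface, not the authors' code) of episodic
# Q-learning with a Hoeffding-style UCB exploration bonus on a finite
# state-action space.
import math
import numpy as np

def q_learning_ucb(env, n_states, n_actions, H, K, c=1.0, delta=0.1):
    """Run K episodes of horizon H and return the learned Q-table."""
    T = K * H
    # Optimistic initialization: Q starts at the maximum possible return H.
    Q = np.full((H, n_states, n_actions), float(H))
    N = np.zeros((H, n_states, n_actions), dtype=int)  # visit counts

    for _ in range(K):
        s = env.reset()                          # assumed: returns an integer state
        for h in range(H):
            a = int(np.argmax(Q[h, s]))          # act greedily w.r.t. the optimistic Q
            s_next, r, _ = env.step(a)           # assumed: (next state, reward, done)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)            # step size used in finite-state analyses
            # UCB bonus shrinking with the visit count of (h, s, a)
            bonus = c * math.sqrt(H**3 * math.log(n_states * n_actions * T / delta) / t)
            V_next = min(H, Q[h + 1, s_next].max()) if h + 1 < H else 0.0
            Q[h, s, a] += alpha * (r + V_next + bonus - Q[h, s, a])
            s = s_next
    return Q
```

The bonus term keeps estimated Q-values optimistic, so under-visited state-action pairs are tried before the greedy policy settles, which is the mechanism behind the regret bounds discussed in the paper.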
Keywords: reinforcement learning; Q-learning; multi-armed bandit; theory of machine learning
JEL-codes: C
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2227-7390/10/4/575/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/4/575/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:4:p:575-:d:747736