Analysis of Performance Measure in Q Learning with UCB Exploration

Weicheng Ye and Dangxing Chen
Additional contact information
Weicheng Ye: Credit Suisse Securities, New York, NY 10010-3698, USA
Dangxing Chen: Zu Chongzhi Center for Mathematics and Computational Sciences, Duke Kunshan University, Kunshan 215316, China

Mathematics, 2022, vol. 10, issue 4, 1-16

Abstract: Compared to model-based Reinforcement Learning (RL) approaches, model-free RL algorithms, such as Q-learning, require less space and are more expressive, since specifying value functions or policies is more flexible than specifying a model of the environment. This makes model-free algorithms more prevalent in modern deep RL. However, model-based methods can extract information from the available data more efficiently. Upper Confidence Bound (UCB) bandit techniques can improve the exploration bonuses, and hence increase data efficiency, in the Q-learning framework. The cumulative regret of the Q-learning algorithm with a UCB exploration policy in the episodic Markov Decision Process has recently been studied for environments with a finite state-action space. In this paper, we study the regret bound of the Q-learning algorithm with UCB exploration when the state-action space is a compact metric space. We present an algorithm that adaptively discretizes the continuous state-action space and iteratively updates Q-values. The algorithm is able to efficiently optimize rewards and minimize cumulative regret.
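
For orientation, the finite state-action setting mentioned in the abstract can be sketched in a few lines of Python. This is not the paper's adaptive-discretization algorithm. It is a minimal illustration of the well-known tabular Q-learning update with a Hoeffding-style UCB bonus, in which the step size is (H + 1)/(H + t) and the bonus scales as sqrt(H^3 * log-term / t). The function name ucb_q_learning, the toy environment toy_step, the constant c, the confidence parameter delta and the assumption that rewards lie in [0, 1] are all hypothetical choices made for this example.

```python
import math
import random


def toy_step(state, action, rng):
    """Hypothetical toy environment: action 1 pays reward 1, action 0 pays 0.

    The next state is drawn uniformly from two states.
    """
    reward = 1.0 if action == 1 else 0.0
    next_state = rng.randrange(2)
    return reward, next_state


def ucb_q_learning(step, n_states, n_actions, horizon, episodes,
                   c=1.0, delta=0.1, seed=0):
    """Tabular episodic Q-learning with a Hoeffding-style UCB bonus.

    Assumes rewards in [0, 1]; c and delta are illustrative constants.
    """
    rng = random.Random(seed)
    H = horizon
    # Logarithmic factor used inside the exploration bonus.
    iota = math.log(n_states * n_actions * H * episodes / delta)

    # Optimistic initialisation: every Q-value starts at the maximum return H.
    Q = [[[float(H)] * n_actions for _ in range(n_states)] for _ in range(H)]
    # V has one extra layer of zeros for the step after the horizon.
    V = [[float(H)] * n_states for _ in range(H)] + [[0.0] * n_states]
    N = [[[0] * n_actions for _ in range(n_states)] for _ in range(H)]

    total_reward = 0.0
    for _ in range(episodes):
        s = 0  # every episode starts in state 0 (an assumption of this sketch)
        for h in range(H):
            # Act greedily with respect to the optimistic Q-values.
            a = max(range(n_actions), key=lambda act: Q[h][s][act])
            r, s_next = step(s, a, rng)

            N[h][s][a] += 1
            t = N[h][s][a]
            alpha = (H + 1) / (H + t)                  # step size
            bonus = c * math.sqrt(H ** 3 * iota / t)   # UCB exploration bonus

            target = r + V[h + 1][s_next] + bonus
            Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * target
            # Value estimates are clipped at the maximum possible return H.
            V[h][s] = min(float(H), max(Q[h][s]))

            total_reward += r
            s = s_next
    return Q, total_reward


Q, total = ucb_q_learning(toy_step, n_states=2, n_actions=2,
                          horizon=3, episodes=200)
print("total reward collected:", total)
```

The bonus shrinks as a state-action pair is visited more often, so rarely tried actions keep optimistic values and are eventually explored. The continuous setting studied in the paper replaces the fixed table with an adaptively refined partition of the state-action space, which this sketch does not attempt.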

Keywords: reinforcement learning; Q-learning; multi-armed bandit; theory of machine learning
JEL-codes: C
Date: 2022

Downloads:
https://www.mdpi.com/2227-7390/10/4/575/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/4/575/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:4:p:575-:d:747736

Mathematics is currently edited by Ms. Emma He

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jmathe:v:10:y:2022:i:4:p:575-:d:747736