A Self-Rewarding Mechanism in Deep Reinforcement Learning for Trading Strategy Optimization

Yuling Huang, Chujin Zhou, Lin Zhang and Xiaoping Lu
Additional contact information
Yuling Huang: School of Computer Science and Software, Zhaoqing University, Zhaoqing 526060, China
Chujin Zhou: School of Computer Science and Engineering, Macau University of Science and Technology, Taipa 999078, Macao, China
Lin Zhang: School of Accounting and Finance, Beijing Institute of Technology, Beijing 100811, China
Xiaoping Lu: School of Computer Science and Engineering, Macau University of Science and Technology, Taipa 999078, Macao, China

Mathematics, 2024, vol. 12, issue 24, 1-25

Abstract: Reinforcement Learning (RL) is increasingly applied to complex decision-making tasks such as financial trading, yet designing effective reward functions remains a significant challenge: traditional static reward functions often fail to adapt to dynamic environments, leading to inefficient learning. This paper presents a novel approach, Self-Rewarding Deep Reinforcement Learning (SRDRL), which integrates a self-rewarding network into the RL framework. The SRDRL mechanism operates in two phases. In the first phase, supervised learning is used to learn from expert knowledge by employing advanced time-series feature-extraction models, including TimesNet and WFTNet; the self-rewarding network's parameters are refined by comparing its predicted rewards with expert-labeled rewards based on metrics such as Min-Max, Sharpe Ratio, and Return. In the second phase, the model selects the higher of the expert-labeled and predicted rewards as the RL reward and stores it in the replay buffer. This combination of expert knowledge and predicted rewards enhances the performance of trading strategies. The proposed implementation, Self-Rewarding Double DQN (SRDDQN), demonstrates that the self-rewarding mechanism improves learning and optimizes trading decisions. Experiments on the DJI, IXIC, and SP500 datasets show that SRDDQN achieves a cumulative return of 1124.23% on IXIC, significantly outperforming the next-best method, Fire (DQN-HER), which achieved 51.87%. SRDDQN also improves the stability and efficiency of trading strategies, providing notable gains over traditional RL methods. Integrating a self-rewarding mechanism within RL addresses a critical limitation in reward-function design and offers a scalable, adaptable solution for complex, dynamic trading environments.
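The reward-selection step described in the abstract can be sketched in a few lines of Python. The following is a minimal illustration under stated assumptions, not the authors' implementation: reward_net is assumed to be any callable that scores a (state, action) pair, and the Sharpe-ratio labeler stands in for the expert metrics (Min-Max, Sharpe Ratio, Return) named above; all other names are likewise hypothetical.

    import random
    from collections import deque

    import numpy as np

    def sharpe_reward(window_returns, eps=1e-8):
        # One possible expert label: Sharpe ratio of a recent window of returns.
        r = np.asarray(window_returns, dtype=float)
        return r.mean() / (r.std() + eps)

    class ReplayBuffer:
        # Standard experience replay; the selected reward is stored here.
        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            return random.sample(self.buffer, batch_size)

    def self_rewarding_step(reward_net, state, action, window_returns,
                            next_state, done, buffer):
        # Expert-labeled reward from a hand-crafted metric.
        r_expert = sharpe_reward(window_returns)
        # Predicted reward from the self-rewarding network (hypothetical callable).
        r_pred = float(reward_net(state, action))
        # Phase two of SRDRL: keep the higher of the two as the RL reward.
        r = max(r_expert, r_pred)
        buffer.push(state, action, r, next_state, done)
        return r

A Double DQN agent would then sample minibatches from the buffer for its temporal-difference updates; in the first phase, the same expert labels would serve as supervised targets when fitting reward_net.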

Keywords: deep reinforcement learning; self-rewarding mechanism; human alignment; trading strategy
JEL-codes: C
Date: 2024
Citations: 1

Downloads: (external link)
https://www.mdpi.com/2227-7390/12/24/4020/pdf (application/pdf)
https://www.mdpi.com/2227-7390/12/24/4020/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:12:y:2024:i:24:p:4020-:d:1549683

Mathematics is currently edited by Ms. Emma He

More articles in Mathematics from MDPI
Bibliographic data for this series is maintained by the MDPI Indexing Manager.

 
Handle: RePEc:gam:jmathe:v:12:y:2024:i:24:p:4020-:d:1549683