Two-Layered Reward Reinforcement Learning in Humanoid Robot Motion Tracking
Jiahong Xu,
Zhiwei Zheng and
Fangyuan Ren
Additional contact information
Jiahong Xu, Zhiwei Zheng and Fangyuan Ren: Robotics Institute, Ningbo University of Technology, Ningbo 315211, China
Mathematics, 2025, vol. 13, issue 21, 1-23
Abstract:
In reinforcement learning (RL), reward function design is critical to both the learning efficiency and the final performance of agents. However, in complex tasks such as humanoid motion tracking, traditional statically weighted reward functions struggle to adapt to shifting learning priorities across training stages, and designing a suitable shaping reward is difficult. To address these challenges, this paper proposes a two-layered reward reinforcement learning framework. The framework decomposes the reward into two layers: an upper-level goal reward that measures task completion, and a lower-level optimizing reward that aggregates auxiliary objectives such as stability, energy consumption, and motion smoothness. The key innovation is the online optimization of the lower-level reward weights by a meta-heuristic algorithm. This online adaptivity enables goal-conditioned reward shaping, allowing the reward structure to evolve autonomously without expert demonstrations and thereby improving learning robustness and interpretability. The framework is tested on a gymnastic motion tracking problem for the Unitree G1 humanoid robot in the Isaac Gym simulation environment. Compared to a static reward baseline, the proposed framework achieves 7.58% and 10.30% improvements in upper-body and lower-body link tracking accuracy, respectively; the resulting motions also exhibit better synchronization and reduced latency. The simulation results demonstrate the effectiveness of the framework in promoting efficient exploration, accelerating convergence, and enhancing motion imitation quality.
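The abstract does not reproduce the framework's equations or name the meta-heuristic it uses, so the following Python sketch is only an illustration of the two-layered idea: an upper-level goal reward combined with a lower-level weighted sum of auxiliary shaping terms whose weights are adapted online between episodes. The class name, the auxiliary-term layout, and the (1+1)-evolutionary weight update are illustrative assumptions, not the paper's method.

```python
import numpy as np

class TwoLayeredReward:
    """Goal reward plus adaptively weighted auxiliary (shaping) rewards.

    Illustrative sketch; the paper's actual reward terms, weight bounds,
    and meta-heuristic are not specified in the abstract.
    """

    def __init__(self, n_aux: int, sigma: float = 0.05, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.weights = np.full(n_aux, 1.0 / n_aux)  # lower-level weights
        self.sigma = sigma                          # mutation step size
        self.best_return = -np.inf                  # best goal return so far
        self.best_weights = self.weights.copy()

    def reward(self, goal_r: float, aux_r: np.ndarray) -> float:
        # Upper layer: task-completion (goal) reward, left unweighted.
        # Lower layer: weighted sum of auxiliary terms, e.g.
        # [stability, -energy_consumption, motion_smoothness].
        return goal_r + float(self.weights @ aux_r)

    def adapt(self, episode_goal_return: float) -> None:
        # (1+1)-style online update (an assumed stand-in for the paper's
        # meta-heuristic): keep the current weights if they improved the
        # goal-level episode return, then propose a mutated weight vector.
        if episode_goal_return > self.best_return:
            self.best_return = episode_goal_return
            self.best_weights = self.weights.copy()
        noise = self.rng.normal(0.0, self.sigma, self.weights.shape)
        self.weights = np.clip(self.best_weights + noise, 0.0, None)

# Hypothetical usage inside a training loop:
# shaper = TwoLayeredReward(n_aux=3)
# r = shaper.reward(goal_r=0.9, aux_r=np.array([0.5, -0.1, 0.3]))  # per step
# shaper.adapt(episode_goal_return=120.0)                          # per episode
```

Scoring candidate weight vectors by the goal-level return alone is what makes the shaping goal-conditioned: any meta-heuristic that evaluates weights this way fits the same interface.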
Keywords: reinforcement learning; reward shaping; nonlinear control; humanoid robot
JEL-codes: C
Date: 2025
Downloads: (external link)
https://www.mdpi.com/2227-7390/13/21/3445/pdf (application/pdf)
https://www.mdpi.com/2227-7390/13/21/3445/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:13:y:2025:i:21:p:3445-:d:1782138
Mathematics is currently edited by Ms. Emma He