Factor-based deep reinforcement learning for asset allocation: Comparative analysis of static and dynamic beta reward designs

Jung, Nak Hyun; Oh, Taeyeon

PLOS ONE, 2025, vol. 20, issue 12, 1-26

Abstract: Traditional asset allocation rules, while effective in stable phases, tend to erode once markets enter volatile regimes or undergo structural breaks. Research in deep reinforcement learning (DRL) has mostly emphasized raw-return rewards, leaving aside the role of factor exposures (β), which shape both risk-adjusted payoffs and adaptive responses. This paper advances a Factor-based Deep Reinforcement Learning for Asset Allocation (FDRL) framework in which β sensitivities, estimated via rolling regressions on momentum, volatility, deviation, and volume signals, inform both the state representation and the reward design. Five reward variants (Sharpe, Sortino, Static-β, Dynamic-β, Momentum-β) are examined with PPO, SAC, and TD3 across equities, cryptocurrencies, macroeconomic instruments, and mixed portfolios.

Empirically, β-based rewards generate heterogeneous but interpretable patterns. In equities, Dynamic-β raises annualized returns from roughly 20% (Sharpe baseline) to 23–24%, with the Sharpe ratio rising from 1.04 to about 1.27 across windows. In cryptocurrencies, Dynamic-β and Momentum-β achieve 38–43% annual returns but remain highly regime-sensitive, with drawdowns often exceeding –35%. In macro instruments, Static-β delivers the most stable behaviour, keeping volatility near 8–9% and limiting drawdowns to roughly –18%. In mixed-asset portfolios, Momentum-β under TD3 produces the strongest gains (cumulative returns above 70–80%), exceeding equal-weight baselines whose CAGR remains near 19–22% with Sharpe ratios around 1.25.

All findings were validated through beta-window sensitivity checks (30/60/90/120 days), regime-conditional analysis, and multiple robustness tests, including HAC, Wilcoxon, jackknife Sharpe, moving-block bootstrap, and false-discovery-rate adjustments. These diagnostics confirm that the main performance patterns are not driven by window choice or serial dependence.

Four contributions follow. First, a reward structure that operationalizes time-varying β. Second, systematic benchmarking of factor-sensitive objectives. Third, evidence on asymmetric outcomes across asset classes. Finally, a framework that reconciles responsiveness with interpretability and risk discipline in allocation.
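The rolling-regression step that yields time-varying β sensitivities can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes one asset's daily returns and a matrix of four already-computed signals (momentum, volatility, deviation, volume), fits ordinary least squares with an intercept over each trailing window, and keeps the slopes as β. The function name `rolling_factor_betas`, the OLS-with-intercept specification, and the synthetic data are hypothetical; the abstract does not specify how the signals are built or exactly how β enters the state and reward.

```python
import numpy as np


def rolling_factor_betas(returns: np.ndarray, signals: np.ndarray, window: int = 60) -> np.ndarray:
    """Hypothetical sketch: time-varying factor betas via rolling OLS.

    returns: shape (T,), one asset's returns.
    signals: shape (T, K), factor signals aligned with returns
             (assumed here: momentum, volatility, deviation, volume).
    window:  trailing look-back length in observations.

    Row t of the output holds betas estimated from observations
    t - window + 1 .. t inclusive; earlier rows are NaN.
    """
    T, K = signals.shape
    betas = np.full((T, K), np.nan)
    for end in range(window, T + 1):
        y = returns[end - window:end]
        # Intercept column so that the stored coefficients are slopes only.
        X = np.column_stack([np.ones(window), signals[end - window:end]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        betas[end - 1] = coef[1:]  # drop the intercept
    return betas


# Usage on synthetic data with known sensitivities (illustrative only).
rng = np.random.default_rng(0)
T = 300
signals = rng.normal(size=(T, 4))
true_beta = np.array([0.5, -0.2, 0.1, 0.0])
returns = signals @ true_beta + 0.01 * rng.normal(size=T)
betas = rolling_factor_betas(returns, signals, window=60)
```

Setting `window` to 30, 90, or 120 mirrors the beta-window sensitivity checks listed in the abstract; longer windows give smoother but slower-reacting β estimates.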

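One of the robustness tests named in the abstract, the moving-block bootstrap, can be illustrated on a Sharpe ratio. The sketch below is not the paper's procedure: it assumes daily returns in excess of a zero risk-free rate, 252 trading days per year, a block length of 20 days, and a percentile confidence interval. These settings and the helper names `annualized_sharpe` and `moving_block_bootstrap_sharpe` are hypothetical, since the abstract does not report the bootstrap configuration.

```python
import numpy as np


def annualized_sharpe(r: np.ndarray, periods: int = 252) -> float:
    """Annualized Sharpe ratio, assuming r are excess returns (risk-free rate = 0)."""
    return float(np.sqrt(periods) * r.mean() / r.std(ddof=1))


def moving_block_bootstrap_sharpe(returns, block_len=20, n_boot=2000, periods=252, seed=0):
    """Hypothetical sketch: bootstrap distribution of the Sharpe ratio.

    Contiguous overlapping blocks are resampled instead of single days,
    so short-range serial dependence inside each block is preserved.
    """
    r = np.asarray(returns, dtype=float)
    T = r.size
    if block_len > T:
        raise ValueError("block_len must not exceed the sample length")
    rng = np.random.default_rng(seed)
    n_blocks = -(-T // block_len)  # ceiling division
    offsets = np.arange(block_len)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Block starts drawn uniformly from all admissible positions 0 .. T - block_len.
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + offsets).ravel()[:T]  # concatenate blocks, trim to T
        stats[b] = annualized_sharpe(r[idx], periods)
    return stats


# Usage on synthetic daily returns (illustrative only, not the paper's data).
rng = np.random.default_rng(1)
daily = 0.0005 + 0.01 * rng.standard_normal(1000)
point = annualized_sharpe(daily)
boot = moving_block_bootstrap_sharpe(daily, block_len=20, n_boot=2000)
lo_ci, hi_ci = np.percentile(boot, [2.5, 97.5])
```

The block length trades off bias and variance: blocks that are too short break up autocorrelation, while blocks that are too long leave few distinct resamples.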
Date: 2025

Downloads:
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0332779 (text/html)
https://journals.plos.org/plosone/article/file?id= ... 32779&type=printable (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:plo:pone00:0332779

DOI: 10.1371/journal.pone.0332779

More articles in PLOS ONE from Public Library of Science
Bibliographic data for series maintained by plosone.