Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

Wang, Haoran; Zhou, Xun Yu

Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

Haoran Wang and Xun Yu Zhou

Abstract: We approach the continuous-time mean-variance (MV) portfolio selection with reinforcement learning (RL). The problem is to achieve the best tradeoff between exploration and exploitation, and is formulated as an entropy-regularized, relaxed stochastic control problem. We prove that the optimal feedback policy for this problem must be Gaussian, with time-decaying variance. We then establish connections between the entropy-regularized MV and the classical MV, including the solvability equivalence and the convergence as exploration weighting parameter decays to zero. Finally, we prove a policy improvement theorem, based on which we devise an implementable RL algorithm. We find that our algorithm outperforms both an adaptive control based method and a deep neural networks based algorithm by a large margin in our simulations.

Date: 2019-04, Revised 2019-05
New Economics Papers: this item is included in nep-big and nep-cmp
References: View references in EconPapers View complete reference list from CitEc
Citations: View citations in EconPapers (8)

Downloads: (external link)
http://arxiv.org/pdf/1904.11392 Latest version (application/pdf)

Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.

Export reference: BibTeX RIS (EndNote, ProCite, RefMan) HTML/Text

Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:1904.11392

Access Statistics for this paper

More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().