Approximation Benefits of Policy Gradient Methods with Aggregated States
Daniel Russo
Additional contact information
Daniel Russo: Graduate School of Business, Columbia University, New York, New York 10027
Management Science, 2023, vol. 69, issue 11, 6898-6911
Abstract:
Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or the value function approximation is held constant over partitions. It shows that a policy gradient method converges to a policy whose per-period regret is bounded by ϵ, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as ϵ/(1 − γ), where γ is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust.
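To make the setting concrete, the following is a minimal, illustrative sketch (not code from the paper) of a softmax policy gradient method with state-aggregated parameters on a small randomly generated MDP. The names (`phi`, `policy_gradient_step`), the MDP sizes, and the step size are assumptions chosen for this example; the abstract's claims concern the analysis of such methods, not this particular implementation.

```python
# Illustrative sketch: exact softmax policy gradient with state aggregation.
# States mapped to the same segment by `phi` share policy parameters.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_segments, gamma = 6, 2, 3, 0.9

# Random MDP: P[s, a] is a distribution over next states, R holds rewards r(s, a).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))
phi = rng.integers(n_segments, size=n_states)   # state -> segment (aggregation map)
rho = np.ones(n_states) / n_states              # initial state distribution

def policy(theta):
    """Softmax policy; states in the same segment share the parameter row."""
    logits = theta[phi]                          # (n_states, n_actions)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def value_and_q(pi):
    """Exact policy evaluation by solving the Bellman equations."""
    P_pi = np.einsum('sab,sa->sb', P, pi)        # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    q = R + gamma * P @ v                        # Q(s, a)
    return v, q

def policy_gradient_step(theta, lr=1.0):
    """One exact policy-gradient ascent step on the discounted return."""
    pi = policy(theta)
    v, q = value_and_q(pi)
    P_pi = np.einsum('sab,sa->sb', P, pi)
    # Unnormalized discounted state-occupancy measure under rho.
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, rho)
    adv = q - v[:, None]                         # advantages A(s, a)
    grad = np.zeros_like(theta)
    # Softmax gradient pooled over each segment: sum_s d(s) pi(a|s) A(s, a).
    np.add.at(grad, phi, d[:, None] * pi * adv)
    return theta + lr * grad, float(rho @ v)

theta = np.zeros((n_segments, n_actions))
for _ in range(200):
    theta, J = policy_gradient_step(theta)
print(f"objective after training: {J:.4f}")
```

Because the parameters are shared within each segment, the best the method can do is limited by how much the state-action values vary inside a partition, which is the role the quantity ϵ plays in the bound stated above.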
Keywords: reinforcement learning; approximate dynamic programming; policy gradient methods; state aggregation
Date: 2023
Downloads: http://dx.doi.org/10.1287/mnsc.2023.4788 (application/pdf)
Persistent link: https://EconPapers.repec.org/RePEc:inm:ormnsc:v:69:y:2023:i:11:p:6898-6911
More articles in Management Science from INFORMS.
Bibliographic data for series maintained by Chris Asher.