Parallel Nonstationary Direct Policy Search for Risk-Averse Stochastic Optimization

Moazeni, Somayeh; Powell, Warren B.; Defourny, Boris; Bouzaiene-Ayari, Belgacem

Somayeh Moazeni, Warren B. Powell, Boris Defourny and Belgacem Bouzaiene-Ayari
Additional contact information
Somayeh Moazeni: School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, New Jersey 07030
Warren B. Powell: Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544
Boris Defourny: Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, Pennsylvania 18015
Belgacem Bouzaiene-Ayari: Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544

INFORMS Journal on Computing, 2017, vol. 29, issue 2, 332-349

Abstract: This paper presents an algorithmic strategy for nonstationary policy search for finite-horizon, discrete-time Markovian decision problems with large state spaces, constrained action sets, and a risk-sensitive optimality criterion. The methodology relies on modeling time-variant policy parameters by a nonparametric response surface model for an indirect parametrized policy motivated by Bellman’s equation. The policy structure is heuristic when the optimization of the risk-sensitive criterion does not admit a dynamic programming reformulation. Through the interpolating approximation, the level of nonstationarity of the policy and, consequently, the size of the resulting search problem can be adjusted. The computational tractability and the generality of the approach follow from a nested parallel implementation of derivative-free optimization in conjunction with Monte Carlo simulation. We demonstrate the efficiency of the approach on an optimal energy storage charging problem, and illustrate the effect of the risk functional on the improvement achieved by allowing a higher complexity in time variation for the policy.
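
Illustrative sketch (not from the paper): the idea of controlling the level of nonstationarity through interpolation can be pictured with a toy example. Everything below is an assumption made for illustration: the threshold charging rule, the synthetic price model, the piecewise-linear interpolation (standing in for the nonparametric response surface), the sample-based CVaR as the risk functional, and a plain random local search standing in for the derivative-free optimizer. The nested parallel evaluation described in the abstract is omitted. All function names are hypothetical.

```python
import math
import random


def interpolate_params(knots, horizon):
    """Piecewise-linear interpolation of knot values onto t = 0..horizon-1.

    One knot gives a stationary policy; more knots allow more time variation.
    Requires horizon >= 2 when more than one knot is used.
    """
    k = len(knots)
    if k == 1:
        return [knots[0]] * horizon
    params = []
    for t in range(horizon):
        pos = t * (k - 1) / (horizon - 1)  # position on the knot grid
        i = min(int(pos), k - 2)           # index of the left knot
        w = pos - i                        # weight of the right knot
        params.append((1 - w) * knots[i] + w * knots[i + 1])
    return params


def simulate_cost(thresholds, rng, capacity=3):
    """Simulate one price path under a threshold rule (toy model, assumed)."""
    level, cost = 0, 0.0
    horizon = len(thresholds)
    for t, theta in enumerate(thresholds):
        price = 30 + 10 * math.sin(2 * math.pi * t / horizon) + rng.gauss(0, 5)
        if price < theta and level < capacity:
            level += 1      # charge one unit at the current price
            cost += price
        elif price >= theta and level > 0:
            level -= 1      # discharge one unit at the current price
            cost -= price
    return cost


def cvar(costs, alpha=0.9):
    """Average of the worst (1 - alpha) fraction of sampled costs."""
    tail = sorted(costs)[int(alpha * len(costs)):]
    return sum(tail) / len(tail)


def objective(knots, horizon=24, n_samples=200, seed=0):
    """Monte Carlo estimate of the CVaR of cost for a knot vector."""
    thresholds = interpolate_params(knots, horizon)
    rng = random.Random(seed)  # same seed: common random numbers per candidate
    costs = [simulate_cost(thresholds, rng) for _ in range(n_samples)]
    return cvar(costs)


def random_search(n_knots, iters=200, step=2.0, seed=1):
    """Random local search over the knot values (a simple stand-in optimizer)."""
    rng = random.Random(seed)
    best = [30.0] * n_knots
    best_val = objective(best)
    for _ in range(iters):
        cand = [x + rng.gauss(0, step) for x in best]
        val = objective(cand)
        if val < best_val:  # keep the candidate only if the risk estimate improves
            best, best_val = cand, val
    return best, best_val
```

Calling `random_search(1)` searches over a single stationary threshold, while `random_search(4)` searches over four knots that are interpolated across the 24 periods. The number of knots is the dimension of the search problem, which is the trade-off the abstract refers to.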

Keywords: dynamic optimization; risk-averse stochastic optimization; parallel optimization; derivative-free optimization; direct policy search; learning; energy storage
Date: 2017

Downloads: (external link)
https://doi.org/10.1287/ijoc.2016.0733 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:inm:orijoc:v:29:y:2017:i:2:p:332-349
