Safe Optimal Control of Dynamic Systems: Learning from Experts and Safely Exploring New Policies

Candelieri, Antonio; Ponti, Andrea; Fersini, Elisabetta; Messina, Enza; Archetti, Francesco

Additional contact information
Antonio Candelieri: Department of Economics Management and Statistics, University of Milano-Bicocca, 20126 Milan, Italy
Andrea Ponti: Department of Economics Management and Statistics, University of Milano-Bicocca, 20126 Milan, Italy
Elisabetta Fersini: Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy
Enza Messina: Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy
Francesco Archetti: Department of Computer Science Systems and Communication, University of Milano-Bicocca, 20126 Milan, Italy

Mathematics, 2023, vol. 11, issue 20, 1-16

Abstract: Many real-life systems are controlled through policies that replicate experts’ knowledge, typically favouring “safety” at the expense of optimality. These control policies usually aim to avoid disruptions of the system or deviations from a target behaviour, which leads to suboptimal performance. This paper proposes a statistical learning approach that exploits the historical safe experience, collected by applying a safe control policy based on experts’ knowledge, to “safely explore” new and more efficient policies. The basic idea is that performance can be improved by accepting a reasonable and quantifiable risk in terms of safety. The proposed approach relies on Gaussian Process regression to obtain a probabilistic model of both the system’s dynamics and its performance from the historical safe experience. The new policy is obtained by solving a constrained optimization problem in which two Gaussian Processes model, respectively, the safety constraints and the performance metric (i.e., the objective function). As a probabilistic model, Gaussian Process regression provides both an estimate of the target variable and the associated uncertainty; this property is crucial for dealing with uncertainty while new policies are safely explored. Another important benefit is that the proposed approach does not require implementing an expensive digital twin of the original system. Results on two real-life systems are presented, empirically showing that the approach can improve performance with respect to the initial safe policy without significantly affecting safety.
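
The idea summarised in the abstract, two Gaussian Processes fitted on historical safe data and used inside a constrained optimization, can be illustrated with a small sketch. This is not the authors' implementation: the kernel, its hyperparameters, the toy data, the confidence multiplier `beta`, the grid search over candidates, and the function names `rbf_kernel`, `gp_posterior` and `safe_candidate` are all assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs (assumed kernel)."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def safe_candidate(x_hist, perf_hist, cons_hist, candidates, limit, beta=2.0):
    """Return the candidate with the best predicted performance among those
    whose pessimistic constraint estimate (mean + beta * std) stays within
    `limit`; return None if no candidate qualifies."""
    perf_mean, _ = gp_posterior(x_hist, perf_hist, candidates)
    cons_mean, cons_std = gp_posterior(x_hist, cons_hist, candidates)
    safe = cons_mean + beta * cons_std <= limit
    if not safe.any():
        return None
    idx = np.flatnonzero(safe)
    return candidates[idx[np.argmax(perf_mean[idx])]]

# Hypothetical historical data from a conservative "expert" policy:
# control values in [0, 0.4], performance grows with the control value,
# and the safety-related quantity must stay below `limit`.
x_hist = np.linspace(0.0, 0.4, 5)
perf_hist = x_hist.copy()      # toy performance metric (higher is better)
cons_hist = x_hist ** 2        # toy safety-related quantity
candidates = np.linspace(0.0, 1.0, 101)
limit = 0.5

chosen = safe_candidate(x_hist, perf_hist, cons_hist, candidates, limit)
print("selected control value:", chosen)
```

In this sketch the constraint GP's uncertainty grows away from the historical data, so the pessimistic bound keeps the search close to what has already been observed as safe; a smaller `beta` accepts more risk in exchange for exploring further. A continuous constrained solver could replace the grid search.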

Keywords: optimal control; safe exploration; Gaussian Processes
JEL-codes: C
Date: 2023

Downloads:
https://www.mdpi.com/2227-7390/11/20/4347/pdf (application/pdf)
https://www.mdpi.com/2227-7390/11/20/4347/ (text/html)


Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:11:y:2023:i:20:p:4347-:d:1263213


Mathematics is currently edited by Ms. Emma He
