Article

Robust Learning-Based Predictive Control for Discrete-Time Nonlinear Systems With Unknown Dynamics and State Constraints

Journal

IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume 52, Issue 12, Pages 7314-7327

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TSMC.2022.3146284

Keywords

Robustness; Predictive control; Aerospace electronics; Computational modeling; Predictive models; Optimal control; Heuristic algorithms; Model predictive control (MPC); nonlinear systems; reinforcement learning (RL); state constraints

Funding

  1. National Natural Science Foundation of China [61825305, 62003361, U21A20518]
  2. China Postdoctoral Science Foundation [47680]
  3. National Key Research and Development Program of China [2018YFB1305105]

Abstract

The article introduces r-LPC, an efficient robust MPC solution based on receding-horizon reinforcement learning for unknown nonlinear systems. The method uses a Koopman operator-based prediction model built from precollected input-output datasets and learns a near-optimal feedback control policy through an actor-critic structure.
Robust model predictive control (MPC) is a well-known technique for model-based control under constraints and uncertainties. In classic robust tube-based MPC approaches, an open-loop control sequence is computed by periodically solving an online nominal MPC problem, which requires prior model information and frequent access to onboard computational resources. In this article, we propose an efficient robust MPC solution based on receding-horizon reinforcement learning, called r-LPC, for unknown nonlinear systems with state constraints and disturbances. The proposed r-LPC utilizes a Koopman operator-based prediction model obtained offline from precollected input-output datasets. Unlike classic tube-based MPC, in each prediction time interval of r-LPC, we use an actor-critic structure to learn a near-optimal feedback control policy rather than a control sequence. The resulting closed-loop control policy can be learned offline and deployed online, or learned online in an asynchronous way. In the latter case, online learning can be activated whenever necessary; for instance, when a safety constraint is violated under the deployed policy. Closed-loop recursive feasibility, robustness, and asymptotic stability are proven in the presence of function approximation errors of the actor-critic networks. Simulation and experimental results on two nonlinear systems with unknown dynamics and disturbances demonstrate that our approach performs better than or comparably to tube-based MPC and the linear quadratic regulator, and outperforms a recently developed actor-critic learning approach.
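
To make the two ingredients described in the abstract concrete, the sketch below illustrates (i) an EDMD-style least-squares fit of a Koopman lifted linear prediction model from precollected input-output data and (ii) a toy actor-critic iteration on that lifted model, with a quadratic critic and a linear feedback actor. This is not the authors' r-LPC implementation: the lifting functions, the quadratic/linear stand-ins for the actor-critic networks, the toy plant, and all weights are illustrative assumptions, and the receding-horizon constraint-handling machinery of r-LPC is omitted.

# Minimal sketch (not the paper's r-LPC code): Koopman/EDMD model identification
# from offline data plus a toy actor-critic loop on the lifted linear model.
import numpy as np

def lift(x):
    # Hypothetical Koopman observables: the state plus a few nonlinear features.
    x = np.atleast_1d(x).astype(float)
    return np.concatenate([x, [np.sin(x[0]), np.cos(x[0])]])

def fit_koopman_model(X, U, X_next):
    # EDMD-style least-squares fit of z_{k+1} ~ A z_k + B u_k from a precollected
    # dataset of transitions (X, U, X_next).
    Z = np.array([lift(x) for x in X]).T             # lifted states, (n_z, N)
    Zp = np.array([lift(x) for x in X_next]).T       # lifted successors
    Um = np.atleast_2d(np.array(U, dtype=float).T)   # inputs, (n_u, N)
    AB = Zp @ np.linalg.pinv(np.vstack([Z, Um]))     # [A  B] in one regression
    n_z = Z.shape[0]
    return AB[:, :n_z], AB[:, n_z:]

def actor_critic_policy(A, B, Q, R, iters=200):
    # Toy actor-critic on the lifted model: quadratic critic V(z) = z' P z
    # (one Bellman backup per iteration) and linear actor u = -K z
    # (greedy improvement against the current critic).
    K = np.zeros((B.shape[1], A.shape[0]))
    P = Q.copy()
    for _ in range(iters):
        A_cl = A - B @ K                                   # closed-loop dynamics
        P = Q + K.T @ R @ K + A_cl.T @ P @ A_cl            # critic update
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # actor update
    return K, P

if __name__ == "__main__":
    # Stand-in "unknown" plant used only to generate data and test the policy.
    rng = np.random.default_rng(0)
    step = lambda x, u: np.array([x[1], np.sin(x[0]) + u])
    X = rng.uniform(-1, 1, (500, 2))
    U = rng.uniform(-1, 1, 500)
    X_next = np.array([step(x, u) for x, u in zip(X, U)])
    A, B = fit_koopman_model(X, U, X_next)        # offline model identification
    Q = np.diag([1.0, 1.0, 0.1, 0.1])             # weights on the lifted state
    R = np.array([[0.1]])
    K, _ = actor_critic_policy(A, B, Q, R)        # offline policy learning
    x = np.array([0.5, 0.0])
    for _ in range(50):                           # closed-loop deployment
        u = -(K @ lift(x)).item()                 # feedback policy on lifted state
        x = step(x, u)

In the paper the critic and actor are neural networks trained over each prediction horizon, and learning can be re-triggered online, for example when a state constraint is violated; the quadratic critic and linear actor above are only the simplest stand-ins that make the identify-then-learn-a-feedback-policy idea visible in a few lines.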

