Article

Reinforcement Learning-Based Approximate Optimal Control for Attitude Reorientation Under State Constraints

Journal

IEEE Transactions on Control Systems Technology
Volume 29, Issue 4, Pages 1664-1673

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TCST.2020.3007401

Keywords

Attitude control; Payloads; Angular velocity; Optimal control; Artificial neural networks; Cost function; Quaternions; Adaptive dynamic programming (ADP); Approximate optimal control; Reinforcement learning (RL); State constraints

Funding

  1. U.K. Engineering and Physical Sciences Research Council (EPSRC) [EP/S001905/1], funding source: UKRI

Abstract

This article addresses the attitude reorientation problem of rigid bodies under multiple state constraints. A novel reinforcement learning (RL)-based approximate optimal control method is proposed to trade off control cost against performance. The novelty lies in its guaranteed constraint-handling ability with respect to attitude forbidden zones and angular velocity limits. To achieve this, barrier functions are employed to encode the constraint information into the cost function. An RL-based learning strategy is then developed to approximate the optimal cost function and control policy. A simplified critic-only neural network (NN) replaces the conventional actor-critic structure once adequate data have been collected online. This design guarantees the uniform boundedness of reorientation errors and NN weight-estimation errors subject to a finite excitation condition, a relaxation of the persistent excitation condition typically required for this class of problems. More importantly, all underlying state constraints are strictly obeyed during the online learning process. The effectiveness and advantages of the proposed controller are verified by both numerical simulations and experimental tests on a comprehensive hardware-in-the-loop testbed.
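To illustrate the barrier-function idea described in the abstract, the sketch below shows how constraint information can be folded into a stage cost so that the cost grows without bound as the state approaches a constraint boundary. This is a minimal, hypothetical example; the forbidden-zone half-angle, velocity limit, weights, and the exact barrier form are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

# Assumed constraint parameters (illustrative only).
THETA_F = np.deg2rad(20.0)   # forbidden-zone half-angle around a pointing direction
OMEGA_MAX = 0.5              # per-axis angular velocity limit (rad/s)

def barrier_cost(b_axis, t_dir, omega, q_w, r_u, u):
    """Stage cost = quadratic cost + log-barrier terms.

    b_axis : unit boresight vector of the instrument (body frame)
    t_dir  : unit vector toward the forbidden direction (e.g., the Sun)
    omega  : angular velocity vector (rad/s)
    q_w    : weight on angular velocity, r_u : weight on control u
    """
    # Attitude forbidden zone: the angle between boresight and the
    # forbidden direction must exceed THETA_F, i.e. the margin below
    # must stay positive inside the safe set.
    margin_att = np.cos(THETA_F) - np.dot(b_axis, t_dir)
    # Angular velocity limits, one margin per axis.
    margin_w = OMEGA_MAX**2 - omega**2
    if margin_att <= 0 or np.any(margin_w <= 0):
        return np.inf  # constraint violated: cost blows up
    quad = q_w * np.dot(omega, omega) + r_u * np.dot(u, u)
    # Log barriers: near zero deep inside the safe set, unbounded at
    # the boundary, so the learned optimal policy avoids it.
    barrier = -np.log(margin_att) - np.sum(np.log(margin_w / OMEGA_MAX**2))
    return quad + barrier
```

In an ADP/RL scheme of the kind the abstract describes, a critic network would then approximate the optimal cost-to-go for this barrier-augmented cost, so that constraint avoidance is a by-product of cost minimization rather than an add-on safety filter.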
