4.7 Article

Online Synchronous Approximate Optimal Learning Algorithm for Multiplayer Nonzero-Sum Games With Unknown Dynamics

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TSMC.2013.2295351

Keywords

Adaptive dynamic programming (ADP); approximate dynamic programming; multiplayer nonzero-sum games; neural networks; neuro-dynamic programming; policy iteration

Funding

  1. National Natural Science Foundation of China [61034002, 61233001, 61273140, 61304086, 61374105]
  2. Beijing Natural Science Foundation [4132078]
  3. Early Career Development Award of SKLMCCS

In this paper, we develop an online synchronous approximate optimal learning algorithm based on policy iteration to solve a multiplayer nonzero-sum game without requiring exact knowledge of the system dynamics. First, we prove that the online policy iteration algorithm for the nonzero-sum game is mathematically equivalent to a quasi-Newton iteration in a Banach space. Then, a model neural network is established to identify the unknown continuous-time nonlinear system from input-output data. For each player, a critic neural network and an action neural network approximate its value function and control policy, respectively. The algorithm only needs to tune the weights of the critic neural networks, which reduces the computational complexity of the learning process. All neural network weights are updated online in real time, continuously and synchronously. Furthermore, uniform ultimate boundedness of the closed-loop system is proved using the Lyapunov approach. Finally, two simulation examples demonstrate the effectiveness of the developed scheme.
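
The alternation between policy evaluation and policy improvement that underlies the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a scalar linear system with known dynamics and quadratic costs, so each player's value function is V_i(x) = p_i x^2 and no neural networks or system identification are needed. The function name, the cost weights, and all numeric values are hypothetical choices made for illustration only.

```python
def policy_iteration_lq_game(a, b, q, r, k0, tol=1e-12, max_iter=200):
    """Policy iteration for a scalar N-player linear-quadratic nonzero-sum game.

    Assumed (illustrative) setup, with known dynamics:
      dx/dt = a*x + sum_j b[j]*u_j,  feedback policies u_j = -k[j]*x
      cost of player i = integral of (q[i]*x^2 + sum_j r[i][j]*u_j^2) dt
    Returns the value coefficients p (V_i = p[i]*x^2) and feedback gains k.
    """
    n = len(b)
    k = list(k0)
    p = [0.0] * n
    for _ in range(max_iter):
        a_cl = a - sum(b[j] * k[j] for j in range(n))  # closed-loop pole
        if a_cl >= 0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: solve 2*a_cl*p_i + q_i + sum_j r_ij*k_j^2 = 0 for p_i.
        p = [(q[i] + sum(r[i][j] * k[j] ** 2 for j in range(n))) / (-2.0 * a_cl)
             for i in range(n)]
        # Policy improvement: u_i = -(1/(2*r_ii)) * b_i * dV_i/dx = -(b_i*p_i/r_ii)*x.
        k_new = [b[i] * p[i] / r[i][i] for i in range(n)]
        converged = max(abs(k_new[i] - k[i]) for i in range(n)) < tol
        k = k_new
        if converged:
            break
    return p, k


# Hypothetical two-player example; the open-loop system is stable, so k0 = 0 is admissible.
a = -1.0
b = [1.0, 1.0]
q = [1.0, 2.0]
r = [[1.0, 0.5], [0.5, 1.0]]
p, k = policy_iteration_lq_game(a, b, q, r, k0=[0.0, 0.0])

# Residuals of the coupled Riccati-type equations; they should be close to zero.
a_cl = a - sum(b[j] * k[j] for j in range(2))
residuals = [2.0 * a_cl * p[i] + q[i] + sum(r[i][j] * k[j] ** 2 for j in range(2))
             for i in range(2)]
print("p =", p, "k =", k, "residuals =", residuals)
```

Convergence of this simple iteration is not guaranteed for arbitrary nonzero-sum games; the parameters above were picked so that it settles quickly. In the setting described in the abstract, the evaluation step would instead be approximated by critic neural networks and the dynamics by a model neural network.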

Recommended

Article Computer Science, Artificial Intelligence

A Novel Value Iteration Scheme With Adjustable Convergence Rate

Mingming Ha, Ding Wang, Derong Liu

Summary: In this article, a novel value iteration scheme is proposed, which introduces a relaxation factor and combines with other methods to accelerate and guarantee the convergence. The theoretical results and numerical examples demonstrate its fast convergence speed and stability.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Automation & Control Systems

Event-triggered robust control for multi-player nonzero-sum games with input constraints and mismatched uncertainties

Shunchao Zhang, Bo Zhao, Derong Liu, Cesare Alippi, Yongwei Zhang

Summary: In this article, an event-triggered robust control (ETRC) method is investigated for multi-player nonzero-sum games of continuous-time input constrained nonlinear systems with mismatched uncertainties. The method transforms the robust control problem into an optimal regulation problem by constructing an auxiliary system and designing an appropriate value function. A critic neural network (NN) is used to approximate the value function of each player and obtain control laws. The method reduces computational burden and communication bandwidth by updating the control laws when events occur. The effectiveness of the developed ETRC method is demonstrated through theoretical analysis and examples.

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL (2023)

Article Computer Science, Artificial Intelligence

Event-triggered adaptive dynamic programming for decentralized tracking control of input constrained unknown nonlinear interconnected systems

Qiuye Wu, Bo Zhao, Derong Liu, Marios M. Polycarpou

Summary: This paper proposes an event-triggered adaptive dynamic programming method to solve the decentralized tracking control (DTC) problem for input constrained unknown nonlinear interconnected systems. A neural-network-based local observer is established to reconstruct the system dynamics using local input-output data and desired trajectories. The DTC problem is transformed into an optimal control problem using a nonquadratic value function. The DTC policy is obtained by solving the local Hamilton-Jacobi-Bellman equation through the observer-critic architecture, with weights tuned by the experience replay technique. Simulation examples demonstrate the effectiveness of the proposed scheme.

NEURAL NETWORKS (2023)

Article Computer Science, Artificial Intelligence

Policy gradient adaptive dynamic programming for nonlinear discrete-time zero-sum games with unknown dynamics

Mingduo Lin, Bo Zhao, Derong Liu

Summary: A novel policy gradient (PG) adaptive dynamic programming method is proposed for nonlinear discrete-time zero-sum games with unknown dynamics. A policy iteration algorithm is used to approximate the Q-function and the control and disturbance policies using neural network approximators. The control and disturbance policies are then updated using the PG method based on the iterative Q-function. The experience replay technique is applied to improve training stability and data usage efficiency. Simulation results show the effectiveness of the proposed method.

SOFT COMPUTING (2023)

Article Computer Science, Artificial Intelligence

Neuro-Optimal Event-Triggered Impulsive Control for Stochastic Systems via ADP

Mingming Liang, Derong Liu

Summary: This article presents a novel neural-network-based optimal event-triggered impulsive control method. The proposed method utilizes a general-event-based impulsive transition matrix (GITM) to represent the evolving characteristics of all system states across impulsive actions. Through the developed event-triggered impulsive adaptive dynamic programming (ETIADP) algorithm and its high-efficiency version (HEIADP), the optimization problems for stochastic systems with event-triggered impulsive controls are addressed. The results show that the proposed methods can reduce computational and communication burdens and fulfill the desired goals.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Automation & Control Systems

Deep Learning-Based Trajectory Planning and Control for Autonomous Ground Vehicle Parking Maneuver

Runqi Chai, Derong Liu, Tianhao Liu, Antonios Tsourdos, Yuanqing Xia, Senchun Chai

Summary: This paper presents an integrated real-time trajectory planning and tracking control framework for autonomous ground vehicle (AGV) parking maneuver problems, utilizing deep neural networks and recurrent network structures. Two transfer learning strategies are applied to adapt the motion planner to different AGV types. Experimental studies demonstrate enhanced performance in fulfilling parking missions.

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Adaptive dynamic programming-based hierarchical decision-making of non-affine systems

Danyu Lin, Shan Xue, Derong Liu, Mingming Liang, Yonghua Wang

Summary: In this paper, a problem of multiplayer hierarchical decision-making for non-affine systems is solved using adaptive dynamic programming. The control dynamics are obtained and combined with the original system dynamics, transforming the non-affine multiplayer system into a general affine form. The hierarchical decision problem is modeled as a Stackelberg game, and a neural network is used to reconstruct the augmented system and approximate the value function. The feasibility and effectiveness of the algorithm are confirmed through simulation.

NEURAL NETWORKS (2023)

Article Computer Science, Artificial Intelligence

Fault tolerant control for a class of nonlinear systems with multiple faults using neuro-dynamic programming

Chujian Zeng, Bo Zhao, Derong Liu

Summary: This paper proposes a neuro-dynamic programming-based fault tolerant control (FTC) scheme for a class of nonlinear systems in which actuator and sensor faults occur simultaneously. The scheme combines a descriptor observer with an adaptive observer to estimate system states and multiple faults. By employing a critic neural network, the approximate optimal control policy is obtained for the fault-free system. An FTC law is developed to suppress the influence of actuator faults by combining the estimates of actuator faults with the approximate optimal control policy. The stability of the closed-loop nonlinear system is analyzed using the Lyapunov stability theorem.

NEUROCOMPUTING (2023)

Article Automation & Control Systems

Adaptive Dynamic Programming-Based Event-Triggered Robust Control for Multiplayer Nonzero-Sum Games With Unknown Dynamics

Yongwei Zhang, Bo Zhao, Derong Liu, Shunchao Zhang

Summary: In this article, the event-triggered robust control problem of unknown multiplayer nonlinear systems with constrained inputs and uncertainties is investigated using adaptive dynamic programming. A neural network-based identifier is constructed to relax the requirement of known system dynamics. By designing a nonquadratic value function, the stabilization problem is converted into a constrained optimal control problem. The approximate solution of the event-triggered Hamilton-Jacobi equation is obtained using a critic network with a novel weight updating law, and the Lyapunov stability theorem ensures that the multiplayer system is uniformly ultimately bounded.

IEEE TRANSACTIONS ON CYBERNETICS (2023)

Article Automation & Control Systems

An Efficient Impulsive Adaptive Dynamic Programming Algorithm for Stochastic Systems

Mingming Liang, Yonghua Wang, Derong Liu

Summary: In this study, a novel general impulsive transition matrix is defined to reveal the transition dynamics and probability distribution evolution patterns between impulsive events. Based on this matrix, policy iteration-based impulsive adaptive dynamic programming algorithms are developed to solve optimal impulsive control problems. The algorithms are shown to converge to the optimal impulsive performance index function and allow optimization on computing devices with limited memory.

IEEE TRANSACTIONS ON CYBERNETICS (2023)

Proceedings Paper Automation & Control Systems

Data-Based Approximate Optimal Control for Unknown Nonaffine Systems via Dynamic Feedback

Jinquan Lin, Bo Zhao, Derong Liu

Summary: In this paper, an integral reinforcement learning (IRL)-based approximate optimal control (AOC) method is developed for unknown nonaffine systems using dynamic feedback. The optimal control policy for nonaffine systems cannot be explicitly expressed due to the unknown input gain matrix. Thus, a dynamic feedback signal is introduced to transform the nonaffine system into an augmented affine system. The AOC for unknown nonaffine systems is formulated by designing an appropriate value function for the augmented affine system, and the IRL method is adopted to derive the approximate solution of the Hamilton-Jacobi-Bellman equation.

2023 IEEE 12TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS CONFERENCE, DDCLS (2023)

Article Automation & Control Systems

Event-Triggered Local Control for Nonlinear Interconnected Systems Through Particle Swarm Optimization-Based Adaptive Dynamic Programming

Bo Zhao, Guang Shi, Derong Liu

Summary: This article investigates local control problems for nonlinear interconnected systems by using adaptive dynamic programming (ADP) with particle swarm optimization (PSO). It constructs a proper local value function and employs a local critic neural network to solve the local Hamilton-Jacobi-Bellman equation. The event-triggering mechanism is introduced to determine the sampling time instants and ensure asymptotic stability through Lyapunov stability analysis.

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2023)

Article Automation & Control Systems

Liquid-Updating Impulsive Adaptive Dynamic Programming for Continuous Nonlinear Systems

Mingming Liang, Derong Liu

Summary: This article focuses on designing the optimal impulsive controller (IMC) of continuous-time nonlinear systems and proposes a new adaptive dynamic programming algorithm with high generality and feasibility. The introduced policy-improving mechanism makes the algorithm more flexible for memory-limited computing devices.

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2023)

Article Automation & Control Systems

Safe Reinforcement Learning and Adaptive Optimal Control With Applications to Obstacle Avoidance Problem

Ke Wang, Chaoxu Mu, Zhen Ni, Derong Liu

Summary: This paper presents a novel composite obstacle avoidance control method that generates safe motion trajectories for autonomous systems in an adaptive manner. The method combines model-based policy iteration and state-following-based approximation in a safe reinforcement learning framework. The proposed learning-based controller achieves stable reaching of target points while maintaining a safe distance from obstacles. The effectiveness of the method is demonstrated through simulations and comparisons with other avoidance control methods.

IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (2023)

Article Computer Science, Artificial Intelligence

Adaptive Dynamic Programming-Based Cooperative Motion/Force Control for Modular Reconfigurable Manipulators: A Joint Task Assignment Approach

Bo Zhao, Yongwei Zhang, Derong Liu

Summary: This article presents a cooperative motion/force control scheme for modular reconfigurable manipulators (MRMs) based on adaptive dynamic programming (ADP). The dynamic model of the entire MRM system is treated as a set of joint modules interconnected by coupling torque, and the Jacobian matrix is mapped into each joint. A neural network is used as a robust decentralized observer, and an improved local value function is constructed for each joint module. The control scheme is achieved by using force feedback compensation and is proven to be uniformly ultimately bounded through Lyapunov stability analysis.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)
