Article

Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 47, Issue 5, Pages 1224-1237

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCYB.2016.2542923

Keywords

Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; neural networks (NNs); neuro-dynamic programming; optimal control; Q-learning

Funding

  1. National Natural Science Foundation (NNSF) of China [61374105, 61304079, 61273140]
  2. Fundamental Research Funds for the Central Universities [FRF-TP-15-056A3]
  3. Open Research Project from SKLMCCS [20150104]
  4. National Science Foundation [ECCS-1405173, IIS-1208623]
  5. Office of Naval Research, Arlington, VA, USA [N00014-13-1-0562, N000141410718]
  6. U.S. Army Research Office [W911NF-11-D-0001]
  7. China NNSF [61120106011]
  8. China Education Ministry Project 111 [B08015]

Abstract

In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed algorithm, the iterative Q function is updated over the entire state and control spaces, rather than for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum; this criterion simplifies the learning-rate convergence conditions required by traditional Q-learning algorithms. In the convergence analysis, the upper and lower bounds of the iterative Q function, rather than the iterative Q function itself, are analyzed to obtain the convergence criterion. For convenience of analysis, the convergence properties of the undiscounted case of the deterministic Q-learning algorithm are developed first; the convergence criterion for the discounted case is then established by taking the discount factor into account. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, facilitating the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
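The full-space update described in the abstract can be sketched as a value-iteration-style sweep. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a toy finite (discretized) state and control space, a hypothetical deterministic transition table `F` for the system dynamics x' = F(x, u), and a hypothetical nonnegative stage cost `U(x, u)`; each iteration updates Q for all state-control pairs at once.

```python
import numpy as np

# Hypothetical toy problem: 5 discretized states, 3 discretized controls.
# F and U are placeholder tables standing in for the system dynamics and
# stage cost; they are illustrative assumptions, not from the paper.
n_states, n_controls = 5, 3
gamma = 0.95  # discount factor

rng = np.random.default_rng(0)
F = rng.integers(0, n_states, size=(n_states, n_controls))  # next state x' = F(x, u)
U = rng.random((n_states, n_controls))                      # stage cost U(x, u) >= 0

Q = np.zeros((n_states, n_controls))  # initial iterative Q function Q_0 = 0

for i in range(1000):
    # One iteration updates Q for ALL (x, u) pairs simultaneously:
    #   Q_{i+1}(x, u) = U(x, u) + gamma * min_{u'} Q_i(F(x, u), u')
    # Q[F] gathers the rows Q_i(F(x, u), ·); min over axis 2 is min over u'.
    Q_next = U + gamma * Q[F].min(axis=2)
    if np.max(np.abs(Q_next - Q)) < 1e-10:
        Q = Q_next
        break
    Q = Q_next

# Iterative control law: u(x) = argmin_u Q(x, u)
policy = Q.argmin(axis=1)
```

With a discount factor gamma < 1 the update is a contraction, so the sweep converges to a fixed point of the Bellman-style equation above; the undiscounted case treated first in the paper corresponds to gamma = 1 and needs the paper's dedicated analysis.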
