Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis

doi:10.1109/TCYB.2016.2542923


Authors	Wei, Qinglai (1); Lewis, Frank L. (2, 3); Sun, Qiuye (4); Yan, Pengfei (1); Song, Ruizhuo (5)
Journal	IEEE TRANSACTIONS ON CYBERNETICS
Publication date	2017-05-01
Volume	47, Issue: 5, Pages: 1224-1237
Article type	Article
Abstract	In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the entire state and control spaces, rather than for a single state and a single control as in traditional Q-learning algorithms. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, where the convergence criterion of the learning rates for traditional Q-learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties for the undiscounted case of the deterministic Q-learning algorithm are developed first. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
Keywords	Adaptive Critic Designs ; Adaptive Dynamic Programming (ADP) ; Approximate Dynamic Programming ; Neural Networks (NNs) ; Neuro-dynamic Programming ; Optimal Control ; Q-learning
WOS headings	Science & Technology ; Technology
DOI	10.1109/TCYB.2016.2542923
Keywords [WOS]	OPTIMAL TRACKING CONTROL ; ZERO-SUM GAMES ; H-INFINITY CONTROL ; INPUT-OUTPUT DATA ; DEAD-ZONE INPUT ; NONLINEAR-SYSTEMS ; ALGORITHM ; DESIGN ; REPRESENTATION ; APPROXIMATION
Indexed by	SCI
Language	English
Funding	National Natural Science Foundation (NNSF) of China (61374105 ; 61304079 ; 61273140) ; Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3) ; Open Research Project from SKLMCCS (20150104) ; National Science Foundation (ECCS-1405173 ; IIS-1208623) ; Office of Naval Research, Arlington, VA, USA (N00014-13-1-0562 ; N000141410718) ; U.S. Army Research Office (W911NF-11-D-0001) ; China NNSF (61120106011) ; China Education Ministry Project 111 (B08015)
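WOS research area	Computer Science
WOS categories	Computer Science, Artificial Intelligence ; Computer Science, Cybernetics
WOS accession number	WOS:000399797000009
Citation statistics	Times cited: 150 [WOS]
Document type	Journal article
Item identifier	http://ir.ia.ac.cn/handle/173211/13630
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Complex Systems Intelligent Mechanism and Parallel Control Team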
Author affiliations	1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China ; 2. Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA ; 3. Northeastern Univ, Shenyang 110036, Peoples R China ; 4. Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110036, Peoples R China ; 5. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation, GB/T 7714	Wei, Qinglai, Lewis, Frank L., Sun, Qiuye, et al. Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47(5): 1224-1237.
APA	Wei, Qinglai, Lewis, Frank L., Sun, Qiuye, Yan, Pengfei, & Song, Ruizhuo. (2017). Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis. IEEE TRANSACTIONS ON CYBERNETICS, 47(5), 1224-1237.
MLA	Wei, Qinglai, et al. "Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis". IEEE TRANSACTIONS ON CYBERNETICS 47.5 (2017): 1224-1237.
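
Illustrative note: the abstract describes an update that sweeps every state-control pair in each iteration instead of a single visited pair. The sketch below is one possible reading of that idea on a small tabular problem, not the paper's implementation. The update form `Q <- (1 - alpha) * Q + alpha * (U + gamma * min_u' Q(F(x, u), u'))`, the constant learning rate `alpha`, the toy five-state system, and all names (`deterministic_q_sweep`, `F`, `U`) are assumptions made for illustration; the paper itself uses neural networks rather than a table.

```python
import numpy as np

def deterministic_q_sweep(F, U, alpha=0.5, gamma=1.0, iters=200):
    """Tabular sketch (assumed form): update Q for all (state, control) pairs per iteration.

    F[x, u] is the index of the next state (deterministic dynamics).
    U[x, u] is the stage cost. Both are hypothetical inputs for this example.
    """
    n_x, n_u = U.shape
    Q = np.zeros((n_x, n_u))
    for _ in range(iters):
        # Greedy cost-to-go of each successor state under the current iterate.
        v_next = Q.min(axis=1)[F]            # shape (n_x, n_u)
        target = U + gamma * v_next
        # One sweep: every (state, control) entry is updated simultaneously.
        Q = (1.0 - alpha) * Q + alpha * target
    return Q

# Toy system (assumption): states 0..4 on a line; controls move left, stay, or right.
n_x = 5
moves = np.array([-1, 0, 1])
states = np.arange(n_x)
F = np.clip(states[:, None] + moves[None, :], 0, n_x - 1)
U = (states[:, None] ** 2 + moves[None, :] ** 2).astype(float)

Q = deterministic_q_sweep(F, U)              # undiscounted case, gamma = 1
policy = moves[Q.argmin(axis=1)]             # greedy control for each state
```

In this toy problem state 0 with the "stay" control has zero cost, so the undiscounted iteration remains bounded; the greedy policy stays at state 0 and moves every other state toward it. Passing `gamma < 1` gives the discounted variant of the same sweep.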