A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
Authors: Wei QingLai (1); Liu DeRong (2)
Journal: SCIENCE CHINA-INFORMATION SCIENCES
Publication Date: 2015-12-01
Volume: 58; Issue: 12; Pages: 122203:1–122203:15
Article Type: Article
Abstract: In this paper, a novel iterative Q-learning algorithm, called the "policy iteration based deterministic Q-learning algorithm", is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. Once the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, so that a mathematical model of the system is not required. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are employed to implement the policy iteration based deterministic Q-learning algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
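To make the abstract's two-step scheme (policy evaluation of an iterative Q function, then policy improvement by minimizing it) concrete, the following is a minimal sketch on an assumed toy problem: 1-D dynamics x_{k+1} = 0.5*x_k + u_k, quadratic utility U(x,u) = x^2 + u^2, and coarse state/control grids standing in for the paper's neural-network approximators. The dynamics, grids, initial control law, and all variable names (f, U, xs, us, proj, succ, policy) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

f = lambda x, u: 0.5 * x + u          # assumed deterministic dynamics x_{k+1} = f(x_k, u_k)
U = lambda x, u: x ** 2 + u ** 2      # assumed utility (stage cost)

xs = np.linspace(-1.0, 1.0, 21)       # state grid (spacing 0.1)
us = np.linspace(-1.0, 1.0, 41)       # control grid (spacing 0.05)

def proj(grid, v):
    """Index of the grid point nearest to v (crude projection onto the grid)."""
    return int(np.argmin(np.abs(grid - v)))

# Precompute successor-state indices for every (state, control) pair.
succ = np.array([[proj(xs, f(x, u)) for u in us] for x in xs])

# Start from an admissible (stabilizing) control law; here u = -0.5*x is deadbeat.
policy = np.array([proj(us, -0.5 * x) for x in xs])
Q = np.zeros((len(xs), len(us)))

for i in range(50):                   # policy-iteration index i
    # Policy evaluation: iterate Q(x,u) <- U(x,u) + Q(x', v_i(x')) with x' = f(x,u)
    # until the fixed point for the current control law v_i is numerically reached.
    for _ in range(500):
        Q_next = U(xs[:, None], us[None, :]) + Q[succ, policy[succ]]
        if np.max(np.abs(Q_next - Q)) < 1e-10:
            Q = Q_next
            break
        Q = Q_next
    # Policy improvement: v_{i+1}(x) = argmin_u Q(x,u). Once Q has converged,
    # the control law is read off by minimizing Q directly, with no system model.
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("greedy control at x = 0.5:", us[policy[proj(xs, 0.5)]])
```

In the paper itself the iterative Q function and control law are approximated by neural networks rather than grid tables, and the analysis establishes that the iterative Q function is monotonically non-increasing, converges to the solution of the optimality equation, and that every iterative control law is stabilizing; the grid discretization and deadbeat initialization above are purely for illustration.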
Keywords: Adaptive Critic Designs; Adaptive Dynamic Programming; Approximate Dynamic Programming; Q-learning; Policy Iteration; Neural Networks; Nonlinear Systems; Optimal Control
WOS Headings: Science & Technology; Technology
DOI: 10.1007/s11432-015-5462-z
WOS Keywords: OPTIMAL TRACKING CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; CONTROL SCHEME; APPROXIMATION ERRORS; REINFORCEMENT
Indexed By: SCI
Language: English
Funding: National Natural Science Foundation of China (61374105, 61233001, 61273140); Beijing Natural Science Foundation (4132078)
WOS Research Area: Computer Science
WOS Subject Category: Computer Science, Information Systems
WOS Accession Number: WOS:000368790400015
Citation Statistics: Times Cited (WOS): 41
Item Type: Journal Article
Identifier: http://ir.ia.ac.cn/handle/173211/10670
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems, Complex Systems Intelligence Mechanism and Parallel Control Team
Corresponding Author: Derong Liu
Affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
2. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation:
GB/T 7714: Wei QingLai, Liu DeRong. A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems[J]. SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58(12): 122203:1–122203:15.
APA: Wei QingLai, & Liu DeRong. (2015). A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems. SCIENCE CHINA-INFORMATION SCIENCES, 58(12), 122203:1–122203:15.
MLA: Wei QingLai, and Liu DeRong. "A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems". SCIENCE CHINA-INFORMATION SCIENCES 58.12 (2015): 122203:1–122203:15.
Files in This Item:
File: 2015_SCIS_A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems.pdf (1215 KB); Format: Adobe PDF; Document Type: Journal Article; Version: Author's Accepted Manuscript; Access: Open Access; License: CC BY-NC-SA