A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems
Authors: Wei QingLai (1); Liu DeRong (2)
Journal: SCIENCE CHINA-INFORMATION SCIENCES
Publication Date: 2015-12-01
Volume: 58; Issue: 12; Pages: 122203:1–122203:15
Article Type: Article
Abstract: In this paper, a novel iterative Q-learning algorithm, called the "policy iteration based deterministic Q-learning algorithm", is developed to solve optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law that optimizes the iterative Q function. Once the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, so that a mathematical model of the system is not required. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are employed to implement the policy iteration based deterministic Q-learning algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
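To make the abstract's two-step scheme (policy evaluation of an iterative Q function, then policy improvement by minimizing it) concrete, the following is a minimal sketch on an assumed toy problem: 1-D dynamics x_{k+1} = 0.5*x_k + u_k, quadratic utility U(x,u) = x^2 + u^2, and coarse state/control grids standing in for the paper's neural-network approximators. The dynamics, grids, initial control law, and all variable names (f, U, xs, us, proj, succ, policy) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

f = lambda x, u: 0.5 * x + u          # assumed deterministic dynamics x_{k+1} = f(x_k, u_k)
U = lambda x, u: x ** 2 + u ** 2      # assumed utility (stage cost)

xs = np.linspace(-1.0, 1.0, 21)       # state grid (spacing 0.1)
us = np.linspace(-1.0, 1.0, 41)       # control grid (spacing 0.05)

def proj(grid, v):
    """Index of the grid point nearest to v (crude projection onto the grid)."""
    return int(np.argmin(np.abs(grid - v)))

# Precompute successor-state indices for every (state, control) pair.
succ = np.array([[proj(xs, f(x, u)) for u in us] for x in xs])

# Start from an admissible (stabilizing) control law; here u = -0.5*x is deadbeat.
policy = np.array([proj(us, -0.5 * x) for x in xs])
Q = np.zeros((len(xs), len(us)))

for i in range(50):                   # policy-iteration index i
    # Policy evaluation: iterate Q(x,u) <- U(x,u) + Q(x', v_i(x')) with x' = f(x,u)
    # until the fixed point for the current control law v_i is numerically reached.
    for _ in range(500):
        Q_next = U(xs[:, None], us[None, :]) + Q[succ, policy[succ]]
        if np.max(np.abs(Q_next - Q)) < 1e-10:
            Q = Q_next
            break
        Q = Q_next
    # Policy improvement: v_{i+1}(x) = argmin_u Q(x,u). Once Q has converged,
    # the control law is read off by minimizing Q directly, with no system model.
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("greedy control at x = 0.5:", us[policy[proj(xs, 0.5)]])
```

In the paper itself the iterative Q function and control law are approximated by neural networks rather than grid tables, and the analysis establishes that the iterative Q function is monotonically non-increasing, converges to the solution of the optimality equation, and that every iterative control law is stabilizing; the grid discretization and deadbeat initialization above are purely for illustration.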
Keywords: Adaptive Critic Designs; Adaptive Dynamic Programming; Approximate Dynamic Programming; Q-learning; Policy Iteration; Neural Networks; Nonlinear Systems; Optimal Control
WOS Headings: Science & Technology; Technology
DOI: 10.1007/s11432-015-5462-z
WOS Keywords: OPTIMAL TRACKING CONTROL; DYNAMIC-PROGRAMMING ALGORITHM; CONTROL SCHEME; APPROXIMATION ERRORS; REINFORCEMENT
Indexed By: SCI
Language: English
Funding: National Natural Science Foundation of China (61374105, 61233001, 61273140); Beijing Natural Science Foundation (4132078)
WOS Research Area: Computer Science
WOS Subject Category: Computer Science, Information Systems
WOS Accession Number: WOS:000368790400015
Citation Statistics: Times Cited (WOS): 41
Item Type: Journal Article
Identifier: http://ir.ia.ac.cn/handle/173211/10670
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems, Complex Systems Intelligence Mechanism and Parallel Control Team
Corresponding Author: Derong Liu
Affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
2. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation:
GB/T 7714: Wei QingLai, Liu DeRong. A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems[J]. SCIENCE CHINA-INFORMATION SCIENCES, 2015, 58(12): 122203:1–122203:15.
APA: Wei QingLai, & Liu DeRong. (2015). A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems. SCIENCE CHINA-INFORMATION SCIENCES, 58(12), 122203:1–122203:15.
MLA: Wei QingLai, and Liu DeRong. "A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems". SCIENCE CHINA-INFORMATION SCIENCES 58.12 (2015): 122203:1–122203:15.
Files in This Item:
File: 2015_SCIS_A novel policy iteration based deterministic Q-learning for discrete-time nonlinear systems.pdf (1215 KB); Format: Adobe PDF; Document Type: Journal Article; Version: Author's Accepted Manuscript; Access: Open Access; License: CC BY-NC-SA