Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
Zhu, Yuanheng (1); Zhao, Dongbin (1); He, Haibo (2); Ji, Junhong (3)
2015-12-01
Journal: COGNITIVE COMPUTATION
Volume: 7, Issue: 6, Pages: 763-771
Article type: Article
Abstract: This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. We use an iterative method to implement approximate policy evaluation and demonstrate that the error between the approximate and exact value functions is bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. Our main theorem proves that if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
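The loop described in the abstract (iterative approximate policy evaluation followed by greedy improvement over a finite action set) can be sketched on a toy problem. This is only an illustrative sketch, not the paper's algorithm or its puddle world experiment: the one-dimensional system, the unit stage cost, the triangular membership functions (which reduce to linear interpolation between node centres), and every name below are assumptions made for the example.

```python
# Toy undiscounted problem (all names and numbers are illustrative assumptions).
X_MAX = 10.0                 # goal state; the continuous state space is [0, X_MAX]
ACTIONS = [2.0, 1.0, -1.0]   # finite action set
NODES = list(range(11))      # centres of the triangular membership functions


def step(x, u):
    """Deterministic dynamics: move by u and clip to the state space."""
    return min(max(x + u, 0.0), X_MAX)


def cost(x, u):
    """Undiscounted stage cost: 1 per step until the goal is reached."""
    return 0.0 if x >= X_MAX else 1.0


def v_hat(theta, x):
    """Approximate value: triangular memberships act as linear interpolation."""
    i = min(int(x), len(NODES) - 2)
    w = x - NODES[i]
    return (1.0 - w) * theta[i] + w * theta[i + 1]


def evaluate(policy, sweeps=50):
    """Iterative approximate policy evaluation at the node centres."""
    theta = [0.0] * len(NODES)
    for _ in range(sweeps):
        theta = [
            0.0 if x >= X_MAX
            else cost(x, policy[i]) + v_hat(theta, step(x, policy[i]))
            for i, x in enumerate(NODES)
        ]
    return theta


def improve(theta):
    """Greedy policy improvement by enumerating the finite action set."""
    return [
        min(ACTIONS, key=lambda u: cost(x, u) + v_hat(theta, step(x, u)))
        for x in NODES
    ]


def approximate_policy_iteration(policy, max_iters=20):
    """Alternate evaluation and improvement until the policy stops changing."""
    for _ in range(max_iters):
        theta = evaluate(policy)
        new_policy = improve(theta)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, theta


# Start from an admissible policy (always move +1, which reaches the goal).
policy, theta = approximate_policy_iteration([1.0] * len(NODES))
print(policy[0], theta[0])
```

In this sketch the starting policy needs 10 steps from x = 0, and the loop settles on the +2 action, which needs 5. Without a discount factor, evaluating a policy that never reaches the goal would give values that keep growing with the number of sweeps, which is why the example starts from a policy known to reach it.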
Keywords: Approximate Policy Iteration; Approximation Error; Optimal Control; Fuzzy Approximator
WOS headings: Science & Technology ; Technology ; Life Sciences & Biomedicine
DOI: 10.1007/s12559-015-9350-z
Keywords [WOS]: NONLINEAR-SYSTEMS ; FEEDBACK-CONTROL ; MOBILE ROBOTS ; ALGORITHM
Indexed by: SCI
Language: English
Funding: National Natural Science Foundation of China (61273136) ; State Key Laboratory of Robotics and System (SKLRS-2015-ZD-04) ; National Science Foundation (NSF) (ECCS 1053717)
WOS research areas: Computer Science ; Neurosciences & Neurology
WOS categories: Computer Science, Artificial Intelligence ; Neurosciences
WOS accession number: WOS:000366329200012
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/10525
Collection: State Key Laboratory of Management and Control for Complex Systems_Deep Reinforcement Learning
Author affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
2. Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
3. Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150001, Peoples R China
Recommended citation
GB/T 7714: Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, et al. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems[J]. COGNITIVE COMPUTATION, 2015, 7(6): 763-771.
APA: Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, & Ji, Junhong. (2015). Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION, 7(6), 763-771.
MLA: Zhu, Yuanheng, et al. "Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems". COGNITIVE COMPUTATION 7.6 (2015): 763-771.
Files in this item:
File name/size: art%3A10.1007%2Fs125 (809KB); Document type: Journal article; Version type: Author accepted manuscript; Access type: Open access; License: CC BY-NC-SA
File name: art%3A10.1007%2Fs12559-015-9350-z.pdf
Format: Adobe PDF