Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
Authors: Zhu, Yuanheng (1); Zhao, Dongbin (1); He, Haibo (2); Ji, Junhong (3)
Source Publication: COGNITIVE COMPUTATION
Publication Date: 2015-12-01
Volume: 7, Issue: 6, Pages: 763-771
Subtype: Article
Abstract: This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. We use an iterative method to implement approximate policy evaluation and demonstrate that the error between the approximate and exact value functions is bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. Our main theorem proves that if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
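The two steps named in the abstract, iterative approximate policy evaluation followed by greedy improvement over a finite action set, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the 1-D toy dynamics, the unit step cost, the two step sizes, and all function names are assumptions made for the example, and the piecewise-linear interpolation of `np.interp` stands in for a fuzzy approximator with triangular membership functions. The puddle world problem from the paper is not reproduced here.

```python
import numpy as np

# All names and numbers below are hypothetical choices for this sketch.
CENTERS = np.linspace(0.0, 1.0, 21)   # cores of triangular membership functions
ACTIONS = np.array([0.05, 0.2])       # finite action set: small or large step right
GOAL = 1.0 - 1e-9                     # states at or beyond this are terminal


def step(x, a):
    """Toy dynamics: move right by a (capped at 1.0), unit cost per step."""
    return np.minimum(x + a, 1.0), np.ones_like(x)


def value(theta, x):
    """Piecewise-linear approximate value function; zero in the goal region."""
    return np.where(x >= GOAL, 0.0, np.interp(x, CENTERS, theta))


def evaluate(policy, theta, sweeps=200, tol=1e-10):
    """Iterative approximate policy evaluation at the membership cores."""
    x_next, cost = step(CENTERS, ACTIONS[policy])
    terminal = CENTERS >= GOAL
    for _ in range(sweeps):
        # Undiscounted backup: stage cost plus approximate value of the successor.
        new = np.where(terminal, 0.0, cost + value(theta, x_next))
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta


def improve(theta):
    """Greedy policy: enumerate the finite action set at every core."""
    q = np.stack([step(CENTERS, a)[1] + value(theta, step(CENTERS, a)[0])
                  for a in ACTIONS])
    return np.argmin(q, axis=0)


def approximate_policy_iteration(max_iters=20, tol=1e-8):
    """Alternate evaluation and improvement until the values stop changing."""
    policy = np.zeros(len(CENTERS), dtype=int)  # start: always the small step
    theta = evaluate(policy, np.zeros(len(CENTERS)))
    for _ in range(max_iters):
        policy = improve(theta)
        new_theta = evaluate(policy, theta)
        if np.max(np.abs(new_theta - theta)) < tol:
            return policy, new_theta
        theta = new_theta
    return policy, theta


if __name__ == "__main__":
    policy, theta = approximate_policy_iteration()
    print(policy)
    print(theta)
```

The sketch starts from a policy that reaches the goal from every state, since an undiscounted evaluation is only finite for such a policy. The loop stops on the change in evaluated values rather than on the policy itself, because several actions can tie at some states and the tie may be broken differently between iterations.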
Keyword: Approximate Policy Iteration; Approximation Error; Optimal Control; Fuzzy Approximator
WOS Headings: Science & Technology; Technology; Life Sciences & Biomedicine
DOI: 10.1007/s12559-015-9350-z
WOS Keyword: NONLINEAR-SYSTEMS; FEEDBACK-CONTROL; MOBILE ROBOTS; ALGORITHM
Indexed By: SCI
Language: English
Funding Organization: National Natural Science Foundation of China (61273136); State Key Laboratory of Robotics and System (SKLRS-2015-ZD-04); National Science Foundation (NSF) (ECCS 1053717)
WOS Research Area: Computer Science; Neurosciences & Neurology
WOS Subject: Computer Science, Artificial Intelligence; Neurosciences
WOS ID: WOS:000366329200012
Citation statistics: Cited Times: 2 [WOS]
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/10525
Collection: State Key Laboratory of Management and Control for Complex Systems_Deep Reinforcement Learning
Affiliation:
1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
2. Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
3. Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150001, Peoples R China
Recommended Citation:
GB/T 7714: Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, et al. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems[J]. COGNITIVE COMPUTATION, 2015, 7(6): 763-771.
APA: Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, & Ji, Junhong. (2015). Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION, 7(6), 763-771.
MLA: Zhu, Yuanheng, et al. "Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems". COGNITIVE COMPUTATION 7.6 (2015): 763-771.