CASIA OpenIR > State Key Laboratory of Management and Control for Complex Systems > Deep Reinforcement Learning
朱圆恒; 赵冬斌
Source Publication: Control Theory & Applications (控制理论与应用)
Abstract: One important factor of reinforcement learning (RL) algorithms is the online learning time. Conventional algorithms such as Q-learning and SARSA cannot provide a quantitative bound on the online learning time. In this paper, we employ the idea of Probably Approximately Correct (PAC) learning and design data-driven online RL algorithms for continuous-time deterministic systems. This class of algorithms efficiently records online observations while accounting for the exploration required by online RL, and is able to learn a near-optimal policy within a finite time. Two algorithms are developed, based respectively on state discretization and on the kd-tree technique, which are used to store data and compute online policies. Both algorithms are applied to a two-link manipulator to evaluate their performance.
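The abstract's kd-tree variant stores online observations keyed by state and retrieves the nearest stored experience when computing a policy. The paper's actual algorithm is not reproduced here; the following is only a minimal sketch of that storage-and-lookup idea, with all class and method names (`KDTree`, `insert`, `nearest`) invented for illustration.

```python
import math

class _Node:
    """One stored observation: a state vector plus attached data
    (e.g. the action taken and the observed reward)."""
    def __init__(self, point, value, axis):
        self.point, self.value, self.axis = point, value, axis
        self.left = self.right = None

class KDTree:
    """Minimal kd-tree over state vectors for nearest-neighbour lookup."""
    def __init__(self, dim):
        self.dim = dim
        self.root = None

    def insert(self, point, value):
        def _ins(node, axis):
            if node is None:
                return _Node(point, value, axis)
            # Descend by comparing the splitting coordinate of this node.
            if point[node.axis] < node.point[node.axis]:
                node.left = _ins(node.left, (node.axis + 1) % self.dim)
            else:
                node.right = _ins(node.right, (node.axis + 1) % self.dim)
            return node
        self.root = _ins(self.root, 0)

    def nearest(self, query):
        """Return (state, value) of the stored observation closest to query."""
        best = [None, float("inf")]
        def _search(node):
            if node is None:
                return
            d = math.dist(query, node.point)
            if d < best[1]:
                best[0], best[1] = node, d
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            _search(near)
            # Only cross the splitting plane if it could hide a closer point.
            if abs(diff) < best[1]:
                _search(far)
        _search(self.root)
        return best[0].point, best[0].value

# Usage: store (action, reward) observations indexed by 2-D state,
# then query with the current state to reuse the nearest experience.
tree = KDTree(dim=2)
tree.insert((0.0, 0.0), ("a", 1.0))
tree.insert((1.0, 1.0), ("b", 0.5))
state, obs = tree.nearest((0.9, 1.1))
```

Compared with a fixed state discretization, the tree adapts its resolution to where data actually accumulate, which is the trade-off the paper's two variants explore.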
Keywords: reinforcement learning; probably approximately correct (PAC); kd-tree; two-link manipulator
Document Type: Journal article
Recommended Citation
GB/T 7714: 朱圆恒, 赵冬斌. 概率近似正确的强化学习算法解决连续状态空间控制问题[J]. 控制理论与应用, 2016, 33(12): 1603-1613.
APA: 朱圆恒, & 赵冬斌. (2016). 概率近似正确的强化学习算法解决连续状态空间控制问题. 控制理论与应用, 33(12), 1603-1613.
MLA: 朱圆恒, et al. "概率近似正确的强化学习算法解决连续状态空间控制问题." 控制理论与应用 33.12 (2016): 1603-1613.
File in This Item: 概率近似正确的强化学习算法解决连续状态空间控制问题.pdf (1544 KB), journal article, author accepted manuscript, open access, license CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.