Other Abstract: One important factor of reinforcement learning (RL) algorithms is the online learning time. Conventional algorithms such as Q-learning and SARSA cannot provide a quantitative upper bound on this time. In this paper, we adopt the Probably Approximately Correct (PAC) framework and design data-driven online RL algorithms for continuous-time deterministic systems. This class of algorithms records online observations efficiently while maintaining the exploration required by online RL, and is able to learn a near-optimal policy within a finite time. Two algorithms are developed, based respectively on state discretization and on a kd-tree technique, which are used to store data and compute online policies. Both algorithms are applied to a two-link manipulator to evaluate their performance.
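As a rough illustration of the data structure underlying the second algorithm (a hedged sketch, not the paper's implementation), a kd-tree can store observed (state, value) pairs and answer nearest-neighbor queries used when computing an online policy:

```python
import math

class KDNode:
    """One stored observation: a state point, its value estimate, and a split axis."""
    def __init__(self, point, value, axis):
        self.point, self.value, self.axis = point, value, axis
        self.left = self.right = None

class KDTree:
    """Minimal kd-tree over states in R^dims, storing a value per observation."""
    def __init__(self, dims):
        self.dims = dims
        self.root = None

    def insert(self, point, value):
        def _insert(node, axis):
            if node is None:
                return KDNode(tuple(point), value, axis)
            # Descend left or right by comparing along the node's split axis.
            if point[node.axis] < node.point[node.axis]:
                node.left = _insert(node.left, (node.axis + 1) % self.dims)
            else:
                node.right = _insert(node.right, (node.axis + 1) % self.dims)
            return node
        self.root = _insert(self.root, 0)

    def nearest(self, query):
        """Return the stored (point, value) closest to query in Euclidean distance."""
        best = [None, float("inf")]  # best node found and its squared distance
        def _search(node):
            if node is None:
                return
            d = sum((a - b) ** 2 for a, b in zip(query, node.point))
            if d < best[1]:
                best[0], best[1] = node, d
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            _search(near)
            # Only cross the splitting plane if a closer point could lie beyond it.
            if diff * diff < best[1]:
                _search(far)
        _search(self.root)
        return best[0].point, best[0].value
```

For example, after inserting observed states with their value estimates, `tree.nearest(new_state)` retrieves the recorded observation closest to a newly visited state, so the policy can reuse past data rather than rediscretizing the whole state space.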
Zhu Yuanheng, Zhao Dongbin. Probably approximately correct reinforcement learning algorithms for solving control problems with continuous state spaces[J]. Control Theory & Applications, 2016, 33(12): 1603-1613.
Zhu Yuanheng, et al. "Probably approximately correct reinforcement learning algorithms for solving control problems with continuous state spaces." Control Theory & Applications 33.12 (2016): 1603-1613.