CASIA OpenIR > Graduates > Master's Theses
Alternative Title: Reinforcement Learning Problems With Continuous State Spaces
Thesis Advisor: 张文生
Degree Grantor: 中国科学院研究生院 (Graduate University of the Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: Computer Applied Technology
Keywords: Reinforcement Learning; Continuous State Space; Kernel Method; Function Approximation
Abstract: Reinforcement learning (RL) is a powerful machine learning method whose effectiveness has been demonstrated in many domains. It studies a very general problem: how to learn an optimal behavior policy in an unknown dynamic environment. The promise of RL is enticing: one only needs to specify the goal, and the agent will achieve it automatically by exploiting the reward and punishment signals from the environment through trial-and-error interaction, without being told explicitly how to accomplish the goal. RL has therefore attracted increasing attention in recent years.

This thesis first introduces the basic theory and classic algorithms of RL, and discusses the differences, connections, and combinations among these algorithms. Traditional RL algorithms typically assume that the state and action spaces are discrete, so that the value function can be represented by a lookup table with one entry per state. In practice, however, many problems have continuous state spaces, where lookup tables no longer apply, which greatly limits the practical application of RL. For problems with continuous state spaces and discrete action spaces, the usual remedy is to replace the discrete lookup table with a function approximator. Unfortunately, this often severely weakens the convergence of the algorithm: the learning process may fail to converge, or may even break down at the very beginning of learning.

We therefore introduce kernel-based reinforcement learning, characterized by instance-based value function estimation and kernel-based generalization. Kernel-based RL can directly handle RL problems with continuous state spaces, and under appropriate assumptions it is theoretically guaranteed to converge to the optimal value function as the number of instances grows; that is, the method is statistically consistent. Nevertheless, few kernel-based RL algorithms exist so far. The main contribution of this thesis is three different versions of a kernel-based Monte Carlo algorithm (KBMC), which combine random search, the traditional Monte Carlo method, and kernel-based RL, and can effectively handle RL problems with continuous state spaces and discrete action spaces. A series of experiments on the mountain car problem shows that the KBMC algorithms converge to better policies than the classic Sarsa(λ), Q(λ), and Actor-Critic(λ) algorithms.
Other Abstract: Reinforcement learning (RL) is a powerful machine learning methodology that has proven effective in a variety of domains. RL research addresses a very general problem: how to learn good policies in an arbitrary unknown environment. Its promise is beguiling: one only needs to tell the agent its goal, and the agent will try to reach that goal by making use of the reward and punishment signals from the environment, through trial-and-error interaction. RL has therefore attracted increasing attention in recent years.

We first introduce the basic theory and classic algorithms of RL and discuss the differences and relations between them. The problem with all of these classic algorithms is that they usually assume discrete state and action spaces, whereas most real problems have continuous state spaces, which greatly limits the practical use of RL. To handle this, a function approximator is commonly used to represent the value function instead of a lookup table. Unfortunately, this typically weakens the convergence guarantees, may cause the algorithm to diverge, and can even fail at the very beginning of learning.

Kernel-based reinforcement learning, characterized by instance-based value function estimation and kernel-based generalization, is a new way to cope with this problem. It converges to the optimal policy with theoretical guarantees as the number of instances increases, which means the method is consistent in the statistical sense. However, few kernel-based RL algorithms exist at present. In this thesis we propose three different versions of a kernel-based Monte Carlo algorithm (KBMC), which combine random search, the traditional Monte Carlo method, and kernel-based reinforcement learning; they can efficiently handle RL problems with continuous state spaces and discrete action spaces. We verify the algorithms on the mountain car problem, which has a continuous state space and a discrete action space. Compared with the classic Sarsa(λ), Q(λ), and Actor-Critic(λ) algorithms, the KBMC algorithms converge to better policies.
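To make the "instance-based value function estimation with kernel-based generalization" idea concrete, here is a minimal sketch of kernel-smoothed action-value estimation over stored (state, action, return) instances. This is an illustrative Nadaraya-Watson-style estimator under assumed names and a Gaussian kernel, not the thesis's actual KBMC algorithm; the class, bandwidth, and kernel choice are assumptions for demonstration.

```python
import numpy as np

def gaussian_kernel(x, xi, bandwidth=0.1):
    """Similarity between query state x and stored state xi (Gaussian kernel)."""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * bandwidth ** 2))

class KernelValueEstimator:
    """Illustrative instance-based estimator: Q(s, a) is a kernel-weighted
    average of the Monte Carlo returns observed from stored instances,
    so nearby states generalize to each other without any lookup table."""

    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth
        self.instances = []  # list of (state, action, observed return)

    def add(self, state, action, ret):
        self.instances.append((np.asarray(state, dtype=float), action, ret))

    def q(self, state, action):
        state = np.asarray(state, dtype=float)
        weights, returns = [], []
        for s_i, a_i, g_i in self.instances:
            if a_i == action:  # only instances of the queried (discrete) action
                weights.append(gaussian_kernel(state, s_i, self.bandwidth))
                returns.append(g_i)
        if not weights or sum(weights) == 0.0:
            return 0.0  # no relevant instances yet
        w = np.array(weights)
        return float(np.dot(w, returns) / w.sum())
```

As more instances are added, the estimate at any continuous state is pulled toward the returns of nearby visited states, which is the sense in which such estimators become consistent as the number of instances grows.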
Other Identifier: 200428014628061
Document Type: Thesis (学位论文)
Recommended Citation
GB/T 7714
何源. 连续状态空间的强化学习问题[D]. 中国科学院自动化研究所. 中国科学院研究生院, 2007.
Files in This Item:
File Name/Size: CASIA_20042801462806 (2826KB) | DocType: Full Text | Access: Restricted (暂不开放) | License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.