Reinforcement Learning Problems With Continuous State Spaces (连续状态空间的强化学习问题)

Title	连续状态空间的强化学习问题
Alternative title	Reinforcement Learning Problems With Continuous State Spaces
Author	何源 (He Yuan)
	2007-06-24
Degree type	Master of Engineering
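Chinese abstract (translated)	Reinforcement learning is a powerful machine learning method whose effectiveness has been demonstrated in many domains. It studies a very general problem: how to learn in an unknown dynamic environment so as to find an optimal behavior policy. The promise of reinforcement learning is very attractive: one only needs to specify the goal, and the agent will use the reward and punishment signals provided by the environment, interacting with it repeatedly by trial and error, to accomplish the goal automatically, without anyone having to tell the agent how to accomplish it. For this reason, reinforcement learning has received growing attention in recent years. This thesis first introduces the basic theory and classic algorithms of reinforcement learning and discusses the differences and connections among the algorithms, as well as ways of combining them. Traditional reinforcement learning algorithms usually assume that the state space and the action space are discrete, so that the value function can be represented by a lookup table with one entry per state. In practice, however, many problems have continuous state spaces, so a lookup table is no longer applicable, which greatly limits the practical use of reinforcement learning. For problems with a continuous state space and a discrete action space, the usual approach is to replace the discrete lookup table with a function approximator. Unfortunately, this often greatly weakens the convergence of the algorithm: the learning process may well fail to converge, or may even fail in the initial stage of learning. We therefore introduce kernel-based reinforcement learning, which is characterized by instance-based value function estimation and kernel-based generalization. Kernel-based reinforcement learning can not only handle reinforcement learning problems with continuous state spaces directly, but, under suitable assumptions, is also theoretically guaranteed to converge to the optimal value function as the number of instances increases; that is, the method is consistent in the statistical sense. Nevertheless, few kernel-based reinforcement learning algorithms exist at present. The main contribution of this thesis is three different versions of a kernel-based Monte Carlo algorithm (KBMC), which combine random search, the traditional Monte Carlo method, and kernel-based reinforcement learning, and can effectively handle reinforcement learning problems with continuous state spaces and discrete action spaces. A series of experiments on the mountain car problem shows that the KBMC algorithms converge to better policies than the classic Sarsa(λ), Q(λ), and Actor-Critic(λ) algorithms.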
English abstract	Reinforcement Learning (RL) is a powerful machine learning methodology that has proven effective in a variety of domains. Research into RL addresses a very general problem: how to learn good policies in an arbitrary unknown environment. Its promise is beguiling: one only needs to tell the agent what its goal is, and the agent will try to reach the goal by using the reward and punishment signals from the environment, through trial-and-error interaction with it. As a result, RL has attracted more and more attention in recent years. We first introduce the basic theory and classic algorithms of RL and discuss the differences and relations between the algorithms. The problem with these classic algorithms is that they usually assume discrete state and action spaces, whereas in fact most problems have continuous state spaces, which greatly limits the practical use of RL. To handle this problem, a function approximator is commonly used to represent the value function instead of a lookup table. Unfortunately, this typically weakens the convergence guarantees: the algorithm may diverge, and sometimes it fails even at the beginning of learning. Kernel-based reinforcement learning, which is characterized by instance-based value function estimation and kernel-based generalization, is a new method for coping with this problem. It is theoretically guaranteed to converge to the optimal policy as the number of instances increases, which means the method is consistent in the statistical sense. However, there are few kernel-based RL algorithms at present. In this paper, we propose three different versions of a kernel-based Monte Carlo algorithm (KBMC), which combine random search, the traditional Monte Carlo method, and kernel-based reinforcement learning, and can efficiently handle RL problems with continuous state spaces and discrete action spaces. We verify the algorithms on the mountain car problem, which has a continuous state space and a discrete action space. Compared with the classic algorithms Sarsa(λ), Q(λ), and Actor-Critic(λ), the KBMC algorithms converge to better policies.
Keywords	Reinforcement Learning; Continuous State Space; Kernel Method; Function Approximation
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/7419
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	何源. 连续状态空间的强化学习问题[D]. 中国科学院自动化研究所. 中国科学院研究生院,2007.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20042801462806 (2826KB)			Not currently open	CC BY-NC-SA