Alternative Title: Reinforcement Learning Theory & Application in Path Planning Problem
Thesis Advisor: Wang Jue (王珏)
Degree Grantor: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Reinforcement Learning; Function Approximation; Direct Reinforcement Learning; Multi-agent System
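Abstract: An important goal of artificial intelligence is to build intelligent agents that can learn automatically in complex, unknown environments, yet the inherent characteristics of supervised learning, currently the most heavily studied form of learning, make it poorly suited to this goal. Unlike supervised learning, reinforcement learning learns by trial and error through interaction with the environment, which is why it has attracted a growing amount of research. Reinforcement learning should be understood as a class of problems rather than a class of methods: whenever an agent must improve its behavior and reach some goal through trial-and-error interaction with its environment, the problem it faces is a reinforcement learning problem [17].
This thesis first introduces the basic theory and classical algorithms of reinforcement learning. In the classical methods, the value function is represented by a lookup table with one entry per state, so as the number of states grows these methods run into the curse of dimensionality. The thesis therefore modifies how the classical algorithms compute the value function: instead of representing it exactly, supervised learning methods such as gradient descent are used to approximate it, which greatly extends the range of problems to which reinforcement learning applies. Several gradient descent methods are presented and their convergence is compared. Value-function-based methods also have an inherent weakness: a small error in the value function estimate can lead to a large error in the final result. To address this, the thesis presents direct reinforcement learning, which searches the policy space directly by gradient ascent on a mapping from policy to reward that does not pass through a value function. This can be regarded as a non-classical reinforcement learning method.
The thesis also reports experiments applying reinforcement learning to agent path planning, in both single-agent and multi-agent systems, combining several reinforcement learning methods and techniques. The experiments show that, through trial and error, agents can not only adapt to the environment, even a dynamic one, and find the optimal path, but also gradually develop relationships such as cooperation and competition. They also show that an appropriate combination of learning methods greatly improves learning efficiency.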
Other Abstract: One of the important goals of Artificial Intelligence is to build agents that can learn behaviors automatically through interaction with the environment. Because of its inherent characteristics, supervised learning, the kind of learning studied in most current research, is not adequate for achieving that goal, whereas reinforcement learning is well suited to building such agents. For this reason, reinforcement learning is receiving more and more attention. Reinforcement learning is defined by characterizing a learning problem: the problem faced by agents that learn behavior through trial-and-error interactions with a dynamic environment. In this thesis, the basic theory and algorithms of RL are introduced. These algorithms are based on a value function and are typically developed for lookup tables. As the number of states increases, they become exposed to the curse of dimensionality. Therefore, lookup tables are replaced by function approximators, and new families of algorithms are derived that use stochastic gradient descent to adjust the parameters of the function approximators. In addition, because of a fundamental limitation shared by all value-function-based methods, direct gradient-based reinforcement learning is introduced. Finally, simulations are presented that show applications of reinforcement learning methods to the path planning problem. In the simulations, various methods and techniques are combined. The results show that with reinforcement learning methods, agents can not only find the optimal path even in a dynamic environment, but also coordinate or compete with each other.
Other Identifier: 695
Document Type: Thesis (学位论文)
Recommended Citation
GB/T 7714
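Tang Qiao (汤俏). 增强学习的理论和在路径规划问题中的应用 [Reinforcement Learning Theory & Application in Path Planning Problem] [D]. Institute of Automation, Chinese Academy of Sciences; Graduate School of the Chinese Academy of Sciences, 2003.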
Files in This Item:
File Name/Size: 硕士生学位论文-695.pdf (3837KB)
Access: Not yet open (暂不开放)
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.