CASIA OpenIR > Graduates > Doctoral Dissertations
Title: 连续状态-动作空间下强化学习方法的研究
Alternative Title: Research on Reinforcement Learning Methods of Continuous State-Action Spaces
Author: 程玉虎
Subtype: 工学博士 (Doctor of Engineering)
Thesis Advisor: 易建强
Date of Issue: 2005-04-01
Degree Grantor: 中国科学院研究生院 (Graduate School of the Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: Control Theory and Control Engineering (控制理论与控制工程)
Keywords: Reinforcement Learning; Continuous Space; Function Approximation; RBF Network; Fuzzy Inference System
Abstract: The main content and research contributions of this dissertation are as follows. First, for reinforcement learning problems with discrete state and action spaces, a weighted recursive least-squares multi-step Q-learning algorithm based on the eligibility-trace mechanism is proposed. It supports online, incremental learning, effectively improves computational efficiency, and its convergence is analyzed with discrete martingale theory. Second, an adaptive reinforcement learning algorithm is designed for control problems with continuous state spaces. Within an Actor-Critic framework, a single normalized RBF network simultaneously approximates the Critic's value function and the Actor's policy function. Because the Actor and the Critic share the network's input and hidden layers, the algorithm stays simple while the state space is constructed online and adaptively. Third, a class of weighted Q-learning algorithms for continuous state and action spaces is proposed. An RBF network implements standard Q-learning to approximate the utility values of discrete actions, and a weighting rule then combines these utility values into the continuous action applied to the system, extending Q-learning to control problems with continuous action spaces. Fourth, exploiting the interpretability of fuzzy inference and the learning ability of RBF networks, a fuzzy reinforcement learning architecture based on a fuzzy RBF network is constructed, on top of which fuzzy Actor-Critic learning and fuzzy Q-learning are designed. Both algorithms feature good generalization, a compact network structure, self-adaptation, and self-learning. Fifth, a nonlinear direct multi-step predictive controller based on a dynamic Elman network prediction model is designed. The temporal-difference method is combined with the BP algorithm to compute real-time weight adjustments incrementally, and a single-value predictive control algorithm performs online receding-horizon optimization of the control input. The method has a simple structure, low computational cost, and high speed, and shows some adaptability to changes in system parameters. Finally, the research results are summarized and directions for future work are discussed.
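The third contribution's weighting rule can be sketched in a few lines. This is a minimal illustration only: the Gaussian RBF features, the softmax-style weighting, and every parameter value below are assumptions for exposition, not the dissertation's actual design.

```python
import numpy as np

# Illustrative sketch of weighted Q-learning for continuous actions:
# an RBF network approximates the utility (Q) values of a small set of
# discrete actions, and a softmax-style weighting rule blends them into
# one continuous action. All names and values here are assumptions.

def rbf_features(state, centers, width=0.5):
    """Normalized Gaussian RBF activations for a state vector."""
    d2 = np.sum((centers - state) ** 2, axis=1)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return phi / phi.sum()

def continuous_action(state, W, centers, discrete_actions, beta=2.0):
    """Blend discrete actions, weighted by softmax of their utility values."""
    phi = rbf_features(state, centers)
    q = W @ phi                         # one utility value per discrete action
    w = np.exp(beta * (q - q.max()))    # numerically stable softmax
    w /= w.sum()
    return float(w @ discrete_actions)  # continuous action for the plant

# Usage: 5 RBF centers on a 1-D state space, 3 discrete action levels.
centers = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
discrete_actions = np.array([-1.0, 0.0, 1.0])
W = np.random.default_rng(0).standard_normal((3, 5))
a = continuous_action(np.array([0.2]), W, centers, discrete_actions)
```

Because the weights are a convex combination, the resulting action always lies inside the hull of the discrete action set.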
Other Abstract: The contributions of the dissertation are as follows. First, a weighted recursive least-squares multi-step Q-learning algorithm is proposed for discrete state and action spaces. The computational efficiency of Q-learning is improved by virtue of online, incremental learning, and discrete martingale theory is applied to analyze the convergence of the proposed algorithm. Second, an adaptive Actor-Critic reinforcement learning algorithm is designed for continuous state spaces. A normalized RBF (NRBF) neural network is used to approximate the Critic's value function and the Actor's policy function simultaneously under an Actor-Critic architecture. The algorithm is simple because the Actor and the Critic share the input and hidden layers of the NRBF network, and it constructs the state space online and adaptively. Third, a weighted Q-learning algorithm suitable for control systems with continuous state and action spaces is put forward. Standard Q-learning, implemented with an RBF network, first approximates the utility values of a set of discrete actions; a weighting rule then combines these utility values to obtain the continuous action that actually acts upon the system. The application of Q-learning is thereby extended to problems with continuous state and action spaces. Fourth, a fuzzy reinforcement learning architecture based on a four-layer fuzzy RBF neural network is proposed, fully exploiting the knowledge-representation ability of fuzzy inference systems and the self-learning ability of RBF networks. Based on this architecture, a fuzzy Actor-Critic learning algorithm and a fuzzy Q-learning algorithm are designed. Both methods offer good generalization, a compact network structure, self-adaptation, and self-learning.
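The eligibility-trace mechanism behind the first contribution can be illustrated with a plain tabular Q(λ) update. Only the trace idea is shown; the thesis pairs it with a weighted recursive least-squares update and a convergence analysis that are not reproduced here, and the update style below (Watkins-like trace cutting) is an assumption.

```python
import numpy as np

# Tabular sketch of multi-step Q-learning with eligibility traces (Q(lambda)).
# Illustrative only: the thesis combines traces with a weighted recursive
# least-squares update, which is omitted in this sketch.

def q_lambda_update(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.9, lam=0.8):
    """One Q(lambda) backup: spread the TD error over all traced pairs."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]  # one-step TD error
    E[s, a] += 1.0                                 # accumulating trace
    Q += alpha * delta * E                         # multi-step credit assignment
    # Watkins-style trace handling: decay traces after a greedy action,
    # cut them to zero after an exploratory one.
    greedy = a == int(np.argmax(Q[s]))
    E *= gamma * lam if greedy else 0.0
    return delta

# Usage on a toy 3-state, 2-action problem.
Q = np.zeros((3, 2))
E = np.zeros((3, 2))
delta = q_lambda_update(Q, E, s=0, a=1, r=1.0, s_next=2)
```

Because every traced state-action pair shares in each TD error, reward information propagates back over multiple steps per update, which is what makes the multi-step scheme more sample-efficient than one-step Q-learning.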
Fifth, a nonlinear multi-step predictive controller based on a modified Elman recurrent neural network is designed. A hybrid learning algorithm combining the temporal-difference method with the BP algorithm is put forward to train the Elman prediction model, addressing the intrinsic defect of the BP algorithm that it cannot update network weights incrementally. To simplify computation, a single-value predictive control algorithm is used to optimize the control input of the next step. The predictive controller has a simple structure, a small computational burden, and high speed, and it adapts to changes in plant parameters. Finally, a summary of the dissertation is given and future work is addressed.
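For the fifth contribution, a minimal Elman-style recurrent predictor might look like the following. Layer sizes, initialization, and the iterated multi-step scheme are assumptions for illustration (the thesis uses a direct multi-step predictor), and the hybrid temporal-difference/BP training algorithm is omitted entirely.

```python
import numpy as np

# Minimal Elman-style recurrent predictor: the hidden layer feeds back
# through a context layer, giving the network one-step memory. Everything
# here (sizes, init scale, iterated prediction) is an illustrative
# assumption; the thesis trains its model with a hybrid TD/BP algorithm.

class ElmanPredictor:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = 0.1 * rng.standard_normal((n_hidden, n_in))
        self.W_ctx = 0.1 * rng.standard_normal((n_hidden, n_hidden))  # context feedback
        self.W_out = 0.1 * rng.standard_normal((n_out, n_hidden))
        self.context = np.zeros(n_hidden)

    def step(self, u):
        """One-step-ahead prediction; hidden state is stored as context."""
        h = np.tanh(self.W_in @ u + self.W_ctx @ self.context)
        self.context = h
        return self.W_out @ h

    def predict_multi(self, u, horizon):
        """Multi-step prediction by iterating the one-step model on its output."""
        preds = []
        y = self.step(np.atleast_1d(u))
        for _ in range(horizon):
            preds.append(float(y[0]))
            y = self.step(y)  # feed the prediction back in (needs n_in == n_out)
        return preds

# Usage: scalar input/output, 8 hidden units, 5-step prediction.
model = ElmanPredictor(n_in=1, n_hidden=8, n_out=1)
preds = model.predict_multi(0.5, horizon=5)
```

The context layer is what distinguishes an Elman network from a plain feedforward predictor: it lets the multi-step forecast depend on the recent input history rather than on the current input alone.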
Shelf Number: XWLW931
Other Identifier: 200218014603155
Language: Chinese
Document Type: Doctoral Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/5835
Collection: Graduates - Doctoral Dissertations
Recommended Citation (GB/T 7714):
程玉虎. 连续状态-动作空间下强化学习方法的研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2005.
Files in This Item:
There are no files associated with this item.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.