CASIA OpenIR > Graduates > Master's Dissertations
Thesis Advisor: 刘德荣
Degree Grantor: 中国科学院研究生院
Place of Conferral: Beijing
Keywords: adaptive dynamic programming; reinforcement learning; neural networks; tracking control; saturated systems
Adaptive dynamic programming (ADP) is an approximate optimal control method that has recently emerged in the field of optimal control. It combines ideas from dynamic programming, reinforcement learning, and neural networks, and effectively overcomes the "curse of dimensionality" that afflicts traditional dynamic programming. To solve the Hamilton-Jacobi-Bellman (HJB) equation arising in optimal control, the ADP method employs a function approximation structure to obtain the solution of the HJB equation and then uses iterative methods to obtain the optimal control policy. Current ADP algorithms fall mainly into two classes, value iteration and policy iteration. Building on these two algorithms, this thesis proposes a generalized policy iteration ADP algorithm and, for the first time, applies it to tracking systems and to plants with actuator saturation. The main work and contributions of this thesis are summarized as follows:
1. This thesis proposes a generalized policy iteration ADP algorithm. In contrast to traditional iterative ADP algorithms, the generalized policy iteration ADP algorithm uses two iteration indices, $i$ and $j$: during the $i$-iteration, only the iterative control law is updated, without solving the HJB equation. Moreover, it has been pointed out in the literature that nearly all reinforcement learning and adaptive dynamic programming algorithms can be described within the generalized policy iteration framework, which further underscores the broad significance of studying generalized policy iteration ADP algorithms.
2. The generalized policy iteration ADP algorithm is applied to tracking systems. For a nonlinear discrete-time tracking system, a system transformation converts the tracking problem into a regulation problem for a general nonlinear system; the generalized policy iteration ADP algorithm then yields the optimal tracking controller, which is implemented with neural networks. The convergence of the iterative algorithm and the stability of the closed-loop system are proved. The results of two simulation examples demonstrate that the proposed method is effective and feasible.
3. The generalized policy iteration ADP algorithm is applied to plants with actuator saturation. A new utility function is first adopted to keep the control signal within the prescribed bound, yielding a new performance index function; the generalized policy iteration ADP algorithm then handles the actuator saturation while producing the optimal controller, with the corresponding proofs given. Finally, simulation results show that the algorithm effectively solves the optimal control problem for saturated systems, although overcoming actuator saturation comes at the cost of a longer settling time.
Other Abstract
In recent decades, with the increasing complexity of industrial systems, the requirements on control performance have become ever higher: we must not only ensure the stability of a system but also consider its optimality. Optimal control is therefore one of the hot topics in the control field. Adaptive dynamic programming (ADP) is an emerging approach in the field of optimal control. The ADP method combines ideas from dynamic programming, reinforcement learning, and neural networks, and effectively avoids the "curse of dimensionality". It uses a function approximation structure to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation, and the optimal control law is then obtained via an iterative method. Value iteration and policy iteration are the two classes of ADP algorithms for solving the HJB equation. Based on these two algorithms, a generalized policy iteration ADP algorithm is proposed and applied to the tracking system and the saturated system for the first time. The main work and contributions of this thesis are embodied in the following three aspects.
1. In this thesis, a novel iterative ADP algorithm, called the generalized policy iteration ADP algorithm, is proposed. The idea is to use two iteration procedures, an $i$-iteration and a $j$-iteration, to obtain the iterative control law and the iterative value function.
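The two-loop structure in contribution 1 can be sketched on a toy Markov decision process. This is a minimal illustration in the reward-maximizing reinforcement-learning convention, not the thesis's continuous-state formulation; the transition probabilities `P`, rewards `R`, and iteration counts below are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the thesis):
# P[s, a, s'] = transition probability, R[s, a] = one-step reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9  # discount factor

def generalized_policy_iteration(P, R, gamma, num_i=50, num_j=5):
    """Generalized policy iteration: each outer i-iteration improves the
    control law; the inner j-iteration only partially evaluates it
    (num_j = 1 recovers value iteration; num_j -> infinity, policy iteration)."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for i in range(num_i):
        # j-iteration: a few sweeps of policy evaluation, NOT solved to convergence
        for j in range(num_j):
            V = np.array([R[s, policy[s]] + gamma * P[s, policy[s]] @ V
                          for s in range(n_states)])
        # i-iteration: greedy improvement of the control law only
        policy = np.argmax(R + gamma * P @ V, axis=1)
    return V, policy
```

Varying `num_j` interpolates between value iteration and policy iteration, which is the sense in which generalized policy iteration subsumes both.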
2. By system transformation, we first convert the optimal tracking control problem into an optimal regulation problem. Then, the generalized policy iteration ADP algorithm, which unifies the policy iteration and value iteration algorithms, is introduced to deal with the optimal regulation problem. The convergence and optimality of the generalized policy iteration algorithm are analyzed. Three neural networks are used to implement the developed algorithm. Finally, simulation examples are given to illustrate the performance of the presented algorithm.
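The system transformation in contribution 2 can be illustrated on a scalar example: writing the tracking error as e(k) = x(k) - xd(k) turns the plant and reference generator into a new system whose state is the error, so driving e to zero is an ordinary regulation problem. The dynamics `f`, `g`, the reference generator `phi`, and the feedback gain `K` below are hypothetical choices for illustration, not the thesis's simulation systems.

```python
import numpy as np

# Illustrative plant x(k+1) = f(x) + g(x) u and reference xd(k+1) = phi(xd)
f = lambda x: 0.8 * np.sin(x)
g = lambda x: 1.0
phi = lambda xd: 0.9 * xd

def error_dynamics(e, xd, u):
    """Transformed 'regulation' system: the tracking error obeys
    e(k+1) = f(e + xd) + g(e + xd) u - phi(xd)."""
    x = e + xd
    return f(x) + g(x) * u - phi(xd)

def tracking_control(e, xd, K=0.5):
    """Hypothetical controller: a feedforward term cancels the drift at the
    reference, and a simple feedback term -K*e regulates the error."""
    ff = (phi(xd) - f(e + xd)) / g(e + xd)
    return ff - K * e
```

With this controller the error contracts by the factor K each step, showing how any regulation method (here, a plain linear feedback standing in for the ADP controller) applies unchanged to the transformed system.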
3. An integral-type utility function is constructed to tackle the saturation nonlinearity of the actuators; the generalized policy iteration ADP algorithm is then developed to deal with the optimal control problem. Compared with other algorithms, the developed ADP algorithm includes two iteration procedures. In the present control scheme, two neural networks are introduced to approximate the control law and the performance index function. Furthermore, numerical simulations illustrate the convergence and feasibility of the developed method.
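The integral-type utility in contribution 3 is commonly taken, in the saturated-control literature, to be a nonquadratic function built from the inverse hyperbolic tangent, which blows up as the input approaches its bound and thereby keeps the minimizing control inside it. The sketch below assumes that standard form for a scalar input; the bound `u_bar` and weight `R` are illustrative, not values from the thesis.

```python
import numpy as np

def saturation_utility(u, u_bar=1.0, R=1.0, n=2001):
    """Nonquadratic input utility (assumed standard form):
        W(u) = 2 * integral_0^u  u_bar * R * arctanh(v / u_bar) dv,  |u| < u_bar.
    W grows without bound as |u| -> u_bar, penalizing near-saturated inputs.
    Evaluated here by trapezoid-rule quadrature."""
    v = np.linspace(0.0, u, n)
    integrand = 2.0 * u_bar * R * np.arctanh(v / u_bar)
    dv = u / (n - 1)
    return float(np.sum((integrand[:-1] + integrand[1:]) / 2.0) * dv)

# Scalar closed form, useful as a check:
#   W(u) = 2*R*u_bar * ( u*arctanh(u/u_bar) + (u_bar/2)*log(1 - (u/u_bar)**2) )
```

A typical consequence of this choice is that the minimizing control takes a tanh-shaped, hence automatically bounded, form; the quadrature above is only a sketch of the utility itself.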
Document Type: Dissertation
Recommended Citation (GB/T 7714):
林桥. 基于ADP的非线性系统自学习最优控制方法研究[D]. Beijing: 中国科学院研究生院, 2017.
Files in This Item:
林桥毕业论文最终版.pdf (2432 KB) — Dissertation, not yet open access, license CC BY-NC-SA
