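Research on Optimal Tracking Control Methods Based on Adaptive Dynamic Programming
Wang Xin (王鑫)
2023-05-23
Pages: 110
Degree type: Doctoral
Chinese Abstract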

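Tracking control has long been one of the key research topics in the control field and is widely applied in practical engineering. Its main purpose is to design a control law that cancels the effect of external noise on the system, so that the system output tracks the reference signal of the target without deviation. Optimal tracking control combines optimal control with tracking control: it aims to design an optimal control law under which the system completes the tracking task while minimizing a cost function with a corresponding physical meaning and keeping the closed-loop system stable. Designing different optimal tracking control methods for controlled plants with different characteristics, so that they better suit a variety of practical problems, is therefore of great value both in theory and in practice. For solving concrete optimal control problems, dynamic programming is a common tool: it builds a recursive formula from the preset cost function and solves a sequence of optimization problems backward from the terminal state of the system to obtain the optimal control. However, with the rise of complex systems, the dimensions of the state and control vectors have grown substantially, solving the optimization problem at each step of dynamic programming has become very difficult, and its range of application is severely limited. Adaptive dynamic programming addresses the difficulty of computing the Hamilton-Jacobi-Bellman (HJB) equation in traditional dynamic programming. The key to this method is to replace the value function and the control law with approximators such as neural networks and, through iterative updates, make them converge gradually to the optimal value function and the optimal control law as the number of iterations increases. Nevertheless, when adaptive dynamic programming is used to solve various optimal tracking control problems, some theoretical and practical issues remain to be explored in depth. On this basis, this dissertation studies and proposes adaptive dynamic programming methods for several classes of optimal tracking control problems. The main work and the corresponding innovations are as follows: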

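1. Output regulation is a common approach to output tracking problems; it suppresses external disturbances while keeping the closed-loop system stable. For general linear systems with unknown dynamics and unmeasurable disturbances, an optimal output regulation control scheme based on offline learning is proposed, which uses state-input data to determine the unknown system equations. The parameters of the external disturbance in the output equation are obtained from the minimal polynomial of the exosystem matrix. Stability analysis proves that the resulting optimal control law stabilizes the closed-loop system. Simulations show that the system output asymptotically tracks the reference signal in the presence of external disturbances.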

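2. The pursuit-evasion problem can be regarded as a special class of optimal tracking control problem. From the pursuer's perspective, the main goal is to design a control strategy within a game framework so that the pursuer tracks the evader optimally. This work studies optimal strategies for the pursuit-evasion problem of constrained finite-time nonlinear systems. The policy iteration idea of adaptive dynamic programming is combined with the Pontryagin maximum principle, the original problem is decomposed into two optimal control problems, and a self-play iterative algorithm is designed to obtain the optimal control strategies of both players. Under the condition that the Hamiltonian formed by the Pontryagin maximum principle has a unique extremum, it is proved that the strategies obtained by the iterative algorithm eventually converge to the Nash equilibrium of the game. Simulations in two different cases verify the effectiveness of the proposed method. In addition, for the case where an agent cannot obtain the opponent's strategy, a new algorithm based on model predictive control is designed, which achieves approximately optimal strategies for both pursuer and evader provided that the opponent's current position is available.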

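3. The goal of leader-follower control in multi-agent systems is to design a consensus protocol under which every follower tracks the state of the leader. For the leader-follower consensus control problem of linear multi-agent systems with random disturbances under a directed topology, and unlike common methods that design control laws with static coupling weights based on the topology, a distributed adaptive controller based on the Riccati equation is proposed. It uses the state errors between neighboring agents to update the coupling weights associated with the gain matrix, so that the expectation of the state error between the leader and the followers converges asymptotically to zero. In addition, the optimal feedback control gain of the system is obtained by policy iteration in adaptive dynamic programming. The stability of the closed-loop system is proved using the Lyapunov direct method and the Itô formula. Comparative experiments verify the advantages and effectiveness of the proposed method.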

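4. For heterogeneous multi-agent systems, optimal output consensus control enables the followers to track the output of the leader optimally. For the optimal output consensus control problem of heterogeneous multi-agent systems with incomplete information, a data-based adaptive iterative algorithm is proposed within a multi-agent differential game framework. The method first constructs a pseudo system for each follower, which transforms the multi-agent output regulation problem into a state stabilization problem for each pseudo system; a differential graphical game framework is then introduced to obtain stabilizing controls for the mutually coupled multi-agent system. To solve the coupled HJB equations, a data-based offline reinforcement learning algorithm is designed using the policy iteration idea of adaptive dynamic programming, and it is proved that the control law obtained by the algorithm eventually drives the system to a global Nash equilibrium.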
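
Items 3 and 4 above rely on policy iteration to obtain an optimal feedback gain. As a rough illustration only, the following sketch shows the classical model-based form of that idea (Kleinman-style iteration on the continuous-time Riccati equation) for a single linear system. It is not the dissertation's data-based algorithm: the system matrices, the weights Q and R, the initial gain K0, and the helper names `solve_lyapunov` and `policy_iteration` are all assumptions introduced here.

```python
import numpy as np

def solve_lyapunov(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 for P by Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # With row-major flattening, vec(X @ P @ Y) = kron(X, Y.T) @ vec(P).
    L = np.kron(Ac.T, I) + np.kron(I, Ac.T)
    P = np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman-style policy iteration; K0 must make A - B @ K0 stable."""
    K = K0
    for _ in range(iters):
        # Policy evaluation: value matrix of the current gain.
        P = solve_lyapunov(A - B @ K, Q + K.T @ R @ K)
        # Policy improvement: K = R^{-1} B^T P.
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Hypothetical double-integrator example (not taken from the dissertation).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # assumed stabilizing initial gain
P, K = policy_iteration(A, B, Q, R, K0)

# Residual of the algebraic Riccati equation at the returned P.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(K, np.abs(residual).max())
```

For this example the Riccati equation can be solved by hand, giving the gain [1, √3], which the iteration should approach. Data-based variants replace the Lyapunov solve in the evaluation step with a least-squares fit on measured trajectories.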

 

English Abstract

Tracking control has always been one of the critical research topics in the control field, with widespread applications in practical engineering systems. The main objective of tracking control is to design control laws that compensate for external disturbances, so that the output of the system tracks the desired reference signal without steady-state error. Optimal tracking control combines optimal control with tracking control to obtain control laws that minimize a cost function with practical meaning while achieving stability of the closed-loop system. Therefore, designing specific optimal tracking control methods for controlled objects with different characteristics offers significant value in both theoretical and practical aspects. As a common tool for solving optimal control problems, the dynamic programming approach establishes a recursive formula according to the predefined cost function. By solving a series of optimization problems backward from the terminal state of the system, the optimal control law can be obtained. However, with the spread of complex systems, the dimensionality of the state and control vectors has increased substantially. Consequently, solving the optimization problem at each step of dynamic programming has become considerably challenging, which limits its applicability. Adaptive dynamic programming addresses the difficulty of computing the Hamilton-Jacobi-Bellman (HJB) equation in dynamic programming. The key to this method is to use approximation tools such as neural networks in place of the value function and control law. Through iterative updates, the value function and control law gradually converge to the optimal value function and optimal control law as the number of iterations increases.
However, there are still some theoretical and practical issues that need to be further explored in the process of utilizing adaptive dynamic programming to solve various optimal tracking control problems. Therefore, this dissertation investigates and proposes several adaptive dynamic programming methods for optimal tracking control problems. The main contributions and corresponding contents are listed as follows:

1. Output regulation is a useful method for solving output tracking problems; it simultaneously suppresses external disturbances and maintains the stability of the closed-loop system. For general linear systems with unknown dynamics and unmeasurable disturbances, an optimal output regulation control scheme based on offline learning is presented. The unknown system equations are identified using state-input data, and the parameters of the external disturbances in the output equation are obtained through the minimal polynomial of the exosystem matrix. The stability analysis demonstrates that the designed control scheme stabilizes the closed-loop system. Simulation experiments show that the proposed method achieves asymptotic tracking of the reference signal in the presence of external disturbances.

2. The pursuit-evasion problem can be viewed as a specific type of optimal tracking control problem. From the perspective of the pursuer, the main purpose is to design a control scheme based on a game framework that enables the pursuer to track the evader optimally. The optimal strategies for the pursuit-evasion problem of finite-time nonlinear systems with constraints are considered. By combining the policy iteration approach of adaptive dynamic programming with the Pontryagin maximum principle, the original problem is decomposed into two optimal control subproblems. A self-play iterative algorithm is presented to obtain the optimal control laws for both players in the game. Under the condition that the Hamiltonian constructed from the Pontryagin maximum principle has a unique extremum, it is proven that the strategies obtained by the iterative algorithm eventually converge to the Nash equilibrium solution of the game. Two different cases are presented and analyzed to demonstrate the effectiveness of the proposed approach. In addition, another algorithm is designed based on model predictive control for situations where the agent is unable to obtain the opponent's strategy. This algorithm achieves an approximately optimal strategy for both sides when the opponent's current position information is available.

3. The objective of the leader-follower control problem in multi-agent systems is to design a consensus protocol that enables each follower to track the state of the leader. In contrast to conventional methods that design control laws with static coupling weights based on the topology structure, a distributed adaptive controller based on the Riccati inequality is proposed for the leader-follower consensus control problem of linear multi-agent systems in a directed topology with random disturbances. The coupling weights related to the gain matrix are updated using the state error between adjacent agents, so that the expectation of the state errors between the leader and followers converges asymptotically to zero. Using the Lyapunov direct method and the Itô formula, the stability of the closed-loop system is proven. Comparative experiments including the two schemes above verify the effectiveness and advantages of the developed control method.

4. For heterogeneous multi-agent systems, output consensus control achieves output tracking between the followers and the leader. Based on the framework of differential graphical games, a data-based adaptive iterative algorithm is presented for the output consensus control problem of a multi-agent system with incomplete information. The presented method first constructs a pseudo system for each follower to transform the output regulation problem of the multi-agent system into a state stabilization problem for each pseudo system. The differential graphical game framework is applied to obtain stabilizing controls for the interdependent multi-agent systems. Inspired by policy iteration in adaptive dynamic programming, a data-driven offline reinforcement learning algorithm is designed to solve the coupled HJB equations. It is proven that the control law obtained by the designed algorithm eventually drives the system to converge to the global Nash equilibrium.
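
The opening of the abstract describes dynamic programming as a recursion built from the cost function and solved backward from the terminal state. A minimal sketch of that backward recursion is given below for the special case of a discrete-time linear system with quadratic cost, where each step has a closed-form solution. The function name `backward_dp_lqr` and the scalar example are assumptions made for illustration and do not come from the dissertation.

```python
import numpy as np

def backward_dp_lqr(A, B, Q, R, Qf, N):
    """Finite-horizon LQR by backward dynamic programming.

    Cost: sum_{k<N} (x_k' Q x_k + u_k' R u_k) + x_N' Qf x_N,
    dynamics: x_{k+1} = A x_k + B u_k, control: u_k = -gains[k] @ x_k.
    """
    P = Qf  # value matrix at the terminal step
    gains = []
    for _ in range(N):
        # Minimize the one-step cost plus the cost-to-go x' P x.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Bellman (Riccati) recursion for the value matrix one step earlier.
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[0] now applies at time step 0
    return P, gains

# Hypothetical scalar example: x_{k+1} = x_k + u_k with unit weights.
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
Qf = np.array([[0.0]])
P0, gains = backward_dp_lqr(A, B, Q, R, Qf, N=50)
print(P0[0, 0], gains[0][0, 0])
```

In this scalar case the stationary value satisfies P² = P + 1, so for a long horizon `P0` should be close to (1 + √5)/2. For general nonlinear systems no such closed form exists at each step, which is the difficulty the abstract attributes to high-dimensional problems.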

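Keywords: adaptive dynamic programming; output regulation; pursuit-evasion game; optimal control; consensus control
Language: Chinese
Seven major directions, sub-direction classification: Intelligent control
State Key Laboratory planning direction classification: Complex system modeling and deduction
Associated dataset requiring deposit:
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/51925
Collection: Graduates_Doctoral Dissertations
State Key Laboratory of Multimodal Artificial Intelligence Systems_Complex Systems Intelligent Mechanism and Parallel Control Team
Recommended citation
GB/T 7714
Wang Xin. Research on Optimal Tracking Control Methods Based on Adaptive Dynamic Programming [D], 2023.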
Files in this item
File name/size: 基于自适应动态规划的最优跟踪控制方法研究 (6647KB)
Document type: Dissertation; Access type: Restricted; License: CC BY-NC-SA