Research on Distributed Iterative Control Methods Based on Adaptive Dynamic Programming

	李洪阳
	2022-05
Pages	114
Degree type	Doctoral
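Abstract (translated from Chinese)	With the rapid development of science and technology, the optimization of dynamic systems has advanced considerably, and optimal control theory has attracted wide attention for its effectiveness and practicality. Adaptive dynamic programming (ADP) is an effective method for solving optimal control problems. Its core idea is to approximate the iterative value function and the iterative control law with function approximation structures such as neural networks, which effectively overcomes the "curse of dimensionality" caused by traditional dynamic programming. However, many theoretical and technical problems remain to be solved when ADP is applied to distributed iterative control problems. This thesis therefore uses ADP to study distributed iterative control for several classes of dynamic systems. The main work and contributions cover the following five aspects: 1. The infinite-horizon optimal control problem of nonlinear multicontroller systems is studied, and a distributed policy iteration control method is proposed. In each iteration the iterative control laws are updated one at a time rather than all together, which effectively reduces the computational burden of each iteration. The properties of the distributed policy iteration method, including monotonicity, convergence, and optimality, are analyzed; the analysis shows that the iterative value function is monotonically non-increasing and converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. 2. The fault-tolerant control problem of linear multicontroller systems with actuator faults is studied, and a distributed fault-tolerant control method is proposed. Based on the distributed policy iteration method and a fault compensation method, optimal control is achieved when part of the system information is unknown, and the effects of actuator faults are eliminated. A rigorous theoretical analysis of stability and optimality shows that the proposed method guarantees system stability and reduces the computational burden. 3. The optimal control problem of finite-state systems is studied, and a distributed value iteration control method is proposed. The system is divided into several subsystems, and a modi-matrix method based on Bellman semiring theory is proposed: in the Bellman semiring, the Bellman equation is converted into a linear iterative matrix equation, from which the iterative value function of each subsystem is solved. A distributed value iteration method is then proposed that updates the iterative value functions of the subsystems separately in each iteration rather than updating all of them, which effectively reduces the computational burden. The monotonicity, convergence, and optimality of the distributed value iteration control method are analyzed; the analysis shows that the iterative value function is monotonically non-decreasing and converges to the optimal performance index function in a finite number of steps. 4. The optimal output cluster synchronization control problem of heterogeneous multi-agent systems with external disturbances is studied, and a data-driven distributed control method is proposed. A distributed adaptive observer is designed to estimate the state and system matrices of each group leader. To achieve output tracking and disturbance rejection, the cluster output tracking problem is transformed into an output regulation problem, and a data-driven reinforcement learning algorithm is designed to obtain the optimal control laws and optimal performance index functions. Stability, convergence, and optimality analyses are given; they show that the data-driven distributed control method enables each agent to track the output of its group leader and suppresses the effects of disturbances. 5. The optimal synchronization control problem of multi-agent systems with actuator saturation is studied, and a data-driven distributed control method is proposed. Multi-agent game theory is introduced to transform the optimal synchronization control problem into a multi-agent nonzero-sum game. An off-policy reinforcement learning method is designed to obtain the Nash equilibrium solution, and neural networks are introduced to implement the proposed method. The convergence and optimality of the proposed method are analyzed; the analysis shows that the iterative control laws converge to the Nash equilibrium.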
English abstract	With the rapid development of science and technology, optimization problems of dynamic systems have received increasing attention, and optimal control theory has gained widespread popularity due to its superiority. Adaptive dynamic programming (ADP) is regarded as one of the most effective techniques for solving optimal control problems, and it effectively overcomes the "curse of dimensionality" caused by traditional dynamic programming methods. The core idea of ADP is to approximate the iterative value functions and iterative control laws using function approximation structures such as neural networks. However, many theoretical and technical difficulties remain when ADP is used to solve distributed iterative control problems. This thesis therefore investigates distributed control problems based on ADP. The main contributions of this thesis consist of the following five parts. 1. A distributed policy iteration method is presented for the infinite-horizon optimal control problems of multicontroller nonlinear systems. In each iteration of the presented method, only one iterative control law is updated, instead of all the iterative control laws, which effectively reduces the computational burden. The monotonicity, convergence, and optimality of the distributed policy iteration method are analyzed, and the analysis shows that the iterative value function is monotonically non-increasing and converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation. 2. A distributed fault-tolerant control method is presented for the fault-tolerant control problems of multicontroller linear systems. Based on the distributed policy iteration method and a fault compensation method, optimal control is realized with partial system information, and the effects of actuator faults are removed. The stability and optimality of the distributed fault-tolerant control method are analyzed, and the analysis shows that the presented method guarantees the stability of the control systems and reduces the computational burden. 3. A distributed value iteration method is presented for the optimal control problems of control systems with finite states. The control system is separated into several subsystems, and a modi-matrix method is presented to calculate the iterative value function of each subsystem by converting the Bellman equation into a linear recursive matrix equation in the Bellman semiring. A novel distributed value iteration method is then established that updates the iterative value function of each subsystem separately, instead of all the iterative value functions, which effectively reduces the computational burden. The monotonicity, convergence, and optimality of the distributed value iteration method are analyzed, and the analysis shows that the iterative value function is monotonically non-decreasing and converges to the optimal performance index function in a finite number of iteration steps. 4. A data-driven distributed control method is presented for the optimal output cluster synchronization control problems of heterogeneous multi-agent systems with external disturbances. A novel distributed adaptive observer is introduced to estimate the state and system matrices of each leader. To realize output tracking control and disturbance rejection, the output cluster synchronization control problem is transformed into an output regulation problem, and a reinforcement learning method is introduced to obtain the optimal control laws and optimal performance index functions. The stability, convergence, and optimality are analyzed, and the analysis shows the effectiveness of the presented data-driven distributed control method. 5. A data-driven distributed control method is presented for multi-agent systems with input saturation. Multi-agent game theory is introduced to transform the optimal synchronization control problem into a multi-agent nonzero-sum game. A novel off-policy reinforcement learning method is presented to obtain the Nash equilibrium solution, and neural networks are introduced to implement the presented method. The convergence and optimality are analyzed, and the analysis shows that the iterative control laws converge to the Nash equilibrium.
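The third contribution rests on the observation that a Bellman equation becomes linear when "addition" is read as min and "multiplication" as ordinary addition. The sketch below illustrates only that general idea on a small hypothetical shortest-path problem; it is not the thesis's modi-matrix or distributed method, whose details the abstract does not give. The cost matrix `A`, the function names, and the zero initial value function are all assumptions made for illustration.

```python
import math

INF = math.inf  # marks a transition that does not exist


def minplus_matvec(A, v):
    """Min-plus 'matrix-vector product': result[i] = min_j (A[i][j] + v[j])."""
    return [min(a_ij + v_j for a_ij, v_j in zip(row, v)) for row in A]


def value_iteration(A, max_iters=100):
    """Iterate v_{k+1} = A (min-plus) v_k from v_0 = 0 until a fixed point."""
    v = [0.0] * len(A)
    history = [v]  # every distinct iterate, starting with v_0
    for _ in range(max_iters):
        v_next = minplus_matvec(A, v)
        if v_next == v:  # fixed point reached: v solves the Bellman equation
            break
        history.append(v_next)
        v = v_next
    return v, history


# Hypothetical 4-state example: A[i][j] is the cost of moving from state i
# to state j. State 3 is the goal and has a zero-cost self-loop.
A = [
    [INF, 1, 4, INF],
    [INF, INF, 2, 6],
    [INF, INF, INF, 3],
    [INF, INF, INF, 0],
]

v, history = value_iteration(A)
for k, v_k in enumerate(history):
    print(k, v_k)
```

On this example the iterates never decrease from one step to the next and stop changing after three updates, which mirrors the monotonically non-decreasing, finite-step convergence described in the abstract. All states are updated together here, whereas the thesis updates the subsystems' value functions separately.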
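Keywords	adaptive dynamic programming, optimal control, distributed control, intelligent control, reinforcement learning
Language	Chinese
Document type	Thesis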
Item identifier	http://ir.ia.ac.cn/handle/173211/48556
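Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Complex Systems Intelligence Mechanism and Parallel Control Team graduates_Doctoral dissertations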
Recommended citation (GB/T 7714)	李洪阳. 基于自适应动态规划的分布式迭代控制方法研究[D]. 人工智能学院. 中国科学院大学,2022.

Files in this item
File name/size	Document type	Version type	Access type	License
基于自适应动态规划的分布式迭代控制方法研（3786KB）	Thesis		Open access	CC BY-NC-SA