Research on Self-Learning Optimal Parallel Control Methods for Nonlinear Systems (非线性系统自学习优化平行控制方法研究)


	卢经纬
	2022-05-19
Pages	204
Degree type	Doctoral
Abstract	With the rapid development of information technology, both high-tech and traditional sectors such as aerospace, urban transportation, industrial manufacturing, and energy have placed new demands on the quality of control systems, and research on optimal control methods has therefore received widespread attention. In practice, almost all systems are nonlinear to some degree, so classical linear optimal control methods can degrade control performance. In recent years, researchers have proposed parallel control based on the ACP (Artificial systems + Computational experiments + Parallel execution) method, which has produced a series of results on complex system problems. Adaptive dynamic programming (ADP), which combines dynamic programming, neural networks, and reinforcement learning, is another effective class of nonlinear optimal control design techniques. Building on existing work in optimal control, this dissertation investigates self-learning optimal parallel control methods for nonlinear systems. The main contributions are as follows:
1. An event-triggered parallel control method is developed for discrete-time nonzero-sum games. First, an event-triggered algorithm is designed using the time-triggered optimal value functions and control laws, so that only the time-triggered Hamilton-Jacobi-Bellman (HJB) equations need to be solved. It is then proven that the closed-loop system is asymptotically stable and that an upper bound on the sum of the actual performance indices of all players can be determined in advance. A key step in implementing the algorithm is obtaining the next state of the system, which is difficult in a real system. A parallel control-based implementation scheme is therefore designed: a parallel system of the actual system is constructed to predict the next state and to obtain the optimal value functions and control laws. The scheme uses neural networks and iterative ADP, and stability is further proven in the presence of neural network weight approximation errors.
2. An online near-optimal parallel control method that does not require reconstructing the unknown system is developed for unknown continuous-time nonaffine nonlinear systems. First, a framework is proposed that converts the optimal control problem of a nonaffine system into that of an affine system. In this framework, an augmented affine system is constructed according to the dimensions of the state vector and control input of the original system, and an augmented performance index is constructed from the original performance index. Next, the relationship between the stability of the original system and that of the augmented affine system is analyzed, and it is proven that, with a suitable choice of a parameter in the augmented performance index, optimal control of the augmented affine system under the augmented performance index is equivalent to near-optimal control of the original system under the original performance index. Integral reinforcement learning is then extended to completely unknown nonaffine nonlinear systems on the basis of the augmented system and index, and the signals in the closed-loop system are proven to be uniformly ultimately bounded without assuming bounded input dynamics. In addition, the original performance index may be any positive-definite function, and the bound to which the state vector converges is determined by a design parameter.
3. An event-triggered optimal parallel control method is developed for unknown discrete-time nonlinear systems. First, an augmented system and an augmented performance index are proposed to realize parallel control. The relationship between the stability of the original system and that of the augmented system is analyzed, and it is proven that, with a suitable augmented performance index, optimal control of the augmented system under the augmented performance index can be regarded as near-optimal control of the original system under the original performance index. Next, an event-triggered control framework is proposed, and a triggering condition is designed using the time-triggered optimal value function and control law. Closed-loop stability under this triggering condition is proven, and an upper bound on the actual performance index is determined in advance. To implement the triggering condition for unknown discrete-time nonlinear systems, an online learning algorithm that does not require reconstructing the unknown system is then proposed. Finally, the convergence of the signals in the closed-loop system is analyzed without assuming bounded input dynamics.
4. An online optimal parallel tracking control method is developed for continuous-time nonaffine nonlinear systems. First, an augmented affine system and an augmented desired signal are constructed from the dimensions of the state vector and control input of the original system together with the desired signal. The tracking control problem is then converted into a regulation problem by constructing an original tracking error system and an augmented tracking error system, and an augmented performance index is designed from the original performance index. The relationship between the stability of the two tracking error systems, and the relationship between the original and augmented performance indices, are analyzed theoretically. In addition, an online learning algorithm based on ADP and neural networks is designed to achieve online optimal tracking control, and the convergence of the signals in the closed-loop system is analyzed. The method applies directly to nonaffine nonlinear systems without recasting them in affine form, and it guarantees continuity of the control input even when the desired signal has a finite number of jump discontinuities.
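The abstract does not give the augmented system explicitly. As a rough illustration only, the sketch below assumes one common way of obtaining an affine system from a nonaffine one: the control input u is appended to the state, and its time derivative v = du/dt becomes the new input. The dynamics `f`, the function names, and the test signal are all hypothetical and are not taken from the thesis.

```python
import math

def f(x, u):
    """Hypothetical nonaffine dynamics: u enters through tanh and a cubic term."""
    return -x + math.tanh(u) + 0.1 * u ** 3

def augmented_dynamics(z, v):
    """Derivative of the augmented state z = (x, u) under the new input v = du/dt.

    Assumed form: dz/dt = F(z) + G * v, which is affine in v.
    """
    x, u = z
    drift = (f(x, u), 0.0)   # F(z): independent of v
    gain = (0.0, 1.0)        # G: constant input vector
    return tuple(d + g * v for d, g in zip(drift, gain))

def euler_step(z, v, dt):
    """One explicit Euler step of the augmented system."""
    dz = augmented_dynamics(z, v)
    return tuple(zi + dt * dzi for zi, dzi in zip(z, dz))

def simulate(z0, v_of_t, dt=0.01, steps=200):
    """Integrate the augmented system from z0 under the input signal v_of_t."""
    z = z0
    trajectory = [z]
    for k in range(steps):
        z = euler_step(z, v_of_t(k * dt), dt)
        trajectory.append(z)
    return trajectory
```

Under this assumed construction, u is the integral of v, so a call such as `simulate((1.0, 0.0), lambda t: 1.0 if t < 1.0 else -1.0)` produces a u component that changes by at most `dt * max|v|` per step even though v itself jumps. This is consistent with the continuity property claimed in contribution 4, although the thesis may construct its augmented system differently.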
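Keywords	Optimal control; parallel control; adaptive dynamic programming; reinforcement learning; nonlinear systems
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48706
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	卢经纬. 非线性系统自学习优化平行控制方法研究[D]. 北京. 中国科学院大学人工智能学院,2022.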

Files in this item
File name/size	Document type	Version type	Access type	License
学位论文-卢经纬.pdf (18672 KB)	Thesis		Restricted access	CC BY-NC-SA