Research on Optimal Control Problems with Isoperimetric Constraints and Their Inverse Problems (等周约束最优控制问题及其逆问题研究)

Author	李涛 (Li Tao)
Date	2024-05-12
Pages	158
Degree type	Doctoral (PhD)
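Abstract (translated from Chinese)	In optimal control theory, an isoperimetric constraint is a constraint requiring a cost function to be less than a given upper bound. Research on isoperimetric-constrained optimal control seeks the optimal control law that minimizes one cost function while keeping the other cost functions below given upper bounds. Isoperimetric constraints make it much easier to describe complex control objectives and are therefore common in engineering applications. However, unlike the control constraints of classical constrained optimal control problems, an isoperimetric constraint restricts the accumulated total of state and control terms over the entire time horizon. It therefore destroys the no-aftereffect property of the optimal control problem, obstructs the direct application of dynamic programming, and makes the problem harder to solve. Research on isoperimetric-constrained optimal control problems thus has great application potential and significant theoretical value. When optimal control problems are formulated for real systems, the constraints are often implicit in expert experience and depend heavily on manual design by experts. On the other hand, isoperimetric constraints are widely applicable, easy to parameterize, and highly expressive, which gives them strong potential for extracting latent complex constraints. Isoperimetric-constrained inverse optimal control aims to automatically extract latent complex constraints, in the form of isoperimetric constraints, from given optimal state and control trajectories, thereby helping experts design the constraints of optimal control problems and reducing human effort. It has broad application prospects: it not only helps formulate isoperimetric-constrained optimal control problems but also helps analyze and improve constraints that degrade system performance. Building on adaptive dynamic programming theory, this thesis studies methods for solving isoperimetric-constrained optimal control problems and their inverse problems. Results on the forward problem support the study of the inverse problem, and results on the inverse problem in turn facilitate the formulation of forward optimal control problems. The main work and contributions of this thesis are summarized as follows:
1. For infinite-horizon isoperimetric-constrained optimal control problems of discrete-time nonlinear systems, an isoperimetric-constrained value iteration method is proposed. Given a known feasible control law, a nonlinear control constraint is constructed so that the original isoperimetric-constrained problem is approximated by a control-constrained optimal control problem, and the error introduced by this approximation is estimated. On this basis, an isoperimetric-constrained value iteration algorithm is proposed to solve the transformed problem; the iterative value function is proven to converge to the solution of the isoperimetric-constrained Bellman equation, and the iterative control laws are proven to be feasible. Furthermore, a neural-network-based implementation is given, and the convergence of the algorithm and the feasibility of the approximate control law are proven with the neural-network approximation error taken into account.
2. For infinite-horizon isoperimetric-constrained optimal control problems of discrete-time nonlinear systems, an isoperimetric-constrained multi-step look-ahead policy iteration method is proposed. Building on the value iteration method above, the requirement of a known feasible control law is relaxed; instead, the isoperimetric-constrained problem is converted into a control-constrained problem by constructing an auxiliary function, and a method for constructing this auxiliary function is given. On this basis, the multi-step look-ahead policy iteration method is proposed to approximately solve the isoperimetric-constrained problem, and its convergence and optimality are proven. Furthermore, an implementation based on function approximation is given, and the error bound of the algorithm is analyzed in the presence of approximation errors, yielding sufficient conditions under which the iterative value function converges to a bounded neighborhood of the optimal value function.
3. For finite-horizon isoperimetric-constrained optimal control problems of continuous-time nonlinear systems, a primal-dual adaptive dynamic programming method is proposed. It decomposes the nonlinear problem into a sequence of linear-quadratic time-varying isoperimetric-constrained optimal control problems, each solvable by the primal-dual method, and its convergence is proven. Furthermore, necessary optimality conditions are derived from Pontryagin's principle, and the limit of the iterations is proven to satisfy them, which establishes the optimality of the method.
4. For isoperimetric-constrained inverse optimal control problems of discrete-time nonlinear systems, an isoperimetric constraint inference method is proposed. By deriving the necessary optimality conditions of the isoperimetric-constrained optimal control problem and parameterizing the constraint function, a system of nonlinear recovery equations for the unknown isoperimetric constraints is built from the given trajectory data. Sufficient conditions for the recovery equations to have a unique solution are given. When the nonlinear system has a unique solution, an analytical solution formula is provided; otherwise, the constraints are inferred by solving a constrained optimization problem. Furthermore, the inverse optimal control method is extended to the case of an unconstrained terminal state and to the case where multiple demonstration trajectories are given.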
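For concreteness, one generic way to write the discrete-time, infinite-horizon problem described above is shown below. The notation (dynamics F, stage costs U_0 and U_i, bounds c_i, remaining budgets b_{i,k}, multipliers λ_i) is assumed for illustration and may not match the thesis's own symbols.

```latex
\begin{aligned}
&\min_{\{u_k\}}\; J_0=\sum_{k=0}^{\infty}U_0(x_k,u_k)
 \quad\text{s.t.}\quad x_{k+1}=F(x_k,u_k),\qquad
 J_i=\sum_{k=0}^{\infty}U_i(x_k,u_k)\le c_i,\quad i=1,\dots,m,\\
&\sum_{j=k}^{\infty}U_i(x_j,u_j)\le b_{i,k},\qquad
 b_{i,k}:=c_i-\sum_{j=0}^{k-1}U_i(x_j,u_j),\qquad
 b_{i,k+1}=b_{i,k}-U_i(x_k,u_k),\quad b_{i,0}=c_i,\\
&\mathcal{L}(u,\lambda)=J_0+\sum_{i=1}^{m}\lambda_i\,(J_i-c_i),\qquad \lambda_i\ge 0 .
\end{aligned}
```

The second line is an equivalent rewriting of the constraint that shows where the no-aftereffect property is lost: the admissible tail from step k depends on the history through b_{i,k}, so a Bellman equation in x_k alone is not sufficient. Treating (x_k, b_k) as the state is one standard remedy, and the Lagrangian in the third line underlies primal-dual schemes; neither device is claimed to be the thesis's specific construction.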
English abstract	In optimal control theory, isoperimetric constraints are constraints requiring certain cost functions to stay below given upper bounds. Research on optimal control problems with isoperimetric constraints seeks the optimal control law that minimizes one cost function while keeping the other cost functions below their upper bounds. Isoperimetric constraints make it convenient to describe complex control objectives, so they are common in engineering applications. However, unlike the control constraints of classical constrained optimal control problems, isoperimetric constraints restrict the accumulated sum of state and control terms over the whole time horizon. Consequently, they violate the no-aftereffect property of the optimal control problem, hinder the application of dynamic programming, and make the problem harder to solve. Research on optimal control problems with isoperimetric constraints therefore has great application potential and important theoretical value. When optimal control problems are formulated for real-world systems, the constraints are often implicit in experts' experience and rely heavily on manual design. On the other hand, isoperimetric constraints are widely applicable, easy to parameterize, and highly expressive, which gives them strong potential for extracting latent complex constraints. Inverse optimal control with isoperimetric constraints aims to automatically extract latent complex constraints, in the form of isoperimetric constraints, from given optimal state and control trajectories, thereby helping experts design the constraints of optimal control problems and reducing human effort.
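A minimal sketch of why dynamic programming needs extra information under an isoperimetric constraint, assuming made-up toy data: a hypothetical chain of states moves toward an absorbing, cost-free state 0, and each move either costs more J_0 or consumes part of a budget on J_1. Augmenting the state with the remaining budget b restores a Bellman recursion, and value iteration then runs on (x, b). This budget augmentation is a standard workaround swapped in for illustration; the thesis instead converts the isoperimetric constraint into a control constraint (using a known feasible control law or an auxiliary function). The names ACTIONS, augmented_value_iteration and rollout are hypothetical.

```python
import math

# Hypothetical toy problem (an assumption, not taken from the thesis): a scalar state
# x in {0, ..., n} moves one step toward the absorbing, cost-free state 0 per stage.
# Each action has a stage cost U0 (to be minimized) and a stage cost U1 whose sum over
# the whole horizon must not exceed the budget (the isoperimetric constraint).
ACTIONS = {"fast": (1.0, 1), "slow": (3.0, 0)}  # name -> (U0, U1)


def augmented_value_iteration(n, budget, max_iters=1000, tol=1e-12):
    """Value iteration on the augmented state (x, b), where b is the remaining U1 budget."""
    V = [[0.0] * (budget + 1) for _ in range(n + 1)]  # V[x][b]; V[0][b] stays 0
    for _ in range(max_iters):
        new_V = [[0.0] * (budget + 1) for _ in range(n + 1)]
        for x in range(1, n + 1):
            for b in range(budget + 1):
                # Only actions that keep the accumulated U1 within the bound are allowed;
                # "slow" uses no budget, so at least one action is always feasible here.
                new_V[x][b] = min(u0 + V[x - 1][b - u1]
                                  for u0, u1 in ACTIONS.values() if u1 <= b)
        delta = max(abs(new_V[x][b] - V[x][b])
                    for x in range(n + 1) for b in range(budget + 1))
        V = new_V
        if delta < tol:
            break
    return V


def rollout(V, x, b):
    """Follow the greedy policy of V from (x, b); return the accumulated (J0, J1)."""
    j0, j1 = 0.0, 0
    while x > 0:
        _, u0, u1 = min((u0 + V[x - 1][b - u1], u0, u1)
                        for u0, u1 in ACTIONS.values() if u1 <= b)
        j0, j1 = j0 + u0, j1 + u1
        x, b = x - 1, b - u1
    return j0, j1


V = augmented_value_iteration(n=3, budget=3)
for b in range(4):
    # A larger remaining budget allows more "fast" moves and hence a lower J0.
    print(f"budget={b}: V={V[3][b]}, greedy (J0, J1)={rollout(V, 3, b)}")
```

The same starting position x = 3 has a different optimal value for each budget, which is exactly the information a value function over x alone cannot carry.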
Inverse optimal control with isoperimetric constraints has broad application prospects: it not only helps formulate optimal control problems with isoperimetric constraints but also helps analyze and improve constraints that degrade system performance. Based on adaptive dynamic programming theory, this thesis studies methods for solving optimal control problems with isoperimetric constraints and their inverse problems. The results on the forward problems support the research on the inverse problems; in turn, the results on inverse optimal control with isoperimetric constraints facilitate the formulation of the forward problems. The main contents and contributions of this thesis are summarized as follows:
1. A value iteration method with isoperimetric constraints is proposed for infinite-horizon optimal control problems with isoperimetric constraints of discrete-time nonlinear systems. Assuming that a feasible control law is known, a new nonlinear control constraint is constructed so that the optimal control problem with isoperimetric constraints is approximated by an optimal control problem with control constraints, and the approximation error is analyzed. The value iteration method with isoperimetric constraints is then proposed to solve the transformed problem. The iterative value function is proven to converge to the solution of the Bellman equation with isoperimetric constraints, and the iterative control laws are proven to be feasible. Furthermore, a neural-network-based implementation is introduced, and the convergence of the algorithm and the feasibility of the approximate optimal control law are proven with the neural-network approximation error taken into account.
2. A multi-step look-ahead policy iteration method with isoperimetric constraints is proposed for infinite-horizon optimal control problems with isoperimetric constraints of discrete-time nonlinear systems. Building on the value iteration method with isoperimetric constraints, the requirement of a known feasible initial control law is relaxed: the optimal control problem with isoperimetric constraints is instead transformed into an optimal control problem with control constraints by constructing an auxiliary function, and a method for constructing a suitable auxiliary function is given. The multi-step look-ahead policy iteration method with isoperimetric constraints is then proposed to approximately solve the original problem, and its convergence and optimality are proven. Furthermore, an implementation based on function approximators is described, and the error bound is analyzed in the presence of approximation errors, yielding sufficient conditions that guarantee the iterative value function converges to a bounded neighborhood of the optimal value function.
3. A primal-dual adaptive dynamic programming method is proposed for finite-horizon optimal control problems with isoperimetric constraints of continuous-time nonlinear systems. The problem is approximated by a sequence of linear-quadratic time-varying optimal control problems with quadratic isoperimetric constraints, each solved by the primal-dual method, and the convergence of the method is proven. Furthermore, optimality is analyzed by deriving necessary optimality conditions via Pontryagin's principle and proving that the limit of the iterations satisfies them.
4. An isoperimetric constraint inference method is proposed for inverse optimal control problems with isoperimetric constraints of discrete-time nonlinear systems. By deriving the necessary optimality conditions for optimal control problems with isoperimetric constraints and parameterizing the constraint function, nonlinear recovery equations for the unknown isoperimetric constraints are established from the given trajectory data. Sufficient conditions guaranteeing that the recovery equations have a unique solution are also given. When the recovery equations have a unique solution, it is given in closed form; otherwise, the constraints are inferred by solving a constrained optimization problem. Furthermore, the method is extended to settings with an unconstrained terminal state and with multiple demonstration trajectories.
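To make the primal-dual idea in item 3 concrete, the sketch below treats a much simpler case than the thesis: a discrete-time, time-invariant linear system x_{k+1} = A x_k + B u_k over a finite horizon N, a quadratic cost J_0, and a single quadratic isoperimetric constraint J_1 = Σ u_kᵀ S u_k ≤ c. For a fixed multiplier λ ≥ 0, the primal step is an unconstrained LQ problem with input weight R + λS, solved by a backward Riccati recursion. The dual step here is bisection on the scalar λ, which works because J_1 evaluated at the primal minimizer is non-increasing in λ. The function names lq_with_weight and primal_dual_bisection and all numerical data are assumptions; the thesis's continuous-time nonlinear method and its dual update may differ.

```python
import numpy as np


def lq_with_weight(A, B, Q, R, Qf, S, lam, x0, N):
    """Primal step: minimize J0 + lam*J1, i.e. an unconstrained finite-horizon LQ
    problem with input weight R + lam*S, via backward Riccati recursion; then roll
    out the time-varying feedback u_k = -K_k x_k from x0 and return (J0, J1)."""
    R_lam = R + lam * S
    P = Qf
    gains = []
    for _ in range(N):  # stages N-1, ..., 0
        K = np.linalg.solve(R_lam + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # gains[k] is now the feedback gain of stage k
    x, J0, J1 = x0.astype(float), 0.0, 0.0
    for K in gains:
        u = -K @ x
        J0 += float(x @ Q @ x + u @ R @ u)
        J1 += float(u @ S @ u)
        x = A @ x + B @ u
    J0 += float(x @ Qf @ x)  # terminal cost
    return J0, J1


def primal_dual_bisection(A, B, Q, R, Qf, S, c, x0, N, tol=1e-10):
    """Dual step by bisection on lam >= 0 so that J1(lam) <= c holds with near equality."""
    J0, J1 = lq_with_weight(A, B, Q, R, Qf, S, 0.0, x0, N)
    if J1 <= c:  # constraint inactive: the multiplier is zero
        return 0.0, J0, J1
    lo, hi = 0.0, 1.0
    while lq_with_weight(A, B, Q, R, Qf, S, hi, x0, N)[1] > c:
        hi *= 2.0  # grow the bracket until the constraint is satisfied
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if lq_with_weight(A, B, Q, R, Qf, S, mid, x0, N)[1] > c:
            lo = mid
        else:
            hi = mid
    J0, J1 = lq_with_weight(A, B, Q, R, Qf, S, hi, x0, N)
    return hi, J0, J1  # hi keeps the returned solution feasible (J1 <= c)


# Hypothetical example: double integrator, budget = half the unconstrained input energy.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf, S = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2), np.eye(1)
x0, N = np.array([1.0, 0.0]), 20
J0_free, J1_free = lq_with_weight(A, B, Q, R, Qf, S, 0.0, x0, N)
lam, J0, J1 = primal_dual_bisection(A, B, Q, R, Qf, S, 0.5 * J1_free, x0, N)
print(f"lambda={lam:.4f}  J0={J0:.4f} (unconstrained {J0_free:.4f})  J1={J1:.4f}")
```

Bisection is used here only because a single multiplier needs no step-size tuning; with several constraints, a projected (sub)gradient update λ_i ← max(0, λ_i + α(J_i − c_i)) is the usual choice.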
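Keywords	Optimal control; Inverse optimal control; Isoperimetric constraints; Adaptive dynamic programming; Intelligent control
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/56621
Collection	Graduates / Doctoral dissertations
Corresponding author	李涛 (Li Tao)
Recommended citation (GB/T 7714)	李涛. 等周约束最优控制问题及其逆问题研究[D], 2024. (Li Tao. Research on Optimal Control Problems with Isoperimetric Constraints and Their Inverse Problems [D], 2024.)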

Files in this item
File name (size)	Document type	Version type	Access	License
学位论文终稿_等周约束最优控制问题及其逆 (2814 KB)	Thesis		Restricted access	CC BY-NC-SA