By combining optimal control and adaptive control from modern control theory, artificial neural networks from computational intelligence, and ideas from reinforcement learning, adaptive dynamic programming (ADP) can overcome the "curse of dimensionality" in traditional dynamic programming. As an intelligent control method with learning and optimization capabilities, ADP has great potential for solving control problems of continuous-time complex nonlinear systems. A large number of complex systems exist in daily life and industry. These real physical systems usually have unknown dynamics, strong nonlinearities, and uncertainties, which makes it difficult to establish accurate mathematical models. Traditional control theory, however, generally relies on accurate models, which greatly restricts its application. Therefore, the study of continuous-time ADP theory and its applications in the control of complex systems is of significant value. The main contributions of this thesis include the following three parts.
1. To solve the finite-horizon optimal output tracking control problem, an augmented system is constructed whose state consists of the system state and the reference trajectory. The theoretical results show the equivalence between the finite-horizon optimal regulation problem of the augmented system and the original tracking problem. An online learning ADP algorithm based on policy iteration is developed to obtain the optimal control policy in real time with partially unknown system dynamics. A performance analysis of the algorithm is given, together with an implementation method using linearly parameterized structures and a simulation example.
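The augmentation idea can be sketched on a linear time-invariant example (the thesis treats the finite-horizon nonlinear case; the matrices below and the infinite-horizon Kleinman-style policy iteration are illustrative assumptions only):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

# Hypothetical plant x' = A x + B u and reference generator r' = F r.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # stable plant
B = np.array([[0.0], [1.0]])
F = np.array([[-0.1, 1.0], [-1.0, -0.1]])  # decaying oscillator reference

# Augmented state z = [x; r]; tracking error e = x - r = C z.
Aa = np.block([[A, np.zeros((2, 2))], [np.zeros((2, 2)), F]])
Ba = np.vstack([B, np.zeros((2, 1))])
C = np.hstack([np.eye(2), -np.eye(2)])
Q, R = C.T @ C, np.array([[1.0]])          # penalize the tracking error only

# Policy iteration on the augmented regulator: evaluation solves a
# Lyapunov equation for the value matrix P, improvement updates the gain.
K = np.zeros((1, 4))                       # K = 0 stabilizes (A, F stable)
for _ in range(15):
    Ac = Aa - Ba @ K
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    K = np.linalg.solve(R, Ba.T @ P)

# The iterates converge to the stabilizing ARE solution of the augmented
# regulation problem, so tracking reduces to regulation of z.
P_are = solve_continuous_are(Aa, Ba, Q, R)
```

The tracking controller is then u = -K z = -K [x; r], i.e. a feedback on the plant state plus a feedforward on the reference.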
2. A data-based online learning ADP algorithm is developed for the optimal control of weakly coupled nonlinear systems with completely unknown dynamics. According to the principle of optimality, the original system is reformulated into three decoupled, reduced-order subsystems. The approximate optimality of the control policy derived from the optimal control laws of these subsystems is analyzed. For each subsystem, a critic neural network and an action neural network are used to approximate its value function and control policy, respectively, and the weights of the two networks are updated synchronously. The least-squares method is used to implement the algorithm, and simulation examples are provided.
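The least-squares value evaluation at the heart of the critic can be sketched on a linear example (only the critic step is shown; the closed-loop matrix, quadratic basis, and data horizon are illustrative assumptions, while the thesis' full algorithm pairs a critic with an action network for each nonlinear subsystem):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # hypothetical stable closed loop
Q = np.eye(2)                              # running cost x' Q x

# Quadratic critic basis: V(x) ~ w' phi(x).
phi = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

# Integral Bellman equation over [t, t+T]:
#   w' (phi(x(t)) - phi(x(t+T))) = integral of x' Q x over [t, t+T],
# assembled into a least-squares problem from measured trajectory data.
T, sub = 0.05, 50
Ad = expm(A * T / sub)                     # exact substep propagator
rows, rhs = [], []
x = np.array([1.0, 0.0])
for _ in range(40):                        # 40 data intervals, one trajectory
    x0, cost = x.copy(), 0.0
    for _ in range(sub):                   # trapezoidal cost integral
        xn = Ad @ x
        cost += 0.5 * (x @ Q @ x + xn @ Q @ xn) * (T / sub)
        x = xn
    rows.append(phi(x0) - phi(x))
    rhs.append(cost)
w, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)

# Recover the value matrix and compare with the Lyapunov-equation solution.
P_hat = np.array([[w[0], w[1] / 2], [w[1] / 2, w[2]]])
P_true = solve_continuous_lyapunov(A.T, -Q)
```

Note that only measured states and measured costs enter the regression; no model of A is used by the critic itself.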
3. A model-free integral policy iteration ADP algorithm is developed to solve the robust control problem of affine nonlinear systems and the decentralized control problem of nonlinear interconnected systems. The proposed method does not require identification of the unknown dynamics; it only makes use of online measured data, and it updates the value function and the control policy simultaneously. For the robust control problem, it is theoretically analyzed and proved that the control policy obtained by adding a feedback gain to the optimal controller of the nominal system is robust. For the decentralized control problem, it is theoretically analyzed and proved that the control policy obtained by adding local feedback gains to the optimal control laws of the isolated subsystems is stabilizing. Finally, the effectiveness of the proposed method is demonstrated on the control of a multimachine power system.
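For a linear special case, the model-free, simultaneous value/policy update can be sketched as an off-policy integral policy iteration (the system matrices, exploration signal, and step sizes are illustrative assumptions; A and B appear only to generate the measured data and are never used by the learner):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Data-generating system (unknown to the learner): x' = A x + B u.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 1.0

phi = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

# Collect data once under an exploratory input; store, per interval,
# dphi = phi(x_end) - phi(x_start), Ixx = int x x' dt, Iux = int u x' dt.
dt, sub, N = 1e-4, 1000, 20
x, t, data = np.array([1.0, 0.5]), 0.0, []
for _ in range(N):
    x0, Ixx, Iux = x.copy(), np.zeros((2, 2)), np.zeros(2)
    for _ in range(sub):
        u = np.sin(7 * t) + 0.6 * np.sin(3 * t) + 0.4 * np.cos(11 * t)
        Ixx += np.outer(x, x) * dt
        Iux += u * x * dt
        x = x + (A @ x + B.ravel() * u) * dt   # Euler step of the "plant"
        t += dt
    data.append((phi(x) - phi(x0), Ixx, Iux))

# Integral PI: each iteration solves ONE least-squares problem for the
# value weights w AND the improved gain k simultaneously, from
#   dphi . w - 2 (Iux + K Ixx) k' = -(tr(Q Ixx) + K Ixx K'),
# which follows from differentiating x' P x along the measured data.
K = np.zeros((1, 2))                           # K = 0 stabilizes (A stable)
for _ in range(8):
    G = [np.concatenate([d, -2.0 * (iu + (K @ ix).ravel())])
         for d, ix, iu in data]
    b = [-(np.trace(Q @ ix) + (K @ ix @ K.T).item()) for _, ix, _ in data]
    th, *_ = np.linalg.lstsq(np.array(G), np.array(b), rcond=None)
    P_hat = np.array([[th[0], th[1] / 2], [th[1] / 2, th[2]]])
    K = th[3:].reshape(1, 2)
```

Because the Bellman relation holds along any sufficiently rich measured trajectory, the same recorded data batch is reused at every iteration; the robust and decentralized designs in this part then add the extra feedback gains on top of the learned optimal gain.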