Research on Self-Learning Parallel Control Methods for Two-Player Zero-Sum Dynamic Games


	Research on Self-Learning Parallel Control Methods for Two-Player Zero-Sum Dynamic Games
	朱振华 (Zhu Zhenhua)
	2023-11-30
Pages	85
Degree type	Master's
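Chinese abstract (translated)	With the development of fields such as game AI, intelligent air combat, and autonomous driving, research on two-player zero-sum dynamic games has attracted wide attention. Classical approaches, represented by model predictive control and adaptive dynamic programming, typically assume that the system's state equation is known or can be approximated; when the system is complex and hard to model accurately, these methods alone are insufficient. Parallel control, built around artificial systems + computational experiments + parallel execution, is an effective approach to modeling, analyzing, and controlling complex systems. This thesis studies a self-learning parallel control method based on adaptive dynamic programming for solving and analyzing two-player zero-sum dynamic games. The main contributions are as follows: (1) For two-player zero-sum dynamic games in which the real system is a time-varying nonlinear system without an accurate mathematical model, this work studies a self-learning parallel control method that takes the real system and time-sharing multiple artificial systems as the parallel systems. It constructs the time-sharing multiple artificial systems; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments on a single artificial system, as well as the convergence of the value functions across the multiple artificial systems; and proposes a criterion for judging whether a control law obtained on an artificial system is effective for the real system, under which the convergence of the real system's performance index function is analyzed. (2) For scenarios in which obtaining state data from the real system is difficult and costly, and a simplified mathematical model of the real system exists but is not fully accurate, this work builds on the above and studies a self-learning parallel control method that takes the real system, the simplified mathematical model, and time-sharing multiple artificial systems as the parallel systems. It proposes a method for constructing artificial systems from the simplified mathematical model; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments on the simplified mathematical model; analyzes the convergence of the iterative value functions in the computational experiments on the artificial systems and of the value functions across the multiple artificial systems; and proposes a criterion for selecting, among the simplified mathematical model and the artificial systems, the system to execute in parallel with the real system, under which the convergence of the real system's performance index function is analyzed.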
English abstract	With the development of game intelligence, intelligent air combat, and autonomous driving, research on two-player zero-sum dynamic game problems has received widespread attention. Classical approaches, represented by model predictive control and adaptive dynamic programming, often assume that the system functions are known or can be modeled accurately. When a system is complex and difficult to model accurately, such problems are difficult to handle with these methods alone. Parallel control, based on artificial systems + computational experiments + parallel execution, is an effective method for modeling, analyzing, and controlling complex systems. This thesis studies a self-learning parallel control method based on adaptive dynamic programming to solve and analyze two-player zero-sum dynamic game problems. The main contributions are as follows: (1) For two-player zero-sum dynamic game problems in which the real system is a time-varying nonlinear system without an accurate mathematical model, a self-learning parallel control method is proposed that takes the real system and multiple artificial systems as parallel systems. This work constructs multiple artificial systems; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments for a single artificial system, and the convergence of the value functions for multiple artificial systems; proposes a criterion for determining whether the control laws obtained from the artificial systems are effective for the real system; and analyzes the convergence of the performance index function of the real system under that criterion. (2) In view of the difficulty and high cost of obtaining state data from real systems, and the imperfect accuracy of the simplified mathematical models corresponding to them, this work builds on the above and proposes a self-learning parallel control method that uses real systems, simplified mathematical models, and multiple artificial systems as parallel systems. This work proposes a method for constructing artificial systems based on simplified mathematical models; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments for the simplified mathematical models and for a single artificial system, and the convergence of the value functions for multiple artificial systems; proposes criteria for selecting, between the simplified mathematical models and the artificial systems, the system to execute in parallel with the real system; and analyzes the convergence of the performance index function of the real system under those criteria.
Keywords	adaptive dynamic programming; parallel control; zero-sum game
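Language	Chinese
Seven major directions: sub-direction classification	Parallel management and control
State key laboratory planned direction classification	Other
Associated dataset to be deposited	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/54186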
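Collections	Graduates_Master's Theses; Institute of Automation, Chinese Academy of Sciences, Graduates; State Key Laboratory of Multimodal Artificial Intelligence Systems_Complex Systems Intelligence Mechanism and Parallel Control Team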
Recommended citation (GB/T 7714)	朱振华. 二人零和动态博弈的自学习平行控制方法研究[D],2023.

Files in this item
File name/size	Document type	Version type	Access type	License
朱振华-硕士学位论文.pdf (1737KB)	Thesis		Restricted access	CC BY-NC-SA