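Research on Self-Learning Parallel Control Methods for Two-Player Zero-Sum Dynamic Games (二人零和动态博弈的自学习平行控制方法研究)
Zhu Zhenhua (朱振华)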
2023-11-30
Pages: 85
Degree type: Master's
Chinese abstract

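With the development of fields such as game AI, intelligent air combat, and autonomous driving, two-player zero-sum dynamic game problems have attracted wide attention. Classical approaches, represented by model predictive control and adaptive dynamic programming, usually assume that the system's state equations are known or can be approximated; when the system is complex and hard to model precisely, these methods alone are insufficient. Parallel control, built on artificial systems + computational experiments + parallel execution, is an effective approach to modeling, analyzing, and controlling complex systems. This thesis studies a self-learning parallel control method based on adaptive dynamic programming for solving and analyzing two-player zero-sum dynamic game problems. The main contributions are as follows: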

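(1) For two-player zero-sum dynamic game problems in which the real system is a time-varying nonlinear system without an accurate mathematical model, this work studies a self-learning parallel control method that uses the real system and time-division multiple artificial systems as the parallel systems. It constructs the time-division multiple artificial systems; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments on a single artificial system, as well as the convergence of the value functions across the multiple artificial systems; and proposes a criterion for judging whether a control law obtained on the artificial systems is effective for the real system, under which the convergence of the real system's performance index function is analyzed.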

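(2) For scenarios in which obtaining state data from the real system is difficult and costly, and in which a simplified mathematical model of the real system exists but is not fully accurate, this work builds on the work above and studies a self-learning parallel control method that uses the real system, the simplified mathematical model, and time-division multiple artificial systems as the parallel systems. It proposes a method for constructing artificial systems from the simplified mathematical model; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments on the simplified mathematical model; analyzes the convergence of the iterative value functions in the computational experiments on the artificial systems and the convergence of the value functions across the multiple artificial systems; and proposes a criterion for selecting, between the simplified mathematical model and the artificial systems, the system that executes in parallel with the real system, under which the convergence of the real system's performance index function is analyzed.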

 

English abstract

With the development of game intelligence, intelligent air combat, and autonomous driving, research on two-player zero-sum dynamic game problems has received widespread attention. Classical works, represented by model predictive control and adaptive dynamic programming, often assume that the system's state equations are known or can be approximated. When a system is complex and difficult to model accurately, these methods alone cannot handle it. Parallel control, based on artificial systems + computational experiments + parallel execution, is an effective method for modeling, analyzing, and controlling complex systems. This thesis studies a self-learning parallel control method based on adaptive dynamic programming to solve and analyze two-player zero-sum dynamic game problems. The main contributions of this thesis are as follows:

(1) For two-player zero-sum dynamic game problems in which the real system is time-varying, nonlinear, and has no accurate mathematical model, this work proposes a self-learning parallel control method that takes the real system and multiple artificial systems as the parallel systems. It constructs the multiple artificial systems; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments on a single artificial system, as well as the convergence of the value functions across the multiple artificial systems; and proposes a criterion for determining whether the control laws obtained from the artificial systems are effective for the real system, under which it analyzes the convergence of the real system's performance index function.

(2) Because obtaining state data from real systems is difficult and costly, and the simplified mathematical models corresponding to real systems are not fully accurate, this work builds on the work above and proposes a self-learning parallel control method that uses the real system, the simplified mathematical model, and multiple artificial systems as the parallel systems. It proposes a method for constructing artificial systems from the simplified mathematical model; analyzes the convergence of the iterative value functions and iterative control laws in the computational experiments on the simplified mathematical model and on a single artificial system, as well as the convergence of the value functions across the multiple artificial systems; and proposes criteria for selecting, between the simplified mathematical model and the artificial systems, the system that executes in parallel with the real system, under which it analyzes the convergence of the real system's performance index function.
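
For readers unfamiliar with the "iterative value function" terminology used in both abstracts, the following is a minimal sketch and not the thesis's method: it assumes a known scalar linear-quadratic model, whereas the thesis targets systems without an accurate model. The dynamics x[k+1] = a*x[k] + b*u[k] + d*w[k], the stage cost q*x^2 + r*u^2 - gamma^2*w^2, the parameter values, and the function names are all hypothetical choices made for illustration. With V_i(x) = p_i*x^2, the min-max Bellman backup reduces to a scalar recursion whose iterates increase monotonically from p_0 = 0 to a fixed point.

```python
"""Value iteration for a scalar linear-quadratic two-player zero-sum game.

Hypothetical textbook setting (not taken from the thesis):
    x[k+1] = a*x[k] + b*u[k] + d*w[k]
    cost   = sum_k ( q*x^2 + r*u^2 - gamma^2 * w^2 )
Player u minimizes the cost, player w maximizes it, and V_i(x) = p_i * x^2.
"""


def value_iteration(a, b, d, q, r, gamma, tol=1e-12, max_iter=10_000):
    """Iterate p_{i+1} = q + a^2 p_i / (1 + c p_i), starting from p_0 = 0."""
    g = gamma ** 2
    c = b * b / r - d * d / g  # combined influence of the two players
    p, history = 0.0, [0.0]
    for _ in range(max_iter):
        # The inner maximization over w is concave only if g - p*d^2 > 0.
        if g - p * d * d <= 0:
            raise ValueError("gamma too small: no saddle point at this iterate")
        p_next = q + a * a * p / (1.0 + c * p)
        history.append(p_next)
        if abs(p_next - p) < tol:
            return p_next, history
        p = p_next
    raise RuntimeError("value iteration did not converge")


def saddle_point_gains(p, a, b, d, r, gamma):
    """Return (K, L) for the policies u = -K*x and w = L*x induced by V(x) = p*x^2."""
    g = gamma ** 2
    s = 1.0 + p * (b * b / r - d * d / g)
    return p * a * b / (r * s), p * a * d / (g * s)


if __name__ == "__main__":
    # Illustrative parameters only; the open-loop system (a = 1.1) is unstable.
    p_star, hist = value_iteration(a=1.1, b=1.0, d=0.5, q=1.0, r=1.0, gamma=2.0)
    K, L = saddle_point_gains(p_star, a=1.1, b=1.0, d=0.5, r=1.0, gamma=2.0)
    print(f"p* = {p_star:.6f} after {len(hist) - 1} iterations")
    print(f"K = {K:.4f}, L = {L:.4f}, closed loop = {1.1 - K + 0.5 * L:.4f}")
```

In this toy case the closed-loop coefficient a - b*K + d*L has magnitude below one at the fixed point, so the saddle-point policies stabilize the system. The convergence results described in the abstracts concern the harder setting where the model is replaced by artificial systems learned from data.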

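Keywords: adaptive dynamic programming; parallel control; zero-sum game
Language: Chinese
Sub-direction classification (seven major directions): Parallel management and control
State Key Laboratory planned direction classification: Other
Thesis-associated dataset requiring deposit: [not specified]
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/54186
Collection: Graduates_Master's theses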
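Institute of Automation, Chinese Academy of Sciences
Graduates
State Key Laboratory of Multimodal Artificial Intelligence Systems_Complex Systems Intelligence Mechanisms and Parallel Control Team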
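Recommended citation
GB/T 7714
Zhu Zhenhua (朱振华). Research on Self-Learning Parallel Control Methods for Two-Player Zero-Sum Dynamic Games (二人零和动态博弈的自学习平行控制方法研究) [D], 2023.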
Files in this item
File name/size: 朱振华-硕士学位论文.pdf (1737KB); Document type: Thesis; Access type: Restricted; License: CC BY-NC-SA