Knowledge Commons of Institute of Automation, CAS
Research on Intelligent Strategies in Adversarial Game Environments (博弈对抗环境中智能策略研究) | |
Tang Zhentao (唐振韬) | |
2021-05-17 | |
Pages | 160 |
Degree type | Doctoral |
Chinese abstract | 策略博弈是反映人工智能“智能化”水平的重要体现,一直受到研究人员的广泛关注。博弈过程需要对当前状态进行态势评估,依据态势评估信息推演出的可能性收益来做决策。作为当下主流的两类通用人工智能决策规划算法:深度强化学习和统计前向规划算法,已经在游戏人工智能领域中取得了令人瞩目的研究成果。深度强化学习方法融合了深度学习的感知能力和强化学习的决策能力,以最大化环境奖赏信号作为优化目标,实现端到端方式的决策模型更新。统计前向规划算法则是融合人类启发式先验知识构建前向模型,基于前向模型在推理环境中自适应探索并规划出高价值的动作序列作为博弈决策。为有效利用二者优势,基于深度强化学习与统计前向规划方法,研究博弈对抗环境中智能策略方法和实现技术,以进一步提高博弈策略模型表现,对于提升机器博弈性能,推动智能决策技术在专业领域的应用,具有重要的理论意义和应用价值。 |
English abstract | Strategy games are an important reflection of how intelligent artificial intelligence has become, and they have long attracted wide attention from researchers. During play, an agent must assess the current state and make decisions according to the potential payoffs inferred from that assessment. Deep reinforcement learning and statistical forward planning, currently the two mainstream families of general artificial intelligence decision-making and planning algorithms, have achieved remarkable results in game artificial intelligence. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning; it takes maximization of the environment's reward signal as its optimization objective and updates the decision model end to end. Statistical forward planning incorporates human heuristic prior knowledge to build a forward model, then uses that model to adaptively explore a simulated environment and plan high-value action sequences as game decisions. To exploit the strengths of both, this work builds on deep reinforcement learning and statistical forward planning to study intelligent strategy methods and implementation techniques for adversarial games. Further improving the performance of game strategy models in this way has theoretical significance and practical value for improving machine game playing and for promoting the application of intelligent decision-making technology in professional fields. This thesis takes intelligent strategy methods in adversarial games as its research objective. Following a progression from complete-information turn-based games to imperfect-information real-time games, it focuses on deep reinforcement learning and statistical forward planning methods. First, for the complete-information turn-based game, a self-play model is studied on the Gomoku platform. Then, for the real-time strategy game, the end-to-end macro
1. A strategy game model based on reinforcement learning and Monte Carlo tree search.
2. An end-to-end neural network model for real-time strategy games based on a hybrid of policy and value.
3. An enhanced rolling horizon evolution algorithm with an adaptive opponent model.
4. A multi-robot strategy game model based on replay memory of opponent strategies and a curiosity-driven mechanism. |
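Keywords | deep reinforcement learning; statistical forward planning; strategy games; intelligent decision-making; game artificial intelligence |
Language | Chinese |
Seven major research directions: sub-direction classification | Machine game playing |
Document type | Thesis |
Item identifier | http://ir.ia.ac.cn/handle/173211/45058 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems_Deep Reinforcement Learning |
Recommended citation (GB/T 7714) | Tang Zhentao (唐振韬). 博弈对抗环境中智能策略研究 [Research on Intelligent Strategies in Adversarial Game Environments][D]. Beijing: Institute of Automation, Chinese Academy of Sciences, 2021. |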
Files in this item |
File name/size | Document type | Version type | Open access type | License |
博弈对抗环境中智能策略研究-唐振韬博士学(23513KB) | Thesis | | Open access | CC BY-NC-SA |