Knowledge Commons of Institute of Automation, CAS
基于混合更新Q值的深度强化学习方法研究 (Research on a Deep Reinforcement Learning Method Based on Mixing Updates of Q-Values)
李主南
2020-05-21
Pages | 88
Degree type | Master's
Abstract (Chinese) | In recent years, with the explosive growth of computing power and data, a wave of research and application has swept artificial intelligence and related fields, and deep reinforcement learning has accordingly become a research hotspot. In deep reinforcement learning, both value-based methods and policy-gradient methods involve the problem of estimating and updating Q-values. At present, the vast majority of methods update the target in the Q-learning manner, but this approach produces an overestimation problem. It is therefore necessary to propose a new Q-value update method to extend existing methods.
Abstract (English) | In recent years, the explosive growth of computing power and data has set off a wave of research and application in artificial intelligence, and deep reinforcement learning has become a research hotspot as a result. In deep reinforcement learning, both value-based methods and policy-gradient methods involve estimating and updating Q-values. At present, most methods update the target in the Q-learning manner, which introduces an overestimation problem, so a new way of updating the Q-value is needed to extend existing methods. Overestimation bias is a well-known property of the Q-learning algorithm and can lead to suboptimal policies; deep reinforcement learning methods that update the Q-value in the Q-learning manner, including Actor-Critic algorithms, generally inherit this problem. This thesis focuses on the overestimation problem in reinforcement learning, and its goal is to propose a method that mitigates overestimation while effectively limiting the negative effects of underestimation. We first analyze overestimation and find that its main cause is the noise introduced by the function approximator. Second, since existing remedies for overestimation also introduce an underestimation bias, this thesis draws on the concept of the convex combination from convex geometry and proposes a mixing update method; both theoretical analysis and experiments verify that the method reduces variance and effectively improves algorithm performance. Finally, we combine the method with three well-known deep reinforcement learning algorithms, DQN, DDPG, and TD3, propose the corresponding improved algorithms, and conduct experiments on the OpenAI Gym platform. The experimental results show that the improved algorithms outperform the originals in most cases, which again validates the effectiveness of the proposed method. The contributions of this thesis are twofold. First, a mixing update method inspired by convex geometry is proposed, and its effectiveness is verified in theory and in experiments. Second, the method is combined with three typical deep reinforcement learning algorithms to yield three corresponding improved algorithms; evaluation on the suite of OpenAI Gym tasks shows that, in most cases, the method is an effective way to alleviate the overestimation problem.
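The abstract names the mixing update only at a high level, as a convex combination that trades overestimation against underestimation. The sketch below illustrates one plausible reading in a DQN-style setting: the TD target blends the standard Q-learning target (which tends to overestimate) with a double-Q-style target (which tends to underestimate). The coefficient beta and all function and variable names are illustrative assumptions, not the thesis's actual notation or algorithm.

```python
import numpy as np

def mixing_update_target(reward, q_online_next, q_target_next,
                         gamma=0.99, beta=0.5):
    """Hypothetical blended TD target for one transition.

    q_online_next, q_target_next: arrays of Q(s', a) over all actions a,
    from the online and target networks respectively. beta in [0, 1] is
    the convex-combination coefficient (an assumed parameterization).
    """
    # Q-learning target: max over the target network (tends to overestimate).
    overestimate = np.max(q_target_next)
    # Double-Q-style target: pick the action with the online network,
    # evaluate it with the target network (tends to underestimate).
    a_star = int(np.argmax(q_online_next))
    underestimate = q_target_next[a_star]
    # Convex combination of the two estimates trades the biases off.
    blended = beta * overestimate + (1.0 - beta) * underestimate
    return reward + gamma * blended

# Toy usage: one transition with three actions.
q_online_next = np.array([1.0, 2.5, 2.0])
q_target_next = np.array([1.2, 2.0, 2.6])
print(mixing_update_target(reward=0.1,
                           q_online_next=q_online_next,
                           q_target_next=q_target_next,
                           beta=0.5))
```

Under this reading, beta = 1 recovers the plain Q-learning target and beta = 0 the double-Q-style target; the convex combination lets the estimator sit anywhere between the two biases.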
Keywords | Deep reinforcement learning; Q-learning; Overestimation; Underestimation; Actor-Critic; Convex combination; Mixing update
Subject area | Artificial intelligence
Discipline category | Engineering
Language | Chinese
Sub-direction classification (seven major directions) | Reinforcement and evolutionary learning
Document type | Thesis
Item identifier | http://ir.ia.ac.cn/handle/173211/39162
Collection | 复杂系统认知与决策实验室_智能系统与工程
Recommended citation (GB/T 7714) | 李主南. 基于混合更新Q值的深度强化学习方法研究[D]. 中国科学院自动化研究所, 中国科学院大学, 2020.
Files in this item
File name/size | Document type | Version type | Open type | License
李主南毕业论文.pdf (3839KB) | Thesis | | Open Access | CC BY-NC-SA