Knowledge Commons of Institute of Automation, CAS
Offline reinforcement learning with representations for actions | |
Lou, Xingzhou1,2; Yin, Qiyue1; Zhang, Junge1; Yu, Chao3; He, Zhaofeng4; Cheng, Nengjie5; Huang, Kaiqi1 | |
Journal | INFORMATION SCIENCES |
ISSN | 0020-0255 |
2022-09-01 | |
Volume | 610; Pages: 746-758 |
Corresponding author | Zhang, Junge |
Abstract | Prevailing offline reinforcement learning (RL) methods limit the policy within the area supported by the offline dataset to avoid the distributional shift problem. But potential high-reward actions, which are out of the distribution of the dataset, are neglected in these methods. To address this issue, we propose a new method, which generalizes from the offline dataset to out-of-distribution (OOD) actions. Specifically, we design a novel action embedding model to help infer the effect of actions. As a result, our value function reaches a better generalization over the action space, and further alleviates the distributional shift caused by overestimation of OOD actions. Theoretically, we give an information-theoretic explanation of the improvement of the value function's generalization over the action space. Experiments on D4RL demonstrate that our model improves the performance compared to previous offline RL methods, especially when the experience in the offline dataset is good. We conduct further study and validate that the value function's generalization on OOD actions is improved, which reinforces the effectiveness of our proposed action embedding model. (c) 2022 Published by Elsevier Inc. |
Keywords | Offline reinforcement learning; Action embedding |
DOI | 10.1016/j.ins.2022.08.019 |
Indexed by | SCI |
Language | English |
Funding project | National Natural Science Foundation of China[61876181] ; Beijing Nova Program of Science and Technology[Z191100001119043] ; Youth Innovation Promotion Association, CAS |
Funding organization | National Natural Science Foundation of China ; Beijing Nova Program of Science and Technology ; Youth Innovation Promotion Association, CAS |
WOS research area | Computer Science |
WOS subject category | Computer Science, Information Systems |
WOS accession number | WOS:000860782400007 |
Publisher | ELSEVIER SCIENCE INC |
Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/50377 |
Collection | Intelligent Systems and Engineering |
Corresponding author | Zhang, Junge |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Beijing, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; 3. Sun Yat Sen Univ, Guangzhou, Peoples R China; 4. Beijing Univ Posts & Telecommun, Beijing, Peoples R China; 5. Nanchang Univ, Nanchang, Peoples R China |
First author affiliation | Institute of Automation, Chinese Academy of Sciences |
Corresponding author affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended citation: GB/T 7714 | Lou, Xingzhou,Yin, Qiyue,Zhang, Junge,et al. Offline reinforcement learning with representations for actions[J]. INFORMATION SCIENCES,2022,610:746-758. |
APA | Lou, Xingzhou.,Yin, Qiyue.,Zhang, Junge.,Yu, Chao.,He, Zhaofeng.,...&Huang, Kaiqi.(2022).Offline reinforcement learning with representations for actions.INFORMATION SCIENCES,610,746-758. |
MLA | Lou, Xingzhou,et al."Offline reinforcement learning with representations for actions".INFORMATION SCIENCES 610(2022):746-758. |
Files in this item | There are no files associated with this item. |