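Research on Value Function Estimation in Deep Reinforcement Learning for Continuous Control Tasks Thesis
Master of Engineering, Institute of Automation, Chinese Academy of Sciences: University of Chinese Academy of Sciences, 2022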
Authors:  He Q(何强)
Adobe PDF(4687Kb)  |  Submit date:2022/06/17
deep reinforcement learning  value function estimation  value function representation  ensemble reinforcement learning
POPO: Pessimistic Offline Policy Optimization Conference paper
, Singapore, Singapore, 23-27 May 2022
Authors:  He Q(何强);  Hou XW(侯新文);  Liu Y(刘禹)
Adobe PDF(1200Kb)  |  Submit date:2022/06/27
reinforcement learning  offline optimization  out-of-distribution  
Wide-Sense Stationary Policy Optimization with Bellman Residual on Video Games Conference paper
, Shenzhen, China, 05-09 July 2021
Authors:  Gong C(龚晨);  He Q(何强);  Bai YP(白云鹏);  Hou XW(侯新文);  Fan GL(范国梁);  Liu Y(刘禹)
Adobe PDF(2780Kb)  |  Submit date:2022/06/27
Video Game  Reinforcement Learning  Quantile Regression  Bellman residual  Wasserstein Distance  
WD3: Taming the estimation bias in deep reinforcement learning Conference paper
, Baltimore, MD, USA, 2020-12
Authors:  He Q(何强);  Hou XW(侯新文)
Adobe PDF(2006Kb)  |  Submit date:2022/06/27
deep reinforcement learning  estimation bias  neural networks