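| Research on Value Function Estimation in Deep Reinforcement Learning for Continuous Control Tasks. Thesis, Master of Engineering, Institute of Automation, Chinese Academy of Sciences: University of Chinese Academy of Sciences, 2022. Authors: He Q(何强)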
Adobe PDF(4687Kb)  |  View/Download:149/5  |  Submit date:2022/06/17  |  Keywords: deep reinforcement learning; value function estimation; value function representation; ensemble reinforcement learning |
| POPO: Pessimistic Offline Policy Optimization. Conference paper, Singapore, Singapore, 23-27 May 2022. Authors: He Q(何强) ; Hou XW(侯新文) ; Liu Y(刘禹)
Adobe PDF(1200Kb)  |  View/Download:103/16  |  Submit date:2022/06/27  |  Keywords: reinforcement learning; offline optimization; out-of-distribution |
| Wide-Sense Stationary Policy Optimization with Bellman Residual on Video Games. Conference paper, Shenzhen, China, 05-09 July 2021. Authors: Gong C(龚晨) ; He Q(何强) ; Bai YP(白云鹏) ; Hou XW(侯新文) ; Fan GL(范国梁) ; Liu Y(刘禹)
Adobe PDF(2780Kb)  |  View/Download:127/14  |  Submit date:2022/06/27  |  Keywords: Video Game; Reinforcement Learning; Quantile Regression; Bellman residual; Wasserstein Distance |
| Wd3: Taming the estimation bias in deep reinforcement learning. Conference paper, Baltimore, MD, USA, 2020-12. Authors: He Q(何强) ; Hou XW(侯新文)
Adobe PDF(2006Kb)  |  View/Download:94/11  |  Submit date:2022/06/27  |  Keywords: deep reinforcement learning; estimation bias; neural networks |