Robot Manipulation Skill Learning: From Imitation to Autonomy


	Robot Manipulation Skill Learning: From Imitation to Autonomy
	Liu Naijun (刘乃军)
	2020-08
Pages	132
Degree type	Doctoral
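Abstract (translated from Chinese)	With the rapid development of artificial intelligence research and breakthroughs in key technologies, machine learning methods can be used to design robot manipulation skill learning systems that have a degree of autonomous decision-making and learning ability. Such systems let robots learn and acquire manipulation skills in complex, dynamic environments, and can compensate for the inability of conventional approaches such as traditional programming to adapt dynamically to unstructured working environments. This thesis studies imitation learning of robot manipulation skills, transfer learning between virtual and real environments, generative adversarial autonomous learning, and curriculum learning for multi-step complex tasks. The main contents are as follows. First, to meet the need for demonstration data collection in robot manipulation skill learning, a robot demonstration system based on virtual reality is developed. Through the virtual reality device, the operator on the master side can observe the full working scene of the slave-side robot in real time. Based on spatial coordinate pose transformation, pose changes of the master-side hand controller are mapped to pose changes of the slave-side robot's end-effector, achieving accurate six-dimensional pose tracking control of the robot. The effectiveness of the demonstration system is verified on a variety of demonstration tasks. Second, to address the low efficiency of policy exploration and optimization in robot manipulation skill learning, a deep imitation learning method that combines semantic task segmentation with policy learning is proposed. A long short-term memory neural network is built to semantically segment robot manipulation tasks. A policy learning method that fuses imitation learning with reinforcement learning exploration raises the learning efficiency of policy exploration and optimization while improving policy generalization. The proposed deep imitation learning method is validated on robot manipulation tasks. Third, to address the training difficulties of robot manipulation skill learning in real environments, a "reality-virtual-reality" transfer method for robot manipulation skill learning is proposed. A task-relevant simulation environment is constructed online from semantic segmentation of real-scene images, environment depth information, and coordinate transformation, yielding an equivalent virtualization of the real scene, and autonomous learning of the policy is completed in this simulation environment. In addition, by constructing a common feature domain space between heterogeneous robots in the virtual and real environments, manipulation skills are transferred directly between the two environments. Fourth, to address the difficulty of obtaining demonstration data in robot manipulation skill learning, a generative adversarial autonomous learning method that requires no demonstration data is proposed. Based on a Hindsight transformation mechanism, failed data generated by the policy is converted into expert-like data, and the discriminator learns by judging failed data against expert-like data. The policy then learns autonomously, without demonstration data, from the reward information output by the discriminator. The effectiveness of the method is verified on different kinds of robot skill tasks. Fifth, to address the difficulty of learning multi-step complex robot manipulation tasks, a "high-low" level curriculum learning method that combines explicit and implicit curriculum learning is proposed. The "high-level" explicit curriculum learning generates auxiliary tasks of increasing difficulty and adjusts their learning difficulty. The "low-level" implicit curriculum learning rapidly and autonomously learns a task of a given difficulty. Combining the two handles the difficulty of learning complex manipulation skills, which arises from high-dimensional exploration spaces and sparse rewards. The effectiveness of the method is verified on multi-step robot manipulation tasks.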
Abstract (English)	With the rapid development of artificial intelligence (AI) and breakthroughs in related key technologies, using machine learning methods to design a learning system with autonomous decision-making ability will enable robots to learn and acquire manipulation skills in complex and dynamic environments. Robot learning methods can make up for the shortcomings of traditional programming methods and greatly improve a robot's ability to adapt to unstructured environments. This thesis focuses on imitation learning, transfer learning between simulated and real-world environments, generative adversarial autonomous learning, and multi-step complex task learning. The main contents are as follows. First, in view of the need for demonstration data in robot manipulation skill learning, a robot demonstration system based on virtual reality (VR) is designed. Through the VR device, the human operator on the master side can observe the working scene of the robot in real time. Based on spatial coordinate transformation, changes in the position and attitude of the VR handle are mapped to the robot end-effector, which achieves accurate 6D pose control of the robot. The validity of the demonstration system is verified on a variety of manipulation tasks. Second, to improve the efficiency of policy exploration in robot manipulation skill learning, a deep imitation learning method based on semantic task segmentation and policy learning is presented. The manipulation task is semantically segmented with a Long Short-Term Memory (LSTM) neural network. An effective policy is trained by combining imitation learning with reinforcement learning, which also improves the convergence speed of policy training. The proposed deep imitation learning method is validated on robot manipulation tasks.
Third, to address the difficulties of real-world robot manipulation skill learning, a "reality-virtual-reality" transfer learning method is proposed. Task-related simulated environments are built online through semantic segmentation, depth information, and coordinate transformation, and the manipulation skill is learned in the resulting equivalent simulation environment. The skill is then transferred directly from the simulated environment to the real world by constructing a common domain space between the different robots. Fourth, to remove the need for demonstration data in robot manipulation skill learning, an autonomous learning method that requires no demonstrations is presented. Based on a Hindsight transformation mechanism, the failed data generated by the policy is transformed into expert-like data, and the discriminator is trained by classifying failed data and expert-like data. The policy is trained autonomously based on the reward given by the discriminator. The proposed method is validated on different kinds of robot tasks. Finally, to learn manipulation skills on multi-step complex tasks, we propose a curriculum learning method that arranges explicit and implicit curriculum learning in a high-low level structure. Explicit curriculum learning operates at the high level and generates auxiliary training tasks in order of increasing difficulty. Implicit curriculum learning operates at the low level and uses a hindsight idea to learn skills for a task of a given difficulty. Experiments show that combining explicit and implicit curriculum learning enables skill learning on complex multi-step robot manipulation tasks.
Keywords	robot manipulation skill learning; imitation learning; generative adversarial autonomous learning; curriculum learning
Language	Chinese
Seven major directions: sub-direction classification	Intelligent robots
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/40450
Collection	Graduates_Doctoral dissertations
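Recommended citation (GB/T 7714)	Liu Naijun. Robot Manipulation Skill Learning: From Imitation to Autonomy [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.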

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis(2).pdf (18473KB)	Thesis		Restricted access	CC BY-NC-SA