Research on Robot Skill Learning Algorithms with Embedded Structural Priors (嵌入结构先验的机器人技能学习算法研究)

	Research on Robot Skill Learning Algorithms with Embedded Structural Priors (嵌入结构先验的机器人技能学习算法研究)
	张丰一
	2022-11-24
Pages	112
Degree type	Doctoral
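Chinese abstract (English translation)	Robot skill learning is an important research topic in robotics, with wide application in intelligent manufacturing, smart cities, autonomous driving, and other fields. How a robot can efficiently and autonomously learn and acquire locomotion and manipulation skills in complex, dynamic environments is one of the key problems that current robot skill learning technology urgently needs to solve. Solving it would greatly improve the efficiency, accuracy, and speed of autonomous motion planning and dexterous manipulation, with a positive and far-reaching impact on raising industrial production levels, improving people's quality of life, and upgrading the intelligence of national defense. In recent years, with the rapid development of robotics, artificial intelligence, and related disciplines, robot skill learning methods have achieved major performance breakthroughs in designing robot systems capable of autonomous decision-making and learning, but they still face common problems such as low learning efficiency, poor robustness, and an inability to adapt quickly to dynamic environments. Prior knowledge is one of the important factors in improving the efficiency of robot skill learning, and embedding the structural priors that are ubiquitous in robot systems into the skill learning process is an effective route to efficient robot skill learning. This dissertation learns robot tasks on the basis of the structural prior knowledge it reveals, and studies efficient robot skill learning from three aspects: single-task learning, multi-task learning, and skill transfer learning between robots with different structures. The main work and contributions are summarized as follows: 1. To address the low efficiency of single-task skill learning caused by the high degrees of freedom of complex robot systems, an efficient robot skill learning algorithm based on a structured policy model is proposed. The dissertation first reveals a kind of prior knowledge that is ubiquitous in the physical structure of robots and proposes a robot policy model that embeds this structural prior. Unlike traditional policy models, the proposed method uses the structural prior knowledge to decompose the robot's original high-dimensional joint state space into several conditionally independent low-dimensional feature spaces and searches for the optimal policy in those low-dimensional feature spaces, thereby reducing the dimension of the policy search space. In addition, the proposed method uses multiple linear models, each outputting the action of its corresponding joint, which further reduces the complexity of the policy model, improves sample efficiency, speeds up learning, and gives good robustness to random noise in the state space. Experimental results show that the proposed method effectively improves the efficiency, speed, and robustness of single-task robot skill learning. 2. To meet the need for efficient robot skill learning in multi-task scenarios, a robot multi-task learning algorithm based on structured unsupervised representation is proposed. The dissertation first proposes a structured unsupervised representation learning module that uses task-agnostic sample trajectories to learn a structured feature representation in an unsupervised manner. The module projects the robot's original high-dimensional state space onto a manifold space and generates a structured low-dimensional feature representation that can characterize the internal model of the robot system. A robot multi-task learning algorithm based on this structured unsupervised representation is then proposed: when the robot faces a new task, the representation module does not need to be retrained and can be used directly in the policy learning process for the new task, so that the new skill is acquired quickly. Experimental results show that the proposed method achieves efficient multi-task robot skill learning. 3. To address the low efficiency of skill transfer learning between robots with different structures, a transferable robot skill learning algorithm based on structural priors is proposed. First, a weighted aggregation graph neural network policy model is proposed. For specific robot tasks, a weighted information aggregation method consistent with the physical characteristics of robots is designed to improve the learning efficiency and performance of skill transfer between robots with different structures. Furthermore, to address the poor generalization of hand-designed weighted aggregation methods, a robot policy model based on the graph attention mechanism is proposed. By introducing a graph attention network module, it uses successful or random sample trajectories to automatically assign aggregation weights of different importance to joints at different positions, through supervised learning or reinforcement learning, thereby embedding prior knowledge consistent with the physical characteristics of robots into the policy model. Experimental results show that the graph-attention-based robot policy model provides a way to obtain, by learning, aggregation weights that are consistent with the physical characteristics of robots, and that the learned aggregation weight distribution has good physical interpretability; the proposed weighted aggregation graph neural network policy model effectively improves the efficiency of skill transfer learning between robots with different structures.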
English abstract	Robot skill learning is an important research topic in robotics, with a wide range of applications in intelligent manufacturing, smart cities, autonomous driving, and other fields. How robots can efficiently learn and acquire motor and manipulation skills in complex, dynamic environments is one of the key open problems for current robot skill learning technology. A breakthrough here would greatly improve the efficiency, accuracy, and speed of autonomous motion planning and dexterous manipulation, with a positive and profound impact on industrial production, people's quality of life, and the intelligent upgrading of national defense. In recent years, with the rapid development of disciplines such as robotics and artificial intelligence, robot skill learning methods have achieved major performance breakthroughs in designing robot systems with autonomous decision-making and learning capabilities. However, current methods still face common problems such as low learning efficiency, poor robustness, and an inability to adapt quickly to dynamic environments. Prior knowledge is one of the important factors in improving the efficiency of robot skill learning, and incorporating the structural priors that are ubiquitous in robot systems into the skill learning process is an effective way to achieve efficient robot skill learning. Based on the revealed structural prior knowledge of robots, this dissertation studies efficient robot skill learning from three aspects: single-task learning, multi-task learning, and skill transfer learning between robots with different structures. The main work and innovations are summarized as follows:
1. Complex robot systems typically have many degrees of freedom, which leads to low efficiency in single-task skill learning. To address this, the dissertation proposes an efficient robot skill learning algorithm based on a structural policy model. First, prior knowledge that is ubiquitous in the physical structure of robots is revealed, and a structural robot policy model is built on it. Unlike traditional policy models, the proposed method uses the structural prior knowledge to decompose the robot's original high-dimensional joint state space into several conditionally independent low-dimensional feature spaces. The optimal policy is then searched for in these low-dimensional feature spaces, which reduces the dimension of the policy search space. In addition, the proposed method uses multiple linear models to output the action of each joint, which further reduces the complexity of the policy model, improves sample efficiency, accelerates learning, and provides good robustness to random noise in the state space. The experimental results show that the proposed method effectively improves the efficiency, speed, and robustness of single-task skill learning for robots.
2. To address the low efficiency of robot skill learning in multi-task scenarios, an efficient robot multi-task learning algorithm based on structural unsupervised representation is proposed. First, a structural unsupervised representation learning module is designed, which uses task-agnostic sample trajectories to learn a structural feature representation in an unsupervised manner. This module projects the robot's original high-dimensional state space onto a manifold space, generating a structural low-dimensional feature representation that can characterize the internal model of the robot system. Then, a robot multi-task learning algorithm based on this representation is proposed. When the robot faces a new task, the structural unsupervised representation module does not need to be retrained and can be used directly in the policy learning process for the new task, so that the new skill is acquired quickly. The experimental results demonstrate that the proposed method achieves efficient multi-task skill learning for robots.
3. To address the low efficiency of skill transfer learning between robots with different structures, the dissertation designs a transferable robot skill learning algorithm based on structural prior knowledge. First, a weighted aggregation graph neural network policy model is proposed. For specific robot tasks, a weighted information aggregation method that conforms to the physical characteristics of robots is designed to improve the learning efficiency and performance of skill transfer between robots with different structures. Furthermore, to address the poor generalization of hand-designed weighted aggregation methods, a robot policy model based on the graph attention mechanism is established. Using a graph attention network module, successful or random sample trajectories are exploited to automatically assign aggregation weights of different importance to joints at different positions, through supervised learning or reinforcement learning, so that prior knowledge conforming to the physical characteristics of robots is incorporated into the policy model. The experimental results show that the graph-attention-based policy model provides a way to learn aggregation weights that conform to the physical characteristics of robots, and that the learned aggregation weight distribution has good physical interpretability. The results also show that the proposed weighted aggregation graph neural network policy model effectively improves the efficiency of skill transfer learning between robots with different structures.
Keywords	skill learning; structural prior knowledge; robot state representation learning; reinforcement learning
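Language	Chinese
Seven major directions: sub-direction classification	Intelligent robots
State Key Laboratory planning direction classification	Other
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/50839
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	张丰一. 嵌入结构先验的机器人技能学习算法研究[D],2022.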

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (16472 KB)	Dissertation		Restricted access	CC BY-NC-SA