Research on Shared Control and Manipulation Skill Learning Methods for Intelligent Robots


	席宝
	2020-12-02
Pages	113
Degree type	Doctoral
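Chinese abstract (translated)	As robot applications expand from traditional industrial settings into service domains, robotics research faces the major challenge of completing flexible and varied manipulation tasks in unstructured environments. Improving the operating efficiency of robots and using demonstration data to accelerate manipulation skill learning have therefore become active research topics in robotics. This thesis addresses these problems through work on user teleoperation pose detection, shared control teleoperation, and manipulation skill learning. The main research contents are as follows. First, for the problem of detecting the user's operation pose in teleoperation, a pose detection method based on a detection-tracking framework is proposed. The user teleoperates with a custom multi-color-marker handle; a support vector data description algorithm detects and locates the handle, a Kalman filter tracks and predicts the marker positions, and the handle pose is then computed from the marker positions. Experimental results show that the proposed method detects user motion input effectively and with good real-time performance. Second, for the problem of low teleoperation efficiency while users collect demonstration data, a shared control method based on a task-parameterized hidden semi-Markov model is proposed. The method encodes the demonstration data with the task-parameterized hidden semi-Markov model to obtain a manipulation motion model, uses that model to predict the user's operation target and the robot's desired position, and provides assistance to the user through policy blending, thereby achieving shared control. Experimental results show that the method improves teleoperation efficiency. Third, for the problem of low policy training efficiency in manipulation skill learning, a reinforcement learning method based on policy gradients is proposed. The method combines zero-step and one-step estimates when computing the target Q value to reduce the variance of the estimate, takes maximizing the lower bound of multiple Q-functions as the policy optimization objective to reduce the effect of Q-function overestimation bias on policy training, and combines the deterministic policy gradient with the stochastic policy gradient to improve the training efficiency of a stochastic policy. Experimental results show that the algorithm improves both training speed and final performance. Fourth, for the difficulty of obtaining effective learning signals in sparse-reward manipulation skill learning, demonstration trajectory data are introduced and a reinforcement learning method guided by the demonstration trajectory distribution is proposed. The method assigns weights to the demonstration data according to the probability distribution of the demonstration trajectories and dynamically adjusts the imitation weight during training according to policy performance, so that the demonstration data guide training without hindering further optimization of the policy. Simulation and physical experiment results show that the method improves learning efficiency and policy performance.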
English abstract	With the ever-growing range of robot applications, robotics research faces the major challenge of completing flexible manipulation tasks in unstructured environments. Improving the efficiency of human-robot interaction and using demonstrations from human experts to accelerate manipulation skill learning have therefore become active research topics. This thesis focuses on the detection of user operation poses, shared control teleoperation, and robotic manipulation skill learning. The main research contents are as follows. First, for the problem of user operation pose detection in teleoperation, a pose detection method based on a detection-tracking framework is proposed. The user teleoperates a robot with a custom-designed multi-color-marker handle. Support vector data description classifiers detect and locate the handle, and a Kalman filter tracks and predicts the positions of the markers. The pose of the handle is then estimated from the locations of the color markers. Experimental results show that the proposed method can detect user operation poses in real time. Second, to address the low efficiency of teleoperation when collecting demonstration data, a shared control method based on a task-parameterized hidden semi-Markov model is proposed. In this method, the task-parameterized hidden semi-Markov model encodes the demonstration data to obtain a manipulation motion model. The model is then used to predict the user's operation target and the expected position of the robot. Finally, the predicted position is combined with the user input to assist the user and achieve shared control. Experiments show that this method improves the efficiency of teleoperation. Third, a reinforcement learning method based on policy gradients is proposed to improve the efficiency of policy training in manipulation skill learning. In this method, zero-step and one-step estimates are combined to obtain the target Q value, which reduces the estimation variance. The policy optimization objective is to maximize the lower bound of multiple Q-functions, which reduces the influence of Q-function overestimation bias on policy training. The deterministic and stochastic policy gradients are combined to improve the training efficiency of a stochastic policy, and recently collected experience is emphasized during training. Experimental results show that this method improves training speed and policy performance. Fourth, to address the difficulty of obtaining effective learning signals in manipulation skill learning tasks with sparse rewards, a reinforcement learning method guided by the probability distribution of demonstration trajectories is proposed. In this method, the demonstration data are weighted by the probability distribution of the demonstration trajectories, and the imitation weight is adjusted dynamically during training according to policy performance. The demonstration data can therefore guide training without hindering further optimization of the policy. Both simulation and physical experiment results show that this method improves learning efficiency and policy performance.
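Keywords	pose detection; shared control; reinforcement learning; policy gradient; demonstration guidance
Language	Chinese
Seven major directions: sub-direction classification	Intelligent robots
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/42217
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Intelligent Robot Systems Research
Recommended citation (GB/T 7714)	席宝. Research on Shared Control and Manipulation Skill Learning Methods for Intelligent Robots[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.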
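
The abstract describes shared control as combining the model's predicted robot position with the user's input through policy blending. The record does not give the blending rule, so the following is only a minimal sketch of one common form, linear blending. The function name `blend_commands`, the `confidence` weight, and the `max_assist` cap are all hypothetical, and the prediction that the thesis obtains from the task-parameterized hidden semi-Markov model is simply passed in as a plain list here.

```python
from typing import List, Sequence


def blend_commands(
    user_cmd: Sequence[float],
    predicted_cmd: Sequence[float],
    confidence: float,
    max_assist: float = 0.8,
) -> List[float]:
    """Linearly blend a user command with a model-predicted command.

    Assumption (not from the thesis): the assistance weight is the
    prediction confidence clamped to [0, 1] and scaled by max_assist,
    so the user always keeps at least (1 - max_assist) of the authority.
    """
    if len(user_cmd) != len(predicted_cmd):
        raise ValueError("user_cmd and predicted_cmd must have the same length")
    # Clamp the confidence so that a noisy estimate cannot push the
    # assistance weight outside [0, max_assist].
    alpha = max(0.0, min(1.0, confidence)) * max_assist
    return [(1.0 - alpha) * u + alpha * p for u, p in zip(user_cmd, predicted_cmd)]


if __name__ == "__main__":
    # With zero confidence the user's command passes through unchanged.
    print(blend_commands([1.0, 0.0], [0.0, 1.0], confidence=0.0))
    # With full confidence the prediction contributes max_assist of the command.
    print(blend_commands([1.0, 0.0], [0.0, 1.0], confidence=1.0, max_assist=0.5))
```

Capping the assistance weight below 1 is a design choice of this sketch: it keeps the operator able to override the prediction when the predicted target is wrong.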

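Files in this item
File name/size	Document type	Version type	Access type	License
Research on Shared Control and Manipulation Skill Learning Methods for Intelligent Robots (9051KB)	Thesis		Open access	CC BY-NC-SA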
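
The abstract's third contribution uses the lower bound of multiple Q-functions as the policy objective to limit the effect of overestimation bias. As a rough illustration of that idea only, the sketch below takes the minimum over several Q estimates and picks the best action from a small candidate set. The thesis optimizes a policy by gradient, which this toy example does not do; the names `q_lower_bound` and `best_action`, the scalar state and action, and the two example Q-functions are all hypothetical.

```python
from typing import Callable, Sequence

# A Q-function here is any callable mapping (state, action) to a value.
QFunction = Callable[[float, float], float]


def q_lower_bound(q_functions: Sequence[QFunction], state: float, action: float) -> float:
    """Pessimistic value: the minimum over all Q estimates for (state, action)."""
    return min(q(state, action) for q in q_functions)


def best_action(
    q_functions: Sequence[QFunction],
    state: float,
    candidate_actions: Sequence[float],
) -> float:
    """Pick the candidate action that maximizes the lower bound of the Q estimates."""
    return max(candidate_actions, key=lambda a: q_lower_bound(q_functions, state, a))


if __name__ == "__main__":
    # Two toy estimates of the same value function; q_b overestimates at a == 3.0.
    q_a = lambda s, a: -(a - 1.0) ** 2
    q_b = lambda s, a: -(a - 1.0) ** 2 + (5.0 if a == 3.0 else 0.0)
    actions = [0.0, 1.0, 2.0, 3.0]
    # Trusting q_b alone is fooled by the overestimated action.
    print(max(actions, key=lambda a: q_b(0.0, a)))
    # Using the lower bound of both estimates ignores the spurious peak.
    print(best_action([q_a, q_b], 0.0, actions))
```

Taking the minimum means a single inflated estimate cannot dominate the objective, at the cost of a pessimistic bias where the estimates disagree.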