Functional Grasp Learning for Anthropomorphic Dexterous Hands (仿人灵巧手的功能性抓取学习)
Wei Wei (韦伟)
2023-12
Pages: 130
Degree type: Doctoral
Chinese Abstract

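Dexterous manipulation is one of the main weaknesses of current robots, and improving it matters for extending robot applications in modern industry, agriculture, services, and healthcare. Grasping underlies all kinds of dexterous manipulation tasks: it concerns accurate control of objects, lets robots adapt flexibly to different environments and tasks, and thereby raises their autonomous manipulation performance. Robotic grasping with parallel grippers has made some progress in recent years, but the simple structure of such grippers makes further manipulation of the grasped object very difficult. By contrast, anthropomorphic dexterous hands, with their many actuated joints, have flexibility close to that of the human hand and the potential to perform more complex grasps such as functional grasps. Functional grasping with an anthropomorphic dexterous hand aims to imitate human grasping that carries a specific manipulation intent, grasping the target object in a highly human-like way so that the subsequent manipulation task can be completed, for example using everyday tools as a human hand would.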

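Compared with ordinary stable grasping, functional grasping with an anthropomorphic dexterous hand requires grasp planning that accounts for the intended subsequent manipulation of the specific target object, which places higher demands on object understanding and modeling, dexterous grasp pose generation, and autonomous grasp action planning. In view of existing research, the following problems remain: (1) the grasp planning space of an anthropomorphic dexterous hand is high-dimensional, which makes grasp planning difficult and inefficient and limits the achievable poses; (2) functional grasp annotation is expensive and large-scale datasets for functional grasping research are missing, so existing methods generalize poorly to unknown objects; (3) the ability to understand human manipulation intent and to plan actions autonomously is lacking, which hinders deployment in real scenarios. To address these problems, this thesis uses deep generative models, hand-object interaction constraints and contact representations, and the reasoning and planning abilities of large language models to learn functional grasping for anthropomorphic dexterous hands and to improve functional grasping performance. The specific research work and contributions are summarized as follows: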

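1. To address the low planning efficiency and limited grasp poses of high degree-of-freedom anthropomorphic dexterous hands, a framework for generating diverse dexterous-hand grasps based on hand-object interaction constraints is proposed. Within this framework, an efficient grasp annotation method is first proposed and used to build a high-quality, large-scale dexterous-hand grasp dataset quickly. The dataset is then used to train a grasp generation network based on hand-object interaction constraints. The network first reconstructs the complete object point cloud from a single-view point cloud and then efficiently predicts dexterous-hand grasp configurations with tight hand-object contact. Simulated grasp experiments show that the framework generates diverse, stable dexterous-hand grasp poses, with grasp success rates above 75% and 82% on the YCB (Yale-CMU-Berkeley) and EGAD (Evolved Grasping Analysis Dataset) object datasets, respectively. Real-robot grasp experiments likewise show a grasp success rate above 70% on unknown objects with single-view point cloud input. This work resolves the basic difficulties caused by the high degrees of freedom of dexterous hands and lays the foundation for the later work in this thesis.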

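2. To address the high annotation cost of functional grasps, the scarcity of datasets, and inefficient, poorly performing grasp planning, a dexterous-hand functional grasp generation framework based on a fine-grained hand-object contact representation is proposed. Within this framework, a six-step functional grasp synthesis algorithm is first proposed; by establishing dense shape correspondences between category-level objects and a fine-grained hand-object contact representation, a large-scale functional grasp dataset supporting 5 dexterous hand models and 3 manipulation intents is built. The grasp generation network based on hand-object interaction constraints is then improved into a functional grasp generation network based on the fine-grained hand-object contact representation. This network incorporates category-level object priors for single-view object reconstruction and generates high-quality functional grasp configurations conditioned on the manipulation intent. Simulated grasp experiments show that the single-view reconstruction module with category-level object priors achieves more robust and finer object reconstruction. The framework also significantly outperforms current state-of-the-art methods on several metrics of grasp stability and functionality, and it generates highly human-like functional grasp poses for three manipulation intents: tool use, handover, and pickup. Real-robot grasp experiments likewise show grasp success rates of 69%, 71%, and 83% on unknown objects for the three intents.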

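3. To address the need for human manipulation intent understanding and autonomous grasp action planning when an anthropomorphic dexterous hand performs functional grasping in real scenarios, a multi-task dexterous-hand functional grasping framework based on manipulation intent understanding is proposed. The framework consists of a high-level task understanding layer and a low-level task action execution layer. The high-level layer predicts the human manipulation intent from human instructions and visual input and decomposes multi-stage, long-horizon manipulation tasks into actions. The low-level layer then takes the manipulation intent and a language description of the grasp action as conditions and predicts the key wrist poses of the dexterous hand and its grasp state. Simulated grasp experiments show that the framework accurately infers a variety of functional grasping intents and plans reasonable motion trajectories and grasp actions for multiple tasks. Compared with current state-of-the-art methods, it improves markedly on several functional grasping tasks. Real-robot experiments likewise show that the framework predicts the key wrist poses more precisely than current state-of-the-art methods, has some robustness to disturbances, and runs inference at 3 Hz.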

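This thesis presents a systematic study of functional grasp learning for anthropomorphic dexterous hands, providing theoretical guidance and technical support for autonomous functional grasping and human-robot interaction with such hands in real robotic scenarios.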

English Abstract

Dexterous manipulation remains a major weakness of current robots, and improving it is crucial for expanding robot applications in modern industry, agriculture, services, and healthcare. Grasping is the foundation of dexterous manipulation: it focuses on precise control of objects and enables robots to adapt flexibly to diverse environments and tasks, ultimately improving their autonomy. In recent years, robotic grasping with parallel grippers has made progress, but the simple structure of these grippers makes further manipulation of the grasped object difficult. In contrast, anthropomorphic dexterous hands, with numerous actuated joints, offer human-like flexibility and the potential to execute more complex grasps, including functional grasps. Functional grasping with anthropomorphic hands aims to emulate human grasps that carry a specific manipulation intent, so that the robot grasps the target object in a highly human-like manner that facilitates the subsequent manipulation task. For example, an anthropomorphic dexterous hand can mimic the human hand in using everyday tools.

Compared with common stable grasping, functional grasping with anthropomorphic dexterous hands requires grasp planning based on the intended post-grasp manipulation of the specific target object. This imposes higher demands on object understanding and modeling, dexterous grasp pose generation, and autonomous grasp action planning. In light of existing research, functional grasping with anthropomorphic dexterous hands still faces the following challenges: (1) the high-dimensional grasp planning space of anthropomorphic dexterous hands makes grasp planning difficult and inefficient and limits the achievable poses; (2) annotating functional grasps for anthropomorphic dexterous hands is expensive, and large-scale datasets for functional grasp research are lacking, so current methods generalize poorly to unknown objects; (3) the ability to understand human manipulation intent and to plan actions autonomously is lacking, which makes deployment in real-world scenarios difficult. To address these issues, this thesis uses deep generative models, hand-object interaction constraints and contact representations, and large language models for reasoning and planning to learn functional grasping with anthropomorphic dexterous hands and to improve functional grasping performance. The specific research work and contributions of this thesis are summarized below:

1. To address the low planning efficiency and constrained grasp poses of high degree-of-freedom anthropomorphic dexterous hands, a framework for generating diverse dexterous-hand grasps based on hand-object interaction constraints is proposed. First, an efficient annotation method for dexterous-hand grasps is introduced, enabling the rapid construction of a high-quality, large-scale dexterous-hand grasp dataset. A grasp generation network incorporating hand-object interaction constraints is then trained on this dataset. The network first reconstructs a complete object point cloud from a single-view point cloud and then efficiently predicts dexterous-hand grasp configurations with tight hand-object contact. Simulated grasp experiments demonstrate that the proposed framework generates diverse and stable dexterous-hand grasp poses, achieving grasp success rates above 75% and 82% on the YCB (Yale-CMU-Berkeley) and EGAD (Evolved Grasping Analysis Dataset) object datasets, respectively. Real-robot grasp experiments likewise show a grasp success rate above 70% on unknown objects with single-view point cloud input. This study addresses the foundational challenges arising from the high degrees of freedom of dexterous hands, laying the groundwork for the subsequent research in this thesis.
    
2. To address the high cost of annotating functional grasps for anthropomorphic dexterous hands, the scarcity of datasets, and inefficient, poorly performing functional grasp planning, a dexterous-hand functional grasp generation framework based on a fine-grained hand-object contact representation is proposed. The framework first introduces a six-step algorithm for synthesizing functional grasps. By establishing dense shape correspondences between category-level objects and using the fine-grained hand-object contact representation, a large-scale dataset supporting five dexterous hand models and three manipulation intents is constructed. The grasp generation network based on hand-object interaction constraints is then improved into a functional grasp generation network based on the fine-grained hand-object contact representation. This network integrates category-level object priors for single-view object reconstruction and generates high-quality functional grasp configurations conditioned on the manipulation intent. Simulated grasp experiments demonstrate that the single-view reconstruction module with category-level object priors achieves more robust and finer object reconstruction. The framework also significantly outperforms state-of-the-art methods on several evaluation metrics of grasp stability and functionality, and it generates highly human-like functional grasp poses for three manipulation intents: tool use, handover, and pickup. Real-robot grasp experiments show grasp success rates of 69%, 71%, and 83% on unknown objects for the three intents.
    
3. To address the need for human intent understanding and autonomous grasp action planning when anthropomorphic dexterous hands perform functional grasping in practical scenarios, a multi-task functional grasping framework based on intent understanding is proposed. The framework comprises a high-level task understanding layer and a low-level task action execution layer. The high-level task understanding layer predicts human manipulation intent from human instructions and visual input and decomposes multi-stage, long-horizon manipulation tasks into actions. The low-level task action execution layer, conditioned on the intent and a language description of the grasp action, then predicts the key wrist poses and the grasp state of the dexterous hand. Simulated grasp experiments demonstrate that the proposed framework accurately infers various functional grasping intents and plans reasonable motion trajectories and grasp actions for multiple tasks. Compared with current state-of-the-art methods, the proposed framework shows a significant improvement on multiple functional grasping tasks. Real-robot experiments likewise indicate that the proposed framework predicts the key wrist poses of the dexterous hand more accurately than current state-of-the-art methods, has some robustness to disturbances, and runs inference at 3 Hz.

This thesis presents a systematic study of functional grasp learning for anthropomorphic dexterous hands, providing theoretical guidance and technical support for autonomous functional grasping and human-robot interaction with anthropomorphic dexterous hands in real-world robotic scenarios.

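Keywords: robot learning; anthropomorphic dexterous hand; functional grasping; hand-object interaction; single-view object reconstruction
Indexing category: Other
Language: Chinese
Seven major directions, sub-direction classification: Intelligent robots
State Key Laboratory planned direction classification: Visual information processing
Whether a thesis-associated dataset needs to be deposited:
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/54584
Collection: Graduates_Doctoral theses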
Recommended citation
GB/T 7714
Wei Wei (韦伟). Functional Grasp Learning for Anthropomorphic Dexterous Hands (仿人灵巧手的功能性抓取学习)[D], 2023.
Files in this item
File name/size: 毕业论文-韦伟-仿人灵巧手的功能性抓取学 (32678KB)
Document type: Thesis
Access type: Restricted
License: CC BY-NC-SA