Research on Vision-Based Intelligent Grasping for Service Robots (基于视觉的服务机器人智能抓取研究)

	耿文杰
	2023-05-17
Pages	132
Degree type	Doctoral
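Abstract (translated from Chinese)	A service robot's ability to grasp and manipulate objects well is an important prerequisite for providing high-quality service. Real grasping environments often involve interference from the background and from other objects, and achieving high-quality intelligent grasping in such complex situations has significant theoretical value and broad application prospects. This dissertation studies vision-based intelligent grasping for service robots. Its main contents are as follows.
First, the research background and significance of intelligent grasping for service robots are introduced, the state of the art is reviewed from three perspectives (object detection and segmentation, robot manipulation task planning, and robot grasp detection), and the contents and structure of the dissertation are outlined.
Second, an instance segmentation method based on an adaptive neck network and an atrous-residual structure is proposed. After the backbone network extracts features from the input image, an adaptive neck network based on bidirectional top-down and bottom-up fusion is designed to address the low transmission efficiency and weak feature-mining ability of existing feature pyramid neck networks, which fuse features by passing them layer by layer. The adaptive neck network performs cross-layer feature transmission and multi-scale information fusion, and adaptively learns the weights of the multi-scale features so that the features are fully exploited. The feature maps output by the adaptive neck network then pass through an object detection branch and a mask prototype branch to produce the instance segmentation results. An atrous-residual structure is embedded in the mask prototype branch to capture more information from different receptive fields, overcoming the degradation of instance segmentation, and of mask quality in particular, that existing mask prototype branches suffer because they mine contextual information poorly. Experiments on datasets and in real scenes demonstrate the effectiveness of the proposed method.
Third, for the case in which the target object is obstructed and cannot be grasped directly, a robot moving method based on an elliptical cone potential field is proposed. Existing methods build the potential field function from rectangular or circular forbidden zones and then compute the placement point of the interfering object, which makes it difficult to use the operating space effectively. Here, the forbidden zone is described by a minimum enclosing ellipse, which fully characterizes the position, size, and orientation of an object, and an elliptical cone function is designed to reflect how an object's influence on the environment behaves as it radiates outward with the same eccentricity. Elliptical cone attractive and repulsive potential fields are then constructed to describe the influence of objects in different roles on the environment: the object currently to be grasped generates an attractive potential field, and the other objects generate repulsive potential fields. The superimposed total potential field is then used to determine the placement point of the object currently to be grasped. Real moving experiments demonstrate the effectiveness of the proposed method.
Fourth, to address the low efficiency with which existing manipulation task planning methods generate operation sequences when there are many interfering objects, a robot manipulation task planning method based on information-density spectral clustering and heuristic search is proposed. A grasp scene graph is built that takes object position, size, and orientation into account: objects are the nodes, the spatial constraints between objects yield the connecting edges, and an information density function describing the mutual influence between objects is used to compute the edge weights. On this basis, spectral clustering reduces the robot's search space to improve search efficiency, and a task-oriented graph is obtained through target bundling. A heuristic search algorithm is then designed that combines the robot's position, manipulation constraints, and the relationships among objects and between objects and the robot to heuristically select the object to be grasped, recursively searches to generate grasp sequences, and finally selects the best grasp chain. Its effectiveness is verified in real scenes.
Fifth, a grasp detection network based on hierarchical multi-scale feature fusion and inverted shuffle residuals is proposed. Existing encoder-decoder grasp detection networks mainly use the high-level encoder features during decoding, which loses detail. To address this, a skip-connection module with an attention mechanism is designed that takes both the low-level and high-level encoder features into account and fuses them across scales to obtain richer features. In addition, an inverted shuffle residual module is designed to mine the rich channel information in the high-level encoder features and strengthen feature representation. The multi-scale features output by the skip connections and the features output by the inverted shuffle residual module are then deeply fused layer by layer in the decoder to obtain high-quality grasp detection results. Experiments on datasets and in real scenes demonstrate the effectiveness of the proposed method.
Sixth, a software architecture for intelligent grasping by service robots is designed, comprising a scene perception layer, a grasping task planning layer, an elliptical cone potential field moving layer, a grasp detection layer, and a grasping control layer. The proposed instance segmentation, elliptical cone potential field moving, robot manipulation task planning, and grasp detection methods are integrated under the ROS (Robot Operating System) framework. Robot grasping experiments in real scenes verify its effectiveness.
Finally, the work is summarized and directions for further research are identified.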
Abstract (English)	A service robot's grasping ability is an important prerequisite for providing high-quality service. Realistic grasping environments often contain interference from the background and from other objects. Achieving high-quality intelligent grasping in such complex situations is significant for both research and applications. This dissertation studies vision-based intelligent grasping for service robots. The main contents are as follows.
Firstly, the research background and significance of intelligent grasping are presented. Research progress in object detection and segmentation, robot manipulation task planning, and grasp detection is reviewed. The contents and structure of the dissertation are also introduced.
Secondly, an instance segmentation method based on an adaptive neck network and an atrous-residual structure is proposed. Features are extracted from the input image by the backbone network. To address the low transmission efficiency and weak feature-mining ability caused by the layer-by-layer transmission of existing feature pyramid neck networks, an adaptive neck network composed of two bidirectional fusion units with top-down and bottom-up pathways is designed. It performs cross-layer feature transmission and multi-scale information fusion, with the weight of each feature map learned adaptively so that the features are fully exploited. The output feature maps of the adaptive neck network are then processed by the detection branch and the mask prototype branch to obtain the instance segmentation results. In the mask prototype branch, the atrous-residual structure is inserted to capture more information from different receptive fields, which overcomes the loss of mask quality caused by poor use of context information in existing mask prototype branches. The effectiveness of the proposed method is verified by experiments on datasets and in real scenes.
Thirdly, for the case where the target object is obstructed by other objects and cannot be grasped directly, a robot moving method based on the elliptical cone potential field is proposed. Existing methods use rectangular or circular forbidden zones to construct the potential field function from which the placement points of the interfering objects are calculated, and therefore do not make full use of the operating space. The proposed method addresses this problem. Considering the location, size, and orientation of the object, a minimum enclosing ellipse is adopted to describe the forbidden zone. On this basis, an elliptical cone function is designed, which reflects that the effect of the object on the environment radiates outward with the same eccentricity. Elliptical cone attractive and repulsive potential fields are then created to describe the influence of objects with different roles on the environment. The object to be grasped generates an attractive potential field, and the other objects generate repulsive potential fields. The superimposed potential field is used to determine the placement point of the object to be grasped. The effectiveness of the proposed method is verified by experiments.
Fourthly, to address the low efficiency with which existing manipulation task planning methods generate operation sequences when there are many interfering objects, a grasping task planning method based on information density spectral clustering and heuristic search is proposed. Taking the object position, size, and orientation into account, a grasp scene graph is constructed in which objects are the nodes and the spatial constraints between objects are used to obtain the connecting edges. An information density function is given to describe the interaction between objects and to calculate the edge weights. On this basis, spectral clustering is adopted to reduce the robot's search space and improve search efficiency. The task-oriented graph is then obtained through target bundling. Further, a heuristic search algorithm is designed. It considers the robot location, manipulation constraints, and the relationships between objects as well as between objects and the robot to heuristically select the object to be grasped. A recursive search is then executed to generate the grasp sequences, and finally the best grasp chain is determined. Experiments in real scenes demonstrate the effectiveness of the proposed method.
Fifthly, a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residuals is proposed. Existing encoder-decoder grasp detection networks feed mainly the high-level encoder features to the decoder, which loses detail information. To deal with this problem, a skip-connection module with an attention mechanism is designed. It takes both the low-level and high-level encoder features into account, and richer features are obtained through multi-scale feature fusion. Moreover, an inverted shuffle residual module is designed to mine the rich channel information in the high-level encoder features and enhance the feature representation ability. The multi-scale features output by the skip-connection module and the features from the inverted shuffle residual module are deeply fused layer by layer in the decoder to obtain high-quality grasp detection results. The proposed method is verified by experiments on datasets and in real scenes.
Sixthly, an intelligent grasping software architecture for service robots is designed, which includes a scene perception layer, a grasping task planning layer, an elliptical cone potential field moving layer, a grasp detection layer, and a grasping control layer. Under the ROS (Robot Operating System) framework, the proposed methods for instance segmentation, elliptical cone potential field moving, robot manipulation task planning, and grasp detection are integrated, and the integrated system is verified by experiments in real scenes.
Finally, conclusions are given and future work is discussed.
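The elliptical cone potential field summarized in the abstract can be illustrated with a small sketch. The abstract does not give the functional form, so everything below is an assumption made for illustration and not the dissertation's formulation: the `Ellipse` type, the scaled radius, the linear cone profiles, the `gain` and `reach` parameters, and the grid search in `placement_point` are all hypothetical. The one property taken from the abstract is that the level sets of each field are ellipses with the same eccentricity as the object's enclosing ellipse.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Ellipse:
    """Hypothetical forbidden zone: an ellipse enclosing one object."""
    cx: float     # centre x
    cy: float     # centre y
    a: float      # semi-axis along the object's orientation
    b: float      # semi-axis perpendicular to it
    theta: float  # orientation in radians


def scaled_radius(e: Ellipse, x: float, y: float) -> float:
    """Return r such that (x, y) lies on the ellipse with semi-axes (r*a, r*b).

    Every level set r = const has the same axis ratio, hence the same
    eccentricity, as the enclosing ellipse itself (r = 1 is its boundary).
    """
    dx, dy = x - e.cx, y - e.cy
    c, s = math.cos(e.theta), math.sin(e.theta)
    u = c * dx + s * dy    # coordinate along the a axis
    v = -s * dx + c * dy   # coordinate along the b axis
    return math.hypot(u / e.a, v / e.b)


def attractive(e: Ellipse, x: float, y: float, gain: float = 1.0) -> float:
    """Assumed attractive cone: zero at the centre, growing linearly with r."""
    return gain * scaled_radius(e, x, y)


def repulsive(e: Ellipse, x: float, y: float,
              gain: float = 5.0, reach: float = 2.0) -> float:
    """Assumed repulsive cone: highest at the centre, zero for r >= reach."""
    return gain * max(0.0, reach - scaled_radius(e, x, y))


def placement_point(moved: Ellipse, others: Iterable[Ellipse],
                    xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Pick the grid point that minimises the superimposed potential."""
    others = list(others)
    best = None
    for x in xs:
        for y in ys:
            total = attractive(moved, x, y) + sum(repulsive(o, x, y) for o in others)
            if best is None or total < best[0]:
                best = (total, x, y)
    if best is None:
        raise ValueError("empty candidate grid")
    return best[1], best[2]
```

In this sketch the object being moved pulls the placement point toward its current position, while every other object pushes it away; a brute-force grid search stands in for whatever optimisation the dissertation actually uses.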
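Keywords	Service robot, Instance segmentation network, Elliptical cone potential field, Grasping task planning, Grasp detection network
Language	Chinese
Seven major directions, sub-direction category	Intelligent robots
State Key Laboratory planning direction category	Intelligent evolution environment
Associated dataset to be deposited	No
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/52068
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	耿文杰. 基于视觉的服务机器人智能抓取研究[D],2023.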

Files in this item
File name/size	Document type	Version type	Access type	License
博士学位论文-耿文杰.pdf (40890 KB)	Thesis		Restricted access	CC BY-NC-SA