Autonomous navigation is an important prerequisite for service robots to complete tasks with high quality, and grasp detection provides an important means for robots to interact effectively with their environment; both are significant in research and in applications. This thesis studies navigation and grasp detection for service robots. The main contents are as follows:
Firstly, the research background and significance of this thesis are presented, and the development of robot navigation, object detection and segmentation, and grasp detection is reviewed. The contents and structure of the thesis are also introduced.
Secondly, based on the definitions of the path and the amount of perception, a navigation method based on situational awareness of the path is proposed. The environment is described by paths, and a topological map is constructed by abstracting each path as a topological edge. A situational awareness value is used to describe the robot's position along a path implicitly. A situational awareness fitting network is then designed to map scene perception information to the situational awareness value, with the network structure determined by the amount of perception. On this basis, combining the result of global path planning on the topological map with the robot's current motion situation, motion decisions are made by a laser-based local collision avoidance algorithm. Instead of relying on a global position in Cartesian coordinates, the robot makes decisions from the situational awareness value, and the effectiveness of the proposed navigation method is verified by experiments.
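The global planning step over the topological map can be sketched as a shortest-path search over weighted edges, where each edge represents a path in the environment. The following is a minimal illustration, not the thesis implementation; the map, node names and edge costs are invented for the example.

```python
import heapq

def plan_topological_path(graph, start, goal):
    """Dijkstra search over a topological map.

    graph: dict mapping node -> list of (neighbor, path_length) pairs,
    where each edge abstracts a traversable path in the environment.
    Returns the node sequence from start to goal, or None if unreachable.
    """
    frontier = [(0.0, start, [start])]   # (accumulated cost, node, route)
    visited = set()
    while frontier:
        cost, node, route = heapq.heappop(frontier)
        if node == goal:
            return route
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(frontier, (cost + length, neighbor, route + [neighbor]))
    return None

# Illustrative map: junctions as nodes, corridor paths as weighted edges.
office = {
    "A": [("B", 4.0), ("C", 2.0)],
    "B": [("A", 4.0), ("D", 5.0)],
    "C": [("A", 2.0), ("D", 8.0)],
    "D": [("B", 5.0), ("C", 8.0)],
}
print(plan_topological_path(office, "A", "D"))  # -> ['A', 'B', 'D']
```

The returned node sequence fixes which paths the robot must traverse; following each path is then delegated to the situational-awareness-based local controller.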
Thirdly, to address the difficulty that existing simultaneous detection and segmentation methods have in meeting real-time requirements with high accuracy, an improved simultaneous detection and segmentation network based on BlitzNet is proposed. A channel-based attention mechanism is added to the original BlitzNet to adjust the channel weights of each output feature map in the trunk network, and feature maps at multiple scales in the encoder are merged into the segmentation branch. Moreover, the weights of the loss functions for bounding box classification, bounding box regression and image segmentation are optimized by self-learning of the multi-task loss weightings during training. An image segmentation loss function based on background suppression is also designed to address the imbalanced ratio between background pixels and object pixels. Furthermore, inspired by image inpainting work in computer vision, inpainting is introduced into the grasping task to handle target objects that are severely occluded during detection, and a recognition method for occluded target objects based on the image inpainting and recognition network IRNet is proposed. IRNet consists of a three-stage inpainting network with coarse, intermediate and refinement stages, together with an inpainting-based recognition network. It takes the output of the intermediate stage as the inpainting result, which provides more detailed texture and better robustness to disturbances in the surrounding region, and outputs the top-3 recognition results. This method improves detection and segmentation accuracy while maintaining real-time performance, and provides a solution to the occlusion problem in grasping, as verified by experiments on datasets and in real scenes.
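The self-learned multi-task loss weighting above can be illustrated with the common uncertainty-based formulation (Kendall et al.): each task loss L_i is scaled by exp(-s_i) and regularised by s_i, where s_i = log(sigma_i^2) is learned jointly with the network. This is a NumPy sketch under that assumption, not the thesis code; the three loss values are invented for the example.

```python
import numpy as np

def weighted_multitask_loss(task_losses, log_vars):
    """Combined loss: sum_i exp(-s_i) * L_i + s_i, with s_i learnable."""
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))

def log_var_gradients(task_losses, log_vars):
    """d/ds_i of the combined loss: -exp(-s_i) * L_i + 1."""
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return -np.exp(-log_vars) * task_losses + 1.0

# Three tasks: box classification, box regression, segmentation (toy values).
losses = [2.0, 0.5, 4.0]
s = np.zeros(3)               # start from equal weighting (sigma^2 = 1)
for _ in range(100):          # toy gradient descent on s, losses held fixed
    s -= 0.1 * log_var_gradients(losses, s)
print(np.round(np.exp(-s), 3))   # learned per-task weights 1/sigma_i^2
```

At the fixed point exp(-s_i) * L_i = 1, so larger task losses receive smaller weights automatically, which is the behaviour the self-learning scheme relies on instead of hand-tuned coefficients.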
Fourthly, a pixel-level two-stream grasping convolutional neural network, TsGNet, is designed to determine the best grasp of the target object, taking the depth map and grayscale map of the target object as network input. The encoder of TsGNet adopts depthwise separable convolutions, and its decoder uses the global deconvolution module GDN, which improves the accuracy of grasp detection with fewer network parameters and faster processing. On this basis, combined with the intrinsic and extrinsic matrices of the camera, the best grasp output by TsGNet is converted into the desired pose of the manipulator, which drives the manipulator to execute the grasping operation. The effectiveness of the proposed method is verified through experiments on the Cornell grasping dataset and in real scenes.
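The conversion from a detected grasp pixel to a pose in the manipulator frame follows the standard pinhole camera model: back-project the pixel with its depth through the intrinsic matrix, then apply the camera-to-base extrinsic transform. This is a minimal sketch of that geometry, not the thesis implementation; the intrinsic values and camera mounting used below are illustrative.

```python
import numpy as np

def grasp_pixel_to_base(u, v, depth, K, T_base_cam):
    """Map a grasp pixel (u, v) with depth (metres) to a 3D point in the
    manipulator base frame.

    K: 3x3 intrinsic matrix; T_base_cam: 4x4 extrinsic transform taking
    camera-frame coordinates to base-frame coordinates.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pinhole back-projection: pixel + depth -> camera-frame point.
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    return (T_base_cam @ p_cam)[:3]

# Illustrative values: a 640x480 camera, mounted 0.5 m above the base with
# its optical axis aligned with the base z-axis.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)
T[2, 3] = 0.5
p = grasp_pixel_to_base(320.0, 240.0, 0.4, K, T)
print(p)  # principal-point pixel maps onto the camera z-axis, then is shifted by the mount
```

The grasp orientation predicted in the image plane is transformed analogously with the rotation part of the extrinsic matrix to obtain the full desired end-effector pose.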
Fifthly, the navigation and grasping software architecture of the service robot is designed, comprising the situational awareness layer, navigation planning layer, object detection layer, grasp detection layer and grasp control layer. Under the ROS framework, the above navigation, object detection and grasp detection methods are integrated, and their effectiveness is verified by navigation and grasping experiments in an office environment.
Finally, the conclusions are given and future work is addressed.