Research on Deep Learning-Based 3D Point Cloud Recognition Algorithms

Title	Research on Deep Learning-Based 3D Point Cloud Recognition Algorithms
Author	Lin Hua (林华)
Date	2020-05-29
Pages	96
Degree type	Master's
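Abstract (translated from the Chinese)	With the spread of 3D sensors and the growth of computing power, 3D computer vision has attracted increasing attention from researchers. Compared with 2D images, 3D data makes it easy to decouple objects from the background, so research in 3D computer vision has significant application potential and market value in practical settings such as autonomous driving, virtual reality, robotics, and remote sensing and mapping. A 3D point cloud is the most direct representation of 3D data, and how to extract features from 3D point clouds effectively for recognition and classification has long been a research focus in 3D computer vision. To extract the internal features of point clouds with a convolutional neural network (CNN), some methods convert the point cloud into multi-view images or 3D voxels. However, these transformations usually lose much of the geometric information inherent in the 3D point cloud and have high algorithmic complexity. In recent years, a growing number of researchers have proposed recognition algorithms that operate directly on the raw point cloud without any conversion and therefore incur no explicit information loss. Owing to the complexity of point cloud data, deep-learning-based recognition on raw point clouds still faces many challenges: (1) because 3D point clouds are unordered and unstructured, classical CNNs cannot be applied directly to feature extraction on raw 3D point clouds; some methods use multilayer perceptrons to extract point features, but their extraction ability is limited and they ignore the relationships between points; (2) in real application scenarios, 3D point clouds are not necessarily aligned, so the network must be robust to rotation, translation, and scale changes of the point cloud; (3) noise is unavoidable when sensors capture 3D point clouds, so the algorithm needs a degree of noise robustness. This thesis studies in depth how to address these problems. Its contributions are as follows. (1) It proposes DSCNet, a new network for extracting context information from 3D point clouds. The method uses the shape context algorithm to partition the spherical local neighborhood of the point cloud and, in a manner similar to geometric blur, blurs the feature information of points in the outer part of the spherical neighborhood, producing context information that contains both coarse and fine features. A multi-level network is then designed to learn shape features from this context information. Compared with other strong current methods, DSCNet not only has advantages in extracting context information but is also more robust to point cloud noise and density. In addition, DSCNet is invariant to the scale and translation of the point cloud and, under certain conditions, to rotation. Extensive experiments demonstrate its effectiveness and competitiveness. (2) It proposes DRINet, a new rotation-invariant point cloud learning network. The method uses distance information to represent the spatial position of each point in the point cloud. The core idea is to use the distances between points to determine the internal rigid structure of a local point cloud; because this representation considers only the relative relationships between points, it is invariant to arbitrary rotations in 3D space. To improve the network's feature extraction ability within this rotation-invariant representation framework, the thesis proposes using an attention-based adaptive feature fusion model in DRINet for point cloud feature extraction. The algorithm is evaluated on several public datasets, and the results demonstrate its effectiveness.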
English abstract	With the prevalence of 3D sensors and the improvement of computing power, research on 3D computer vision has attracted growing attention. Compared with 2D images, 3D data makes it easy to decouple objects from the background. Research on 3D computer vision therefore has significant potential and market value in practical applications such as autonomous driving, virtual reality, robotics, and remote sensing and mapping. The point cloud is the most intuitive representation of 3D data, and how to extract features from 3D point clouds effectively and classify them has long been a research hotspot in 3D computer vision. To use a convolutional neural network (CNN) to extract the internal features of a point cloud, some methods transform the point cloud into multi-view images or 3D voxels. However, these transformations usually lose much of the geometric information inherent in the point cloud and have high algorithmic complexity. In recent years, more and more researchers have proposed recognition algorithms that act directly on the raw point cloud without any conversion, so there is no explicit information loss. Owing to the complexity of point cloud data, deep-learning-based recognition on raw point clouds still faces many challenges: (1) because point clouds are unordered and unstructured, a standard CNN cannot be applied directly to feature extraction on the raw point cloud; some methods use multilayer perceptrons to extract point features, but their extraction ability is limited and they ignore the relationships between points; (2) in practical application scenarios, the point cloud is not necessarily aligned, so the network needs to be robust to rotation, translation, and scale changes of the point cloud; (3) noise is inevitable when a sensor collects a 3D point cloud, so the algorithm needs a degree of noise robustness. This thesis studies in depth how to solve these problems. (1) It proposes a new network named DSCNet for extracting 3D point cloud context information. In this method, the shape context algorithm is used to partition the spherical local neighborhood of the point cloud, and the feature information of the peripheral points in the spherical neighborhood is then blurred in a way similar to geometric blur, forming context information that contains both coarse and fine features. Finally, based on this context information, a hierarchical network is designed to learn shape features from it. Compared with other strong methods, DSCNet not only has advantages in context information extraction but is also more robust to point cloud noise and density. In addition, DSCNet is invariant to the scale and translation of the point cloud and, under certain conditions, to rotation. Extensive experiments demonstrate its effectiveness and competitiveness. (2) It proposes a new rotation-invariant point cloud learning network named DRINet. This method uses distance information to represent the spatial position of each point in the point cloud. The core idea is to use the distances between points to determine the internal rigid structure of the local point cloud. Because this representation considers only the relative relationships between points, it is invariant to any rotation in 3D space. To improve feature extraction within this rotation-invariant representation framework, the thesis also proposes an attention-based adaptive feature fusion module (AFM) for point cloud feature extraction in DRINet. The method is evaluated on multiple public datasets, and the results demonstrate its effectiveness.
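Keywords	3D computer vision ; deep learning ; point cloud ; shape context ; geometric blur ; rotation invariance
Subject area	Computer perception ; Computer neural networks
Discipline category	Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language	Chinese
Seven major directions: sub-direction classification	3D vision
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/39584
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Advanced Spatio-Temporal Data Analysis and Learning
Recommended citation (GB/T 7714)	Lin Hua. Research on Deep Learning-Based 3D Point Cloud Recognition Algorithms [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.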

Files in this item
File name/size	Document type	Version type	Access type	License
林华的毕业论文.pdf (9641 KB)	Thesis		Open access	CC BY-NC-SA
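
Illustrative note on the abstract's distance-based representation. The abstract states that a representation built only from distances between points is invariant to any rotation in 3D space. The sketch below is a numerical check of that property, not the thesis's DRINet implementation: the helper names (pairwise_distances, random_rotation), the random test cloud, and the use of NumPy are all assumptions made for illustration. It also applies a translation, since Euclidean distances are unaffected by rigid shifts as well.

```python
import numpy as np

def pairwise_distances(points):
    """Return the (N, N) matrix of Euclidean distances between rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def random_rotation(rng):
    """Draw a random 3x3 proper rotation matrix (hypothetical helper)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # normalize column signs of the orthogonal factor
    if np.linalg.det(q) < 0:      # a reflection has det = -1; flip one axis
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))       # synthetic point cloud, not thesis data
rotation = random_rotation(rng)
shift = rng.normal(size=3)

# Rigidly move the cloud: rotate every point, then translate.
moved = cloud @ rotation.T + shift

# The distance matrices agree up to floating-point error.
invariant = np.allclose(pairwise_distances(cloud), pairwise_distances(moved))
print(invariant)
```

The raw coordinates in `moved` differ from those in `cloud`, while the distance matrix is unchanged, which is why a network fed only such distances cannot be affected by how the input happens to be oriented.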