CASIA OpenIR > Graduates > Doctoral Dissertations
面向大范围场景的高效点云目标检测 (Efficient Point Cloud Object Detection for Large-Scale Scenes)
范略
2024-05-16
Pages: 110
Subtype: Doctoral
Abstract

Three-dimensional point clouds provide a concise yet precise depiction of the physical world and are one of the most fundamental forms of 3D data.
Owing to the accurate geometric and depth information they carry, point clouds play an indispensable role in embodied perception systems, exemplified by the perception stacks of high-level autonomous driving. In such systems, point cloud object detection is unquestionably one of the core capabilities, providing an essential basis for downstream object tracking, trajectory prediction, and path planning.

In recent years, with the rapid development of the autonomous driving industry, point cloud object detection methods have flourished and achieved remarkable gains in accuracy.
However, existing methods still leave several research problems unresolved, and one of the most pressing is efficient detection in large-scale scenes. Ordinary low-level driver-assistance systems or robotic systems only need to attend to a relatively small surrounding area: the former typically only considers conditions within about 100 meters, while the latter may only need information about the room it operates in. For high-level autonomous driving systems, however, especially those operating at highway speeds, large-scale real-time perception is indispensable. For example, a heavy autonomous truck traveling on a highway needs a safe distance of at least 200 meters to complete an emergency brake or a comfortable lane change, and in more demanding situations the safe-distance requirement is even stricter. Efficient large-scale perception is therefore a research problem of great practical significance. Yet the current mainstream public datasets provide a perception radius of at most 80 meters, so detectors designed on such data overlook large-scale perception needs: their algorithmic complexity prevents them from being directly extended to a perception radius beyond 200 meters while remaining efficient. Given this state of research and these technical demands, this dissertation focuses on efficient point cloud object detection in large-scale scenes. Specifically, it progresses through the following four aspects:

1. Exploring an input data structure suited to large-scale scenes. This dissertation proposes using the range view of the point cloud as the detector's input. The range view is a structured data form similar to a depth image: the size of the perception radius is reflected in its pixel values rather than its resolution, making it an ideal data form for efficient detection in large-scale scenes. It also has notable drawbacks, however: its 2D form loses some 3D structural information, and its near-perspective projection causes large scale variation among objects. This dissertation proposes a series of novel techniques that overcome these difficulties, bringing range-view-based detection to a level comparable with mainstream bird's-eye-view or 3D-voxel methods for the first time, and laying the data-structure foundation for large-scale detection.

2. Exploring efficient basic operators for large-scale scenes. Basic operators are the cornerstone of any algorithm. Previous methods mainly rely on ordinary convolutions, sparse convolutions, and point-based operators, none of which supports efficient detection in large-scale scenes. In large-scale scenes and with finer 3D voxel representations, ordinary convolutions incur enormous computational cost because they operate on dense feature maps. Sparse convolutions greatly weaken the network's effective receptive field and depend on hand-tuned low-level implementations by specialist engineers. The more primitive point-based operators involve neighborhood queries, farthest point sampling, and other operations of very high complexity, making them inefficient in large-scale scenes. Given these shortcomings, this dissertation introduces the native attention mechanism into point cloud detection for the first time and makes it spatially sparse, proposing an attention-based sparse operator. It naturally fits the spatially non-uniform, variable-length nature of point cloud data, offers a larger receptive field, and lays the operator-level foundation for efficient large-scale point cloud object detection.

3. Exploring the construction of an overall detection framework for large-scale scenes. Current mainstream detection methods rely, in whole or in part, on dense bird's-eye-view feature maps, which is one of the core reasons they cannot be effectively extended to large-scale scenes. Building on the data and operator foundations above, this dissertation further overcomes the "center feature missing" problem of sparse structures through sparse instance recognition, thereby constructing a fully sparse detection framework. No spatially dense structure remains anywhere in the framework, so it can in theory be extended to an arbitrarily large perception radius. With this, the dissertation achieves the goal of efficient point cloud object detection in large-scale scenes and sets the stage for subsequent integrated applications.

4. Exploring integrated applications built on the large-scale efficient detection framework. Besides on-vehicle deployment, an important application of large-scale efficient detection is automatic 3D object annotation. The large perception range lets the algorithm cover extremely long object trajectories, gathering long-term temporal information that improves annotation accuracy, while its efficiency further shortens the annotation cycle. The final part of this dissertation therefore integrates the preceding results on large-scale efficient detection and introduces techniques for extracting ultra-long trajectory information, building a high-precision offline auto-labeling system that exploits ultra-long temporal context. Its annotation accuracy surpasses the average accuracy of human annotators for the first time, greatly shortening the cycle of data accumulation and algorithm development.

In summary, this dissertation presents a progressive, comprehensive study of efficient point cloud object detection in large-scale scenes across four aspects: data, operators, overall architecture, and integrated applications. It proposes a series of novel techniques that resolve the efficiency bottlenecks of large-scale point cloud detection, builds a complete large-scale efficient detection framework, and for the first time achieves real-time detection with a perception radius beyond 200 meters as well as automatic 3D object annotation that surpasses human accuracy, demonstrating substantial practical and academic value.

Other Abstract

Point clouds provide a concise and accurate representation of the physical world and are one of the most fundamental forms of three-dimensional data.
Due to their precise geometric and depth information, point clouds play an indispensable role in embodied intelligent perception systems, represented by high-level autonomous driving perception systems. In such systems, point cloud object detection is undoubtedly one of the core functionalities, providing the foundation for downstream tasks such as tracking, prediction, and planning.

In recent years, with the rapid development of the autonomous driving industry, various methods for point cloud object detection have emerged, achieving significant improvements in detection accuracy. However, existing methods still face some key issues that urgently need to be addressed, with one of the most pressing being inefficiency in large-scale scenes.
Low-level driver-assistance systems and robotic systems only need to focus on a relatively small area. For example, the former usually only needs to consider environmental information within about 100 meters, while the latter may only need information about the room it operates in. However, for high-level autonomous driving systems, especially those operating in high-speed environments, large-scale real-time perception is indispensable.
An example is heavy autonomous trucks traveling at high speed on highways, which require a safe distance of at least 200 meters for emergency braking or comfortable lane changes.
In more demanding scenarios, the requirements on the safe distance are even stricter.
Therefore, in such scenarios, efficient large-scale perception is a research topic of practical significance. However, current mainstream datasets only cover a perception range of up to 80 meters, and detectors designed on such data overlook large-scale perception needs: their time and space complexity grows quadratically with the perception range, so they cannot be directly extended to a perception range over 200 meters while maintaining real-time performance. Given the current research status and technological requirements mentioned above, this dissertation focuses on efficient point cloud object detection in large-scale scenes. Specifically, this dissertation progressively explores the following four aspects:
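The quadratic cost of dense, grid-based processing can be made concrete with a short back-of-the-envelope calculation (a generic sketch, not taken from the dissertation): a dense bird's-eye-view grid covering a square of radius R at cell size Δ holds (2R/Δ)² cells, so growing the radius from 75 m to 200 m multiplies the workload by roughly 7x.

```python
def bev_cells(radius_m: float, cell_m: float) -> int:
    """Number of cells in a dense square BEV grid covering [-R, R] x [-R, R]."""
    side = int(round(2 * radius_m / cell_m))
    return side * side

# Extending the radius from 75 m to 200 m scales the dense-grid
# workload by (200 / 75)^2 ~= 7.1x, independent of the cell size.
ratio = bev_cells(200, 0.1) / bev_cells(75, 0.1)
```

A fully sparse pipeline instead scales with the number of occupied cells, which is bounded by the fixed number of sensor returns per sweep rather than by the grid area.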

1. Exploration of input data structures suited to large-scale scenes. This dissertation proposes the range view of the point cloud as input for detectors. The range view is structured data similar to a depth image, in which the size of the perception range is encoded in pixel values rather than resolution, making it an ideal representation for efficient perception in large-scale scenes.
However, the range view also has several drawbacks: its 2D form causes a partial loss of 3D structural information, and its near-perspective projection results in significant scale variation among objects. This dissertation proposes a series of novel techniques to overcome these technical challenges, making range-view-based detection performance comparable to mainstream methods based on bird's-eye views or 3D voxels for the first time, and laying the input-data foundation for large-scale perception.
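To make the range-view idea concrete, the sketch below projects a point cloud into a range image via a spherical mapping. This is a generic illustration, not the dissertation's implementation; the resolution (64 x 512) and the vertical field of view (+3° to -25°) are illustrative constants, not tied to any particular sensor.

```python
import numpy as np

def to_range_view(points, H=64, W=512,
                  fov_up=np.deg2rad(3.0), fov_down=np.deg2rad(-25.0)):
    """Project an (N, 3) point cloud into an H x W range image.

    Rows index inclination (elevation), columns index azimuth; each pixel
    stores the range (distance), so a larger perception radius changes
    pixel *values*, not image resolution.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)                         # [-pi, pi]
    inclination = np.arcsin(z / np.maximum(r, 1e-9))
    # Normalize angles into integer pixel coordinates.
    u = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int).clip(0, W - 1)
    fov = fov_up - fov_down
    v = ((fov_up - inclination) / fov * H).astype(int).clip(0, H - 1)
    image = np.zeros((H, W), dtype=np.float32)
    # Keep the nearest return when several points land in one pixel:
    # write far points first so near points overwrite them.
    order = np.argsort(-r)
    image[v[order], u[order]] = r[order]
    return image
```

Note that doubling the perception radius leaves the image size unchanged; only the stored range values grow, which is what makes this representation attractive for large-scale scenes.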

2. Exploration of efficient basic operators for large-scale scenes. Basic operators are the cornerstone of network structures. Previous methods mainly adopt ordinary convolutions, sparse convolutions, and point-based operators, but none of these is suitable for efficient detection in large-scale scenes.
In large-scale scenes and fine-grained 3D voxel representations, ordinary convolutions incur substantial computational cost because they must operate on dense feature maps. Sparse convolutions greatly weaken the network's effective receptive field and rely on hand-crafted low-level implementations by specialist developers.
Point-based operators involve neighborhood queries, farthest point sampling, and other operations of high complexity, making them inefficient in large-scale scenes. Given these shortcomings, this dissertation introduces native attention mechanisms into point cloud object detection for the first time and makes them spatially sparse, proposing an attention-based sparse operator.
The operator naturally fits the spatially non-uniform, variable-length character of point cloud data and offers a larger receptive field. These efforts lay the operator-level foundation for efficient point cloud object detection in large-scale scenes.
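A minimal, weight-free sketch may help illustrate spatially sparse attention: features are attached only to occupied voxels, the voxels are grouped into local windows, and scaled dot-product attention runs within each variable-length group, so no dense feature map is ever materialized. This is a toy illustration (single head, Q = K = V, arbitrary window size), not the operator proposed in the dissertation.

```python
import numpy as np

def sparse_window_attention(coords, feats, window=4):
    """Self-attention restricted to occupied voxels, grouped by local window.

    coords: (N, 2) integer voxel coordinates of *occupied* voxels only;
    feats:  (N, C) their features. Cost scales with the number of occupied
    voxels, not with the perception area, because no dense grid is built.
    """
    keys = [tuple(c // window) for c in coords]        # window id per voxel
    out = np.empty_like(feats)
    for k in set(keys):
        idx = [i for i, key in enumerate(keys) if key == k]
        x = feats[idx]                                 # variable-length group
        attn = x @ x.T / np.sqrt(x.shape[1])           # scaled dot product
        attn = np.exp(attn - attn.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)        # softmax over the group
        out[idx] = attn @ x
    return out
```

Because each group simply contains however many occupied voxels fall in its window, the operator needs no padding to a fixed length and no handling of empty space, which is the property the paragraph above highlights.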

3. Exploration of overall framework construction for large-scale scenes. Current mainstream detection methods rely, completely or partially, on dense feature maps in the bird's-eye view, which is one of the core reasons they cannot be effectively extended to large-scale scenes. Building on the aforementioned data and operator foundations, this dissertation overcomes the problem of "center feature missing" in sparse structures by developing sparse instance recognition, and further constructs a fully sparse detection framework. No spatially dense structure remains in this framework, so it is theoretically extendable to an arbitrarily large perception range. Thus, this dissertation fully realizes the technical goal of efficient point cloud object detection in large-scale scenes and lays the foundation for subsequent comprehensive applications.
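As a toy stand-in for sparse instance recognition (not the dissertation's actual algorithm), one can group occupied voxels into candidate instances by connectivity, so that predictions are formed from the occupied cells themselves rather than at an object's possibly empty center cell:

```python
from collections import deque

def group_instances(voxels, radius=1):
    """Group occupied 2D voxel coordinates into instances by connectivity.

    A fully sparse pipeline never anchors predictions at empty center
    cells; this connected-components pass is a minimal illustration of
    grouping occupied cells directly into object candidates.
    """
    voxels = [tuple(v) for v in voxels]
    occupied = set(voxels)
    seen, instances = set(), []
    for v in voxels:
        if v in seen:
            continue
        comp, queue = [], deque([v])
        seen.add(v)
        while queue:                                   # BFS over neighbors
            cur = queue.popleft()
            comp.append(cur)
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    nb = (cur[0] + dx, cur[1] + dy)
                    if nb in occupied and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        instances.append(comp)
    return instances
```

The running time depends only on the occupied voxels and their neighbors, never on the total grid area, which is what allows such a pipeline to scale to an arbitrarily large perception range.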

4. Exploration of comprehensive applications based on the efficient detection framework. In addition to being deployed on autonomous driving vehicles, another important application of the proposed framework is offline auto-labeling. The large perception range allows the framework to cover extremely long object trajectories, thereby obtaining long-term information that improves labeling accuracy. Moreover, the efficiency of the proposed algorithm greatly reduces the resource costs of labeling. Therefore, the final part of this dissertation is devoted to integrating the achievements in large-scale efficient detection and introducing techniques for extracting super-long trajectory information to build a high-precision offline auto-labeling system. The precision of this system surpasses the average precision of human annotators for the first time, greatly reducing the cost of data accumulation and related algorithm development.
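A tiny sketch of trajectory-level refinement (illustrative only; the dissertation's system is far richer): a rigid object's true size is constant over time, so an offline labeler can replace noisy per-frame size estimates with a robust statistic computed over the whole, possibly very long, trajectory.

```python
import statistics

def refine_track_sizes(per_frame_boxes):
    """Fuse noisy per-frame box sizes along one object trajectory.

    per_frame_boxes: list of (length, width, height) estimates for the
    same object across frames. The median over the track is robust to
    frames where the object is distant or heavily occluded.
    """
    l = statistics.median(b[0] for b in per_frame_boxes)
    w = statistics.median(b[1] for b in per_frame_boxes)
    h = statistics.median(b[2] for b in per_frame_boxes)
    return [(l, w, h)] * len(per_frame_boxes)
```

The longer the trajectory the detector can cover, the more estimates contribute to the statistic, which is why a large perception range directly benefits offline label quality.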

In summary, this dissertation conducts a progressive and comprehensive study of efficient point cloud object detection in large-scale scenes from four aspects: data, operators, overall architecture, and comprehensive applications. It proposes a series of novel techniques to resolve the efficiency and performance bottlenecks of point cloud object detection in large-scale scenes, constructs a complete and efficient detection framework for large-scale scenes, achieves real-time detection with a perception range of over 200 meters for the first time, and surpasses human accuracy in 3D object auto-labeling. The work demonstrates considerable practical and academic value.

Keywords: Object Detection; 3D Point Cloud Data; Large-Scale Scenes; Real-Time Efficiency
Language: Chinese
IS Representative Paper
Sub-direction classification: Object Detection, Tracking and Recognition
Planning direction of the national key laboratory: Multi-dimensional Environment Perception
Paper associated data
Document Type: Dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/57416
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
范略. 面向大范围场景的高效点云目标检测[D], 2024.
Files in This Item:
File Name/Size DocType Version Access License
面向大范围场景的高效点云目标检测-65上 (10874 KB) · Dissertation · Restricted Access · CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.