Object Detection and Tracking Based on Information Mining from Sequential Images | |
何嘉伟 | |
2024-05 | |
Pages | 124 |
Degree type | Doctoral |
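Chinese abstract | Object perception is a key technology in computer vision, and object detection and tracking are two of its fundamental tasks. Image-based object detection and tracking are widely applied in autonomous […] In summary, for object detection and tracking in sequential images, this dissertation first studies fully supervised methods, including graph matching-based object association applied to multi-object tracking, and object-centric temporal 3D object detection. Using 3D reconstruction as a means of lifting dimensionality, 3D object representations can be obtained without depth models. Following this idea, this dissertation explores joint 2D-3D tracking and weakly supervised 3D object detection. The methods proposed in this dissertation […] |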
English abstract | Object perception is a key technology in computer vision, and object detection and tracking are two of its fundamental tasks. Image-based object detection and tracking are widely used in practical scenarios such as autonomous driving, intelligent robots, drones, augmented reality, industrial quality inspection, video surveillance, and motion analysis. Some scenarios remain challenging, however, such as 3D detection of small or distant objects, and tracking of objects that are occluded, overlap with similar-looking objects, or have been invisible for a long time. These challenges limit the practical application and further development of object detection and tracking technologies. Exploiting the temporal information in sequential images is one effective way to address them. This dissertation therefore studies object detection and tracking in visual perception with the time sequence as the connecting thread, and emphasizes the mining of temporal information in sequential images or videos. The main contributions presented in this dissertation include: 1) Proposing a multi-object tracking paradigm based on learnable graph matching that emphasizes the importance of intra-frame relationships. This method represents intra-frame relationships as undirected graphs and models data association as a general graph matching problem. To efficiently solve the NP-hard quadratic assignment problem arising from the original graph matching formulation, a continuous relaxation based on quadratic programming is proposed. Integrating this relaxation into deep neural networks via the implicit function theorem and the KKT conditions yields a differentiable graph matching module. 
A gated search tree algorithm is also designed, which significantly speeds up solving the graph matching problem. 2) Proposing an online multi-object tracking method for joint 2D-3D tracking that requires only 2D labels. The 3D scene point cloud is reconstructed with Structure-from-Motion methods, and the learnable graph matching paradigm is used to build a new inter-image keypoint correspondence method that reconstructs the entire scene better. In the reconstructed 3D point cloud, objects are grouped into point clusters, from which the center positions of the 3D objects are obtained. This work designs a reconstruction-based module for pseudo-3D object label generation and 3D object representation learning. Because the 3D representation of objects is learned solely from monocular videos under the supervision of 2D tracking labels, no additional annotations from LiDAR or pre-trained depth estimators are needed. 3) Proposing a temporal 3D object detection and tracking method based on global optimization of object temporal information. This work achieves object-centric temporal 3D reconstruction and designs a two-stage temporal 3D object detector accordingly. In particular, it designs an object-centric temporal correspondence learning module, trained jointly with the object detector as its second stage, and proposes a featuremetric object-centric bundle adjustment loss function. The proposed temporal method detects 3D objects more accurately, especially at long distances. 4) Proposing a weakly supervised monocular 3D object detection method based on multi-stage generalization. Building on the previous work, this study further investigates how to learn 3D bounding box estimation under weak supervision, using only 2D supervision. 
By leveraging the generalization ability of neural networks, a practical solution to this problem is proposed for the first time. This work uses pseudo-labels of 3D bounding boxes obtained from 3D reconstruction and designs three stages of generalization: from complete objects to partially visible objects, from static objects to moving objects, and from close range to long range, bringing weakly supervised 3D object detection close to fully supervised performance. In summary, for object detection and tracking in sequential images, this study first investigates fully supervised methods, including graph matching-based data association applied to multi-object tracking and object-centric temporal 3D object detection. Lifting dimensionality through 3D reconstruction makes it possible to obtain 3D object representations without relying on depth estimation models. On this basis, joint 2D-3D tracking and weakly supervised 3D object detection methods are explored. The proposed methods show significant performance improvements over contemporaneous work and achieve leading results on commonly used evaluation datasets in the field. They effectively address object localization and temporal association in complex scenarios such as occlusion and long distances, demonstrating both academic novelty and practical value. |
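Keywords | 3D object detection; multi-object tracking; temporal information mining; graph matching; 3D reconstruction |
Language | Chinese |
Document type | Dissertation |
Item identifier | http://ir.ia.ac.cn/handle/173211/57422 |
Collection | Graduates_Doctoral Dissertations |
Recommended citation (GB/T 7714) | 何嘉伟. 基于序列图像信息挖掘的物体检测与跟踪[D],2024. |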
Files in this item |
File name/size | Document type | Version type | Access type | License |
基于序列图像信息挖掘的物体检测与跟踪_何(22944KB) | Dissertation | | Restricted access | CC BY-NC-SA |