Research on Cross-Dimensional Visual Perception Methods for Complex Scenes (面向复杂场景的跨维度视觉感知方法研究) | |
潘聪 | |
2024-05 | |
Pages | 113 |
Degree type | Doctoral |
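Chinese abstract (English translation) | Driven by the rapid development of artificial intelligence, visual scene perception has shown great application potential in autonomous driving, intelligent surveillance, and robot navigation. Visual perception methods for complex scenes aim to capture and process visual information accurately in order to recognize objects, understand scenes, and guide behavior. With the rapid progress of deep learning, the field has advanced markedly and perception performance has improved steadily. In real-world applications, however, the diversity of object scales and the complexity of scenes pose new challenges for traditional two-dimensional visual perception methods. In complex autonomous driving scenes in particular, efficient environment perception and precise object recognition are essential to safe driving. Cross-dimensional visual perception that combines two-dimensional images with camera calibration parameters has therefore become an important research direction. This dissertation studies cross-dimensional visual perception methods for complex scenes, proceeding step by step from two dimensions to three and from monocular to multi-view settings. The main contributions of this dissertation include: 1. A deployable two-dimensional object detection method based on scale learning is proposed. One of the greatest challenges in generic two-dimensional object detection is scale variation: in practice, object categories are numerous and differ in scale, and objects of the same category may appear at different scales. Existing methods remain limited in learning object scales, in training efficiency, and in inference speed, and they struggle to meet hardware deployment requirements. To address this challenge, the method aims to preserve the detection network's ability to perceive objects of different scales while remaining deployable on hardware. Based on an analysis of the receptive-field distribution in current two-dimensional object detection frameworks for generic visual scenes, the method designs an automatically searched global multi-scale perception network and proposes a scale decomposition method that converts the learned fractional scales into fixed combinations of integer scales. A fast-deployment network is also designed, which accelerates inference and supports hardware optimization. In addition, the proposed model is optimized with an inference engine to achieve faster inference. Experimental results show that the method achieves some performance improvement over existing methods on object detection and is better suited to hardware deployment. |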
English abstract | Propelled by the rapid development of artificial intelligence, visual scene perception has shown great application potential in areas such as autonomous driving, intelligent surveillance, and robot navigation. Visual perception methods for complex scenes aim to capture and process visual information precisely, thereby supporting object recognition, scene understanding, and behavior guidance. With the rapid progress of deep learning, the field has advanced significantly and perception performance has improved continually. However, in practical applications, the diversity of object scales and the complexity of scenes present new challenges to traditional two-dimensional visual perception methods. Particularly in complex autonomous driving scenarios, efficient environmental perception and accurate object recognition are crucial for safe vehicle operation. Consequently, cross-dimensional visual perception, which combines two-dimensional images with camera calibration parameters, has emerged as an important research direction. This dissertation explores cross-dimensional visual perception methods for complex scenes, progressing step by step from two dimensions to three and from monocular to multi-view settings. The main contributions of this dissertation include: 3. A multi-view three-dimensional semantic segmentation method based on the interaction between bird's-eye-view and image features is proposed. In autonomous driving, a single viewpoint often fails to provide sufficient information for driving decisions and path planning, whereas a multi-view vision system offers a more comprehensive view of the environment, which is crucial for safe vehicle operation. Research on surround-view, multi-view visual perception has gained attention with the release of large-scale multi-view autonomous driving datasets. However, the inherently two-dimensional nature of the individual monocular images, the lack of depth information, and the increased computational load pose new challenges. To address them, this dissertation designs a bidirectional early-interaction Transformer framework whose bidirectional cross-attention mechanism constrains both image feature extraction and the alignment between the image and bird's-eye-view feature spaces. The multi-view image features are then integrated into a unified bird's-eye-view representation. The input image resolution is increased, and multi-scale image features are downsampled before feature interaction, which keeps the model's parameter count and computational load under control while improving semantic segmentation performance. Experimental results demonstrate that the method improves semantic segmentation performance and enables cross-view, cross-dimensional scene perception while maintaining real-time inference. |
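Keywords | visual scene perception; two-dimensional object detection; monocular three-dimensional object detection; bird's-eye-view semantic segmentation; vision Transformer |
Language | Chinese |
Document type | Dissertation |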
Item identifier | http://ir.ia.ac.cn/handle/173211/57595 |
Collection | Graduates_Doctoral Dissertations |
Recommended citation (GB/T 7714) | 潘聪. 面向复杂场景的跨维度视觉感知方法研究[D],2024. |
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
潘聪博士毕业论文_2024_最终打印提交(28980KB) | Dissertation | | Restricted access | CC BY-NC-SA |