Research on Key Issues of Parallel Vision for Autonomous Driving

	Research on Key Issues of Parallel Vision for Autonomous Driving
	Wang Jiangong (王建功)
	2023-06
Pages	160
Degree type	Doctoral
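Abstract (translated from the Chinese)	An intelligent vehicle's accurate perception and understanding of its surroundings is the foundation for subsequent planning and control, and it sets the upper bound on what an autonomous driving system can do. Visual perception, one of the main sources of environmental information for humans and many intelligent systems, plays a key role in autonomous driving perception and understanding. Deep-learning-based computer vision is now widely used in autonomous driving perception tasks and has achieved good results overall. However, deep learning relies on large-scale training data to improve perception accuracy, and autonomous driving places strict requirements on system safety and reliability, so existing visual perception methods often face pressing problems such as insufficient training data and weak recognition of long-tail scenarios. Parallel vision, which is built on parallel systems, is a new theoretical framework for vision that can bridge the data gap and improve perception reliability. Taking autonomous driving as the application scenario, this thesis studies several key problems in parallel vision. It builds an effective and reliable parallel vision system for autonomous driving from four perspectives: data generation, algorithm optimization, the data-algorithm closed loop, and system optimization and training. The resulting system alleviates the shortage of training data for traffic scenes and the poor perception of long-tail scenarios, and improves the accuracy and stability of the vision system. Controllable image data generation both helps address insufficient training data and forms the basis of the data-algorithm closed loop. Optimized learning of algorithms, the key task of a visual perception system, improves the system's perception capability. The closed loop between data and algorithms is the core of a complete parallel vision system: it lets data generation and algorithm optimization improve together while increasing robustness and reliability. Finally, for the complete parallel vision system, a principled and efficient parallel training method further improves perception accuracy and stability. The main work of this thesis is as follows. To address incomplete autonomous driving image data, in particular the lack of data for scarce scenes, two key image generation methods for traffic scenes are proposed. First, a multi-scale, multi-discriminator style transfer technique based on the Transformer architecture is proposed; it makes large volumes of virtual images more realistic through transfer and generation, and indirectly lowers the cost of data acquisition by making virtual images more usable. Second, for scarce driving scenarios, a single-sample realistic image generation method based on a pyramid structure is proposed. As a complement to the first method, it samples images through model retraining to mitigate the incompleteness of traffic image datasets in scarce driving scenarios. To address the distribution mismatch between artificially generated virtual images and real images, a cross-domain feature extraction method based on virtual-real interaction is proposed. A data recombination module that separates style from content at the data level and an iterative cross-domain knowledge transfer module at the knowledge level together form the ParaTeacher model, which serves as the key method for closing the loop from virtual data to perception algorithms in the computational experiments of parallel vision. It makes full use of virtual traffic images and improves object detection accuracy on real traffic images by more than 20%. To address the long-tail problem that is widespread in traffic vision scenes, a long-tail regularization theory and a key closed-loop method in which data and algorithms are optimized collaboratively are proposed, offering solutions at the theoretical and methodological levels respectively. Artificial scene optimization and targeted image data generation, both driven by feedback from the perception algorithm, alleviate the long-tail problem faced by traffic vision systems. The proposed method was applied to long-tail scenario design for the Intelligent Vehicle Future Challenge and to long-tail traffic sign recognition; on the long-tail traffic sign database TT100K-TSRD, more than 80% of the long-tail categories each gained over 10% in recognition accuracy. To address the optimization and training of complex parallel vision systems, parallel training, an optimization method for complex intelligent driving systems, is proposed. The method uses predictive simulation in the virtual driving space to reduce the impact of the many uncertainties of the real driving space on the intelligent driving system, ultimately yielding a safe and efficient parallel driving system. Specifically, a parallel coach, composed of an image generation controller in the data domain and a model learning controller in the algorithm domain, provides optimized control of the data-algorithm closed loop of the parallel vision system. This thesis proposes several concrete key methods for multiple key problems of parallel vision systems in autonomous driving, which is significant for the practical application of parallel vision in autonomous driving.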
English abstract	An intelligent vehicle's accurate perception and understanding of the surrounding environment is the basis for subsequent planning and control, and it determines the upper limit of an autonomous driving system's capabilities. As one of the main sources of environmental information for humans and many intelligent systems, visual perception plays a critical role in autonomous driving perception and understanding. Currently, computer vision technology based on deep learning is widely used in autonomous driving perception tasks and has generally achieved good results. However, because deep learning needs large-scale training data to improve accuracy, and because autonomous driving imposes strict requirements on system safety and reliability, existing visual perception methods often face challenges that urgently need to be addressed, such as insufficient training data and inadequate recognition capabilities for long-tail scenarios. Parallel vision, based on parallel systems, is a new theoretical framework for vision that can fill the data gap and improve the reliability of perception. This thesis focuses on multiple key issues of parallel vision in the application scenario of autonomous driving. An effective and reliable parallel vision system for autonomous driving is constructed from four perspectives: data generation, algorithm optimization, the data-algorithm closed loop, and system training. This alleviates the problems of insufficient training data in traffic scenarios and inadequate perception capabilities in long-tail scenarios, and improves the accuracy and stability of the vision system. Among these, controllable image data generation helps solve the problem of insufficient training data while also serving as the basis for constructing a closed loop between data and algorithms. The optimized learning of models is the key task of the visual perception system and contributes to improving the system's perception capabilities.
The closed loop between data and algorithms is the core of a complete parallel vision system; it enhances the system's robustness and reliability while allowing data generation and algorithm optimization to improve together. Finally, for a complete parallel vision system, principled and efficient parallel training methods can further improve the system's perception accuracy and stability. The main work and contributions of this thesis are as follows. Two key image generation methods for traffic scenarios are proposed to address the problem of incomplete autonomous driving image data, especially the lack of data for scarce scenes. First, a multi-scale, multi-discriminator style transfer method based on the Transformer architecture is proposed to realize realistic transfer and generation of large-scale virtual images, which indirectly reduces the cost of data acquisition by improving the usability of virtual images. Second, a single-sample realistic image generation method based on a pyramid structure is proposed for scarce driving scenarios. As a complement to the first method, it samples images through model retraining to mitigate the incompleteness of traffic image datasets in scarce driving scenarios. A cross-domain feature extraction method with virtual-real interaction is proposed to address the inconsistent distributions of artificially generated virtual images and real images. The ParaTeacher model is composed of a data recombination module that separates style from content at the data level and an iterative cross-domain knowledge transfer module at the knowledge level. It serves as the key method for constructing a closed loop from virtual traffic images to perception algorithms in parallel vision computational experiments; it makes full use of virtual traffic images and improves object detection accuracy on real traffic images by over 20%.
For the common long-tail problem in traffic vision scenarios, a long-tail regularization theory and a data-algorithm collaborative optimization method are proposed, addressing the problem from the theoretical and methodological perspectives respectively. Through artificial scenario optimization and targeted image data generation driven by feedback from the perception algorithm, the long-tail problem faced by traffic vision systems is alleviated. The proposed method was applied to the long-tail scenario setting of the Intelligent Vehicle Future Challenge (IVFC) and to the task of long-tail traffic sign recognition. On the long-tail traffic sign database TT100K-TSRD, more than 80% of the long-tail categories each gain over 10% in recognition accuracy. Parallel training, an optimization method for complex intelligent driving systems, is proposed for the training of complex parallel vision systems. Through predictive simulation in the virtual driving space, the impact of the many uncertainties of the real driving space on the intelligent driving system is reduced, and a safe and efficient parallel driving system is built. Specifically, a parallel coach consisting of an image generation controller in the data domain and a model learning controller in the algorithm domain is used to realize optimized control of the data-algorithm closed loop of the parallel vision system. This thesis addresses several key issues of parallel vision systems in autonomous driving and proposes several concrete methods that are important for the practical application of parallel vision in autonomous driving.
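Keywords	parallel vision; autonomous driving; artificial systems; computational experiments; parallel execution; parallel training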
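Language	Chinese
Seven major directions, sub-direction category	Image and video processing and analysis
State Key Laboratory planned direction category	Visual information processing
Associated dataset requiring deposit	Yes
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/52102
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	Wang Jiangong. Research on Key Issues of Parallel Vision for Autonomous Driving[D], 2023.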

Files in this item
File name/size	Document type	Version type	Access type	License
面向自动驾驶的平行视觉关键问题研究.pd (19546KB)	Thesis		Restricted access	CC BY-NC-SA