Multi-Element Content Parsing of Traffic Scene Images

Title	Multi-Element Content Parsing of Traffic Scene Images
Author	郭云飞 (Guo Yunfei)
Date	2024-05-17
Pages	106
Degree type	Doctoral
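Abstract (translated from the Chinese)	The perception and understanding of traffic scene images strongly affect the overall performance and practical effectiveness of intelligent transportation systems. By parsing key traffic elements in an image, such as roads, lanes, and traffic signs, a system can monitor traffic conditions in real time, obtain accurate information about the environment, and support decision-making in autonomous driving systems, improving both traffic flow and driving safety. Traffic scene perception and understanding is also an important research topic in pattern recognition and computer vision. This thesis studies the parsing of multiple elements in traffic scene images, including roads, lanes, and traffic signs, and proposes several methods that progressively achieve content parsing of these elements: it first studies the content parsing of traffic signs and then extends to the holistic parsing of multiple elements across the whole traffic scene image. The main contributions are as follows:
(1) A traffic sign parsing framework based on detection and relation reasoning. The framework first determines the positions and categories of components with a component detection module, then analyzes the relations between components with a relation reasoning module, obtains the category of the traffic sign with a sign classification module, and finally generates a semantic description of the sign with a heuristic semantic description module. Each module is adapted to the characteristics of traffic signs, which significantly improves performance on the individual subtasks, and the subtasks work together to produce accurate parsing. Experiments show that the framework performs well on many kinds of traffic signs.
(2) A layout-aware semantic description method for traffic signs. The method contains a dynamic-prediction Transformer model that fuses visual, spatial, semantic, and relational features to generate semantic descriptions automatically. Unlike heuristic methods, it does not depend on rules and templates, and is therefore more robust and more general. Experiments on the relevant datasets show that the method effectively improves the model's ability to describe traffic signs and significantly improves the final performance metrics.
(3) A traffic scene relation parsing method based on hierarchical reasoning. The method introduces a hierarchical graph attention network that builds a hierarchical graph to handle different types of elements in traffic scene images in different ways, and adds cross-level links to pass information between levels, so that the complex relations between elements are inferred efficiently. Experiments show that the method is significantly more effective than classical graph neural networks, delivers good relation reasoning performance, and can effectively extract element relations to assist the generation of visual traffic knowledge graphs.
(4) An end-to-end traffic scene parsing framework based on query denoising. A macro perception module performs coarse-grained road and lane segmentation and traffic sign detection, a micro perception module performs fine-grained detection of traffic sign components, and a relation reasoning module performs multi-element relation reasoning and text attribute recognition in a hierarchical manner. The framework is built entirely on query-based prediction: it creates several kinds of queries to pass information between tasks and introduces query denoising training to improve the representational and predictive capacity of the queries. Experiments show that the framework effectively achieves holistic parsing of multi-element content in traffic scene images, generates accurate visual traffic knowledge graphs, and achieves the best performance on the relevant datasets.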
Abstract (English)	The perception and understanding of traffic scene images exert a significant impact on the comprehensive performance and effectiveness of intelligent transportation systems. By parsing key traffic elements in images, such as roads, lanes, and traffic signs, the system can monitor traffic conditions in real time, accurately obtain environmental information, and provide decision support for autonomous driving systems, so as to enhance traffic flow and vehicular safety. The perception and understanding of traffic scenes are also important research problems in the fields of pattern recognition and computer vision. This thesis studies the parsing of various elements in traffic scene images, such as roads, lanes, and traffic signs, and proposes several methods to achieve the content parsing of multiple elements for traffic scenes: starting from the content parsing of traffic signs and then generalizing to the holistic parsing of multiple elements within the entire traffic scene image. The main innovative contributions of this thesis are as follows:
(1) A detection and relation reasoning-based framework is proposed for traffic sign parsing. This framework first determines component positions and categories through a component detection module, then analyzes the relations between components through a relation reasoning module, obtains the category of traffic signs through a sign classification module, and finally generates semantic descriptions of traffic signs through a heuristic semantic description module. Considering the characteristics of traffic signs, each module of this framework is tailored to improve the parsing effectiveness of its subtask. Under the cooperation of multiple subtasks, the framework parses signs accurately. Experimental results demonstrate that the proposed traffic sign parsing framework achieves promising performance across a variety of traffic signs.
(2) A layout-aware semantic description method is proposed for traffic sign parsing. The method includes a dynamic prediction Transformer capable of integrating visual, spatial, semantic, and relational features to automatically generate semantic descriptions. Unlike heuristic methods, this method eliminates the dependence on rules and templates, thus showing better robustness and universality. Experimental results demonstrate that this method effectively improves the description generation ability of the model and its performance on the final metrics.
(3) A hierarchical reasoning-based method is proposed for traffic scene relation parsing. This method includes a hierarchical graph attention network that processes different types of elements in different ways by constructing hierarchical graphs and facilitates message propagation between levels through cross-level links, ultimately inferring the complex relations between elements in an efficient manner. Experimental results demonstrate that this approach significantly outperforms classical graph neural networks in terms of effectiveness, achieves notable relation reasoning performance, and can effectively acquire element relations to aid the generation of visual traffic knowledge graphs.
(4) A query denoising-based end-to-end traffic scene parsing framework is proposed. This framework achieves coarse-grained road and lane segmentation and traffic sign detection through a macro perception module, fine-grained traffic sign component detection through a micro perception module, and multi-element relation reasoning and text attribute recognition in a hierarchical manner through a relation reasoning module. The framework is entirely based on query prediction methods, enabling message propagation between different tasks through the creation of multiple kinds of queries. Additionally, query-denoising training is introduced to enhance the expression and prediction capabilities of queries. Experimental results demonstrate that the framework effectively achieves holistic parsing of multi-element content in traffic scene images, generates accurate visual traffic knowledge graphs, and achieves the best performance on related datasets.
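The cross-level links described in contribution (3) can be illustrated with a small sketch. This is not the network from the thesis, whose architecture is not given in this record. It is a generic single-head graph attention step in the standard GAT form, applied to a toy two-level graph. The node names, feature vectors, projection matrix `W`, and attention vectors `a_src` and `a_dst` are all hypothetical values chosen for illustration.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, h):
    return [dot(row, h) for row in W]

def build_neighbors(nodes, edges):
    """Undirected adjacency: each node lists the nodes it receives messages from."""
    nbrs = {n: [] for n in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs

def gat_layer(features, neighbors, W, a_src, a_dst):
    """One single-head graph attention step (generic GAT form, not the thesis model)."""
    z = {n: matvec(W, h) for n, h in features.items()}  # linear projection
    out, weights = {}, {}
    for i in z:
        srcs = [i] + neighbors[i]  # self-loop plus neighbours
        # Score e_ij = LeakyReLU(a_src . z_i + a_dst . z_j)
        scores = [leaky_relu(dot(a_src, z[i]) + dot(a_dst, z[j])) for j in srcs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Aggregate neighbour features weighted by attention
        out[i] = [sum(a * z[j][k] for a, j in zip(alpha, srcs))
                  for k in range(len(z[i]))]
        weights[i] = dict(zip(srcs, alpha))
    return out, weights

# Hypothetical toy scene: level 0 holds scene elements, level 1 holds sign components.
levels = {"lane_1": 0, "sign_1": 0, "arrow_1": 1, "text_1": 1}
features = {
    "lane_1": [1.0, 0.0],
    "sign_1": [0.0, 1.0],
    "arrow_1": [1.0, 1.0],
    "text_1": [0.5, 0.5],
}
intra_edges = [("lane_1", "sign_1"), ("arrow_1", "text_1")]  # within one level
cross_edges = [("sign_1", "arrow_1"), ("sign_1", "text_1")]  # between levels
assert all(levels[u] != levels[v] for u, v in cross_edges)

W = [[1.0, 0.0], [0.0, 1.0]]            # identity projection (assumed)
a_src, a_dst = [0.5, -0.5], [1.0, 0.0]  # attention parameters (assumed)

neighbors = build_neighbors(levels, intra_edges + cross_edges)
out, weights = gat_layer(features, neighbors, W, a_src, a_dst)
for node, w in weights.items():
    print(node, {k: round(v, 3) for k, v in w.items()})
```

With only `intra_edges`, a component such as `arrow_1` can attend to `text_1` but never to its parent `sign_1`. Adding `cross_edges` is what lets information move between the scene level and the component level in this sketch.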
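Keywords	end-to-end traffic scene parsing; traffic sign parsing; relation reasoning; visual traffic knowledge graph
Subject area	Computer perception
Discipline	Engineering::Computer Science and Technology (degrees in engineering or science may be conferred)
Language	Chinese
Representative thesis	Yes
Seven major directions: sub-direction	AI + Transportation
State Key Laboratory planning direction	Visual information processing
Associated dataset requiring deposit	No
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/57395
Collection	Graduates_Doctoral theses
Recommended citation (GB/T 7714)	郭云飞. 交通场景图像多元素内容解析[D],2024.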

Files in this item
File name/size	Document type	Version type	Access type	License
学位论文_郭云飞_交通场景图像多元素内容 (13813KB)	Thesis		Restricted access	CC BY-NC-SA