Driving Behavior Prediction Based on Multi-modal Collaboration
董清辉
2024-05
Pages: 82
Degree type: Master's
Abstract (Chinese version)

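Now that cars are inseparable from daily life, driving safety is a constant concern. Available data indicate that improper driving behavior is the main cause of traffic accidents; if a driving behavior could be predicted 1 second in advance, potential risks could be identified and corrective action taken, lowering the probability of an accident. At the same time, although intelligent driving has advanced considerably in both academic research and industry, fully autonomous driving is not yet realistic, and human-vehicle co-driving remains the dominant mode. Driving Behavior Prediction (DBP) is an important way for an intelligent driving system to understand the complex interactions among driver, vehicle, and environment, and it underpins both driving safety and the system's decision-making and planning. It also has considerable potential for optimizing traffic efficiency. Despite its practical value and long research history, existing work still has many shortcomings, such as the lack of large public benchmarks, a narrow range of behavior categories, and poor model generalization and efficiency. To address these, this thesis uses the annotations in HDD (Honda Research Institute Driving Dataset) to clip more than 5000 segments from the raw multi-modal data sequences, forming a new dataset called HDD4DBP (HDD for DBP) that serves as a large-scale testbed for evaluating driving behavior prediction methods. It further proposes a method, based on multi-modal data, for jointly predicting driving behavior and trajectory, which improves prediction performance while also improving model efficiency. The main contributions are summarized as follows: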

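(1) A new large-scale multi-modal benchmark for the driving behavior prediction task, HDD4DBP, is created. Specifically, based on the annotations in HDD, this thesis clips the 5 s of data preceding each annotated behavior onset from the untrimmed multi-modal data sequences. The clipped data include the road traffic scene from the driver's viewpoint, vehicle control parameters, and vehicle state parameters. Traffic participants, traffic signs, and lane lines are then labeled automatically on the road-scene video. In total, more than 5000 multi-modal sequence segments are obtained, covering ten driving behaviors. Inspired by the success of the Transformer in modeling long-range dependencies in sequences, this thesis also builds a new baseline for driving behavior prediction on the classic vision Transformer backbone MViTv2 (Multiscale Vision Transformer). Exploiting the hierarchical structure of the behavior labels, it further proposes a hierarchical multi-task learning strategy that adds auxiliary tasks, such as direction perception and coarse-grained behavior prediction, to improve the model's fine-grained behavior prediction. Experimental results show that a prediction accuracy above 80% can be achieved using only external vehicle data. In addition, the HDD4DBP pre-trained model is fine-tuned on Brain4Cars, demonstrating that the method generalizes well across traffic scenarios.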
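As a rough illustration of the hierarchical multi-task idea in item (1), the sketch below adds two auxiliary losses (coarse-grained behavior and direction) to the fine-grained loss. It is not the thesis's implementation: the label names in `HIERARCHY`, the weights `w_coarse` and `w_dir`, and the function names are all hypothetical and chosen only for illustration; the ten HDD4DBP classes and the actual loss weighting are not given in this record.

```python
import math

# Hypothetical hierarchy: fine label -> (coarse label, direction).
# These names are placeholders, NOT the ten HDD4DBP behavior classes.
HIERARCHY = {
    "left_turn": ("turn", "left"),
    "right_turn": ("turn", "right"),
    "left_lane_change": ("lane_change", "left"),
    "right_lane_change": ("lane_change", "right"),
    "intersection_passing": ("passing", "straight"),
}


def cross_entropy(probs, target):
    """Negative log-likelihood of `target` under a {class: probability} dict."""
    return -math.log(probs[target])


def hierarchical_loss(fine_probs, coarse_probs, dir_probs, fine_label,
                      w_coarse=0.5, w_dir=0.5):
    """Fine-grained loss plus weighted auxiliary losses.

    The coarse and direction targets are derived from the fine label,
    so no extra annotation is needed. The weights are assumed values.
    """
    coarse_label, dir_label = HIERARCHY[fine_label]
    return (cross_entropy(fine_probs, fine_label)
            + w_coarse * cross_entropy(coarse_probs, coarse_label)
            + w_dir * cross_entropy(dir_probs, dir_label))
```

In a real model the three probability dictionaries would come from three classification heads sharing one backbone; here they are plain dictionaries so the arithmetic is easy to follow.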

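(2) Building on HDD4DBP, an algorithmic framework for jointly predicting driving behavior and trajectory through multi-modal fusion is proposed. First, this thesis resolves the multi-modal data alignment problem in HDD4DBP and computes the driving trajectory from the vehicle speed and yaw data in HDD4DBP. Second, it proposes a trajectory prediction method based on the time-series forecasting model PatchTST (Patch Time Series Transformer). By improving the patch embedding operation, the method models the relationship between trajectory and vehicle control data and extracts local trajectory features. The video feature extraction module is then simplified: directional cues are learned from the historical and predicted trajectories and fused with video features to predict driving behavior. Finally, extensive experiments demonstrate the positive effect of trajectory prediction on driving behavior prediction performance. The thesis also examines the strengths and weaknesses of different joint learning strategies and the effect of different fusion schemes on driving behavior prediction. Experimental results show that the joint learning strategy achieves a prediction accuracy of 84.72% on HDD4DBP, with an F1 score of 66.75%. Beyond its strong prediction results, the method has fewer parameters and learns more efficiently.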
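For readers unfamiliar with patch-based time-series models, the sketch below shows the generic patching step that PatchTST-style models apply before embedding: a sequence is cut into overlapping windows of length `patch_len` taken every `stride` steps. This is only the plain, unpadded operation; the improved patch embedding proposed in the thesis is not described in this record, and the function name and parameters here are assumptions.

```python
def make_patches(series, patch_len, stride):
    """Split a sequence into overlapping patches, without padding.

    `series` may hold scalars or per-step tuples, for example
    (x, y, steering_angle), so trajectory and control channels can
    share one patch. Returns a list of slices of length `patch_len`.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]
```

For a sequence of 10 steps with `patch_len=4` and `stride=2`, the patches start at steps 0, 2, 4, and 6, giving four patches. Each patch would then be projected to an embedding vector and fed to the Transformer encoder.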

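In summary, this thesis creates HDD4DBP, a large-scale multi-modal benchmark dataset for driving behavior prediction research, builds a new Transformer-based baseline for the task, and on that basis proposes a framework for jointly predicting driving behavior and trajectory through multi-modal fusion. The framework significantly improves the performance and computational efficiency of driving behavior prediction, and the work provides larger-scale data support and a platform for fair comparison for future research.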

Abstract (English version)

Now that cars are inseparable from everyday life, improving driving safety has become a constant concern. Available data show that improper driving behavior is the main cause of traffic accidents; if driving behavior can be predicted 1 second in advance, potential risks can be identified and action taken to reduce the probability of an accident. Meanwhile, although intelligent driving technology has made great progress in both academic research and industry, fully autonomous driving is not yet realistic, and human-vehicle co-driving remains the mainstream mode. Driving Behavior Prediction (DBP) is an important way for intelligent driving systems to understand the complex interactions among human, vehicle, and environment, and it is the basis for driving safety and for the decision-making and planning of such systems. It also has great potential in traffic efficiency optimization. Despite its high practical value and long research history, related work still has many shortcomings, such as the lack of large-scale public benchmarks, a narrow set of behavior categories, and poor model generalization and efficiency. To address these shortcomings, this thesis clips more than 5000 segments from the original multi-modal data sequences, based on the annotations in the Honda Research Institute Driving Dataset (HDD), to form a new dataset called HDD4DBP (HDD for DBP), which serves as a large-scale testbed for evaluating driving behavior prediction methods. In addition, a joint driving behavior and trajectory prediction method based on multi-modal data is proposed, which improves prediction performance while also improving model efficiency. The main research of this thesis is summarized as follows:


(1) A new large-scale multi-modal benchmark, HDD4DBP, is created for the driving behavior prediction task. Specifically, based on the annotations in the HDD, the 5 s of data before the onset of each annotated behavior is clipped from the untrimmed multi-modal data sequences. The clipped data include the road traffic scene from the driver's point of view, the vehicle control parameters, and the vehicle state parameters. Traffic participants, traffic signs, and lane lines are then labeled automatically on the road-scene video. In total, more than 5000 multi-modal sequence segments are obtained, covering ten driving behaviors. Inspired by the success of the Transformer in modeling long-range dependencies in sequences, this thesis also constructs a new baseline for driving behavior prediction based on the classic vision Transformer backbone MViTv2. Exploiting the hierarchical structure of the behavior labels, a hierarchical multi-task learning strategy is further proposed, which adds auxiliary tasks such as direction perception and coarse-grained behavior prediction to improve the model's fine-grained behavior prediction. Experimental results show that a prediction accuracy of more than 80% can be achieved using only external vehicle data. In addition, the HDD4DBP pre-trained model is fine-tuned on Brain4Cars, demonstrating that the method generalizes well across traffic scenarios.
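The clipping rule in item (1), keeping the 5 s of data that precede an annotated behavior onset, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: the function name, the use of timestamps in seconds, and the half-open interval (onset excluded) are choices made here for clarity.

```python
def clip_before_onset(timestamps, onset, window=5.0):
    """Return indices of samples with onset - window <= t < onset.

    `timestamps` are sample times in seconds for one modality
    (video frames, control signals, or state signals); `onset` is the
    annotated start time of the behavior. The behavior itself is
    excluded, so the clip contains only what precedes it.
    """
    start = onset - window
    return [i for i, t in enumerate(timestamps) if start <= t < onset]
```

Applying the same `onset` and `window` to every modality's own timestamps yields clips that cover the same time span even when the modalities are sampled at different rates.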

(2) Building on HDD4DBP, an algorithmic framework for joint prediction of driving behavior and trajectory based on multi-modal fusion is proposed. First, this thesis resolves the multi-modal data alignment problem in HDD4DBP and computes the driving trajectory from the vehicle speed and yaw data in HDD4DBP. Second, a trajectory prediction method is proposed based on the time-series forecasting model PatchTST. By improving the patch embedding operation, the method models the relationship between trajectory and vehicle control data and extracts local trajectory features. The video feature extraction module is then simplified: directional cues are learned from the historical and predicted trajectories and fused with video features to predict driving behavior. Finally, extensive experiments demonstrate the positive impact of trajectory prediction on driving behavior prediction performance. The thesis also explores the advantages and disadvantages of different joint learning strategies and the effects of different fusion methods on driving behavior prediction. The experimental results show that the joint learning strategy achieves a prediction accuracy of 84.72% on HDD4DBP, with an F1 score of 66.75%. In addition to performing well on driving behavior prediction, the method has fewer parameters and learns more efficiently.
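One common way to obtain a trajectory from speed and yaw signals, as item (2) describes, is dead reckoning: integrate the yaw rate to get a heading, then integrate the speed along that heading. The sketch below assumes speed in m/s, yaw rate in rad/s, and a fixed sampling interval `dt`; the record does not say whether the thesis uses yaw rate or yaw angle, or which integration scheme, so treat every name and unit here as an assumption.

```python
import math


def integrate_trajectory(speeds, yaw_rates, dt):
    """Dead-reckon (x, y) positions from speed and yaw-rate samples.

    Starts at the origin with heading 0 (along +x) and applies a
    simple Euler update per sample. Positive yaw rate turns left.
    Returns len(speeds) + 1 points, including the starting point.
    """
    x = y = heading = 0.0
    points = [(x, y)]
    for v, w in zip(speeds, yaw_rates):
        heading += w * dt                   # update heading first
        x += v * math.cos(heading) * dt     # then advance along it
        y += v * math.sin(heading) * dt
        points.append((x, y))
    return points
```

Because the result is expressed relative to the first sample, each clip gets its own local coordinate frame, which is convenient when clips come from different drives.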

In summary, this thesis creates HDD4DBP, a large-scale multi-modal benchmark dataset for driving behavior prediction research, constructs a new Transformer-based baseline for driving behavior prediction, and on this basis proposes a joint prediction framework for driving behavior and trajectory based on multi-modal fusion. The framework significantly improves the performance and computational efficiency of driving behavior prediction, and the work provides larger-scale data support and a platform for fair comparison for future research on driving behavior prediction methods.

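Keywords: human-vehicle co-driving, driving behavior prediction, multi-modal collaboration, trajectory prediction, multi-task learning
Subject area: Artificial intelligence
Discipline category: Engineering
Language: Chinese
State Key Laboratory planning direction: Multi-modal collaborative cognition
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/58513
Collection: Graduates_Master's theses
Recommended citation
GB/T 7714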
董清辉. 基于多模态协同的驾驶行为预测[D],2024.
Files in this item
File name/size: 答辩后版本.pdf (5017KB)
Document type: Thesis
Version type: (none listed)
Access type: Restricted access
License: CC BY-NC-SA