Research on a Deep Learning Method for Semantic Disparity Computation for Parallel Driving (面向平行驾驶的语义视差深度学习计算方法研究)


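	Research on a Deep Learning Method for Semantic Disparity Computation for Parallel Driving
	Shen Yu (沈宇)
	2018-05-24
Degree type	Master of Engineering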
Abstract	Binocular stereo vision has been studied for many years, and the depth information of an image is very important in practice. It provides a basis for higher-level challenges such as advanced driver assistance systems (ADAS) and autonomous driving. Depth extraction is based mainly on stereo matching: given a left-right image pair, the depth of each pixel can be estimated from the correspondence between matching pixels. Because depth is inversely proportional to disparity, the goal of stereo matching is to produce an accurate, dense disparity map. Semantic segmentation, an important aspect of visual understanding, also plays a significant role. It aims to assign a category label to every pixel in an image, which matters for self-driving systems. Object detection accuracy, meanwhile, keeps improving as large-scale datasets continue to appear, especially in the context of deep learning.
The success of deep convolutional neural networks has brought large accuracy gains in pixel-wise semantic segmentation, thanks to rich hierarchical features and end-to-end trainable frameworks. Supervised semantic segmentation systems generally focus on applying deeper fully convolutional networks (FCN). In particular, the 101-layer ResNet-101 raised the mean Intersection over Union (mIoU) on the PASCAL VOC2012 dataset to a new level, because deeper networks can generally learn more discriminative features that better distinguish among categories. Although semantic segmentation and disparity estimation have each achieved a great deal in their own fields, the two have rarely been computed together, even though humans combine both when perceiving the outside world. This thesis therefore proposes Seg-Disp, a semantic-disparity network that computes disparity estimation and semantic segmentation jointly with a deep convolutional neural network:
1. Design the disparity estimation part of Seg-Disp. The hourglass-shaped convolutional network DispNet serves as the disparity extraction part and is placed in parallel with the ResNet-101 semantic segmentation branch. In the fusion part, intermediate semantic features are concatenated with disparity features of the corresponding resolution, cropped, and up-sampled, and the end-point error (EPE) is back-propagated as the loss. Pretrained weights (a caffemodel obtained on the Monkaa dataset) initialize the semantic branch and are kept fixed; the disparity data of the Cityscapes dataset are then fed in, and the disparity weights are updated until convergence.
2. Design the semantic segmentation part of Seg-Disp. The same structure is used, and intermediate features of the corresponding resolution from the semantic and disparity branches are again cross-fused by concatenation and cropping. The difference is the loss function: a Softmax loss is back-propagated. The disparity weights obtained in step 1 initialize the disparity part and are kept fixed while the semantic segmentation weights are updated on the Cityscapes dataset until convergence. The two steps alternate until the joint network finally converges.
3. Introduce the concept of parallel driving, explain why it offers a safe and efficient route to putting unmanned vehicles on real roads, and point out that the semantic-disparity network has broad application in the test systems of parallel driving (parallel testing).
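Keywords	Semantic segmentation; disparity computation; parallel driving
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/21005
Collection	Graduates_Master's Theses
Author affiliation	Institute of Automation, Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Shen Yu. Research on a Deep Learning Method for Semantic Disparity Computation for Parallel Driving[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2018.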
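
The abstract relies on three standard quantities: the inverse relation between depth and disparity, the end-point error used as the disparity loss, and the mIoU used to score segmentation. The following is a minimal pure-Python sketch of the textbook definitions, not the thesis code. The function names, the focal length and baseline parameters, and the toy inputs are all assumptions made for illustration; the record does not give the camera parameters or the exact loss implementation.

```python
# Illustrative sketch only: textbook definitions, not the author's implementation.
# All names and parameters here are hypothetical.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.

    Pixels with non-positive disparity have no valid depth and map to None.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity_px
    ]


def end_point_error(pred, target):
    """Mean absolute difference between predicted and true disparity maps."""
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def mean_iou(pred_labels, true_labels, num_classes):
    """Mean Intersection over Union, skipping classes absent from both maps."""
    flat_p = [v for row in pred_labels for v in row]
    flat_t = [v for row in true_labels for v in row]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(flat_p, flat_t) if p == c and t == c)
        union = sum(1 for p, t in zip(flat_p, flat_t) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

With an assumed focal length of 1000 px and a baseline of 0.5 m, a disparity of 10 px corresponds to a depth of 50 m, and halving the disparity doubles the depth. This is why small disparity errors on distant objects translate into large depth errors.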

Files in this item
File name/size	Document type	Version type	Access type	License
面向平行驾驶的语义视差深度学习计算方法研（3908KB）	Thesis		Restricted access	CC BY-NC-SA