RGB-D Video Processing and 3-D Object Tracking


Other title	RGB-D Video Processing and 3-D Object Tracking
Author	Liu Shaoguo (刘绍国)
Date	2014-05-27
Degree type	Doctor of Engineering
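Abstract (Chinese)	Depth sensing technology, represented by Microsoft Kinect, is developing rapidly and has made the synchronized acquisition of stable color and depth images a reality. Because it is inexpensive and easy to operate, Kinect has broad application prospects in computer vision, particularly in problems that involve 3-D information processing, such as 3-D object detection and tracking and 3-D scene parsing. Unfortunately, the RGB-D video provided by Kinect is far from perfect: its depth images have large amounts of missing data at object boundaries (where no depth value can be detected) and frequently contain various kinds of noise elsewhere, and its color images often suffer some degree of motion blur caused by sensor shake. This thesis presents an in-depth, systematic study of 3-D object tracking from RGB-D video, a new computer vision problem. First, to address these problems in RGB-D data, it proposes new preprocessing methods for depth image restoration and motion deblurring. Second, building on these, it proposes two new homography-estimation-based methods for tracking 3-D planar objects and a new method for real-time 3-D head pose tracking from RGB-D video. Specifically, the main contributions of the thesis are as follows.
(1) To handle missing and noisy depth data, two depth image restoration methods are proposed: one based on energy minimization with TV21 regularization, and one that fuses TV21 regularization with a graph Laplacian. The first exploits the linear correlation between depth and color within local image regions and minimizes a TV21-regularized energy function, using the total-variation ℓ2,1 norm (TV21) to model gradient sparsity so that depth boundaries and details are better recovered. The second builds on this by incorporating a graph Laplacian to restore the depth image more accurately, and an efficient optimization method is proposed to solve the corresponding objective function.
(2) To handle motion blur in color images, two new methods are proposed for automatically estimating the size of the motion blur kernel, based respectively on the autocorrelation map of the blurry image and on gradient statistics over an image pyramid. The first uses an improved autocorrelation map computation that removes the narrow, elongated autocorrelation noise caused by long straight-line structures in the blurry image, so that the map faithfully reflects the geometric extent of the motion blur kernel. The second is learning-based: it exploits the fact that blurry and sharp images have similar gradient distributions at low resolution but markedly different ones at high resolution, and learns in advance a model relating pyramid gradient distributions to motion blur kernel size. The estimated kernel size then serves as an important input parameter for estimating the blur-kernel trajectory and actually removing the motion blur.
(3) A new RGB-D 3-D head pose tracking method based on the maximum correntropy criterion is proposed. Built on the computation of visible-light color optical flow and depth flow, the method adopts the maximum correntropy criterion as the loss function for pose estimation, dropping the conventional assumption that the noise is Gaussian, so that the flow can better cope with illumination changes, occlusion, large-scale and large-angle motion, and other difficult conditions encountered during tracking. An efficient half-quadratic optimization technique is also proposed to reduce the high computational cost of maximum correntropy.
(4) For 3-D planar objects, a new homography estimation method is proposed that fuses first- and second-order graph matching on top of binary descriptor matching. First, to overcome the high matching cost of traditional feature descriptors, the homography is estimated from binary descriptors, which are fast to match. Further, the problem is cast as a graph matching problem in which first- and second-order graphs model the first- and second-order neighborhood information of key feature points, respectively, and serve as a new cost function for optimizing the binary descriptor matching. To speed up processing, sparsity constraints are used to construct the first- and second-order graphs, so that the graph matching problem can obtain a fast...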
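The half-quadratic treatment of the maximum correntropy criterion in the head pose tracking contribution described above can be sketched on a generic linear model. This is a minimal Python/NumPy illustration under stated assumptions, not the thesis's pose tracker: it estimates x in A x ≈ b instead of the actual color-flow and depth-flow pose equations, and the function name `mcc_fit`, the kernel width `sigma`, and the stopping rule are hypothetical. Each iteration fixes one weight per residual, w_i = exp(-r_i^2 / (2 sigma^2)), and then solves a weighted least-squares problem, so grossly wrong measurements (for example, occluded pixels) are down-weighted rather than dominating the fit as they do under an ordinary least-squares (Gaussian-noise) assumption.

```python
import numpy as np


def mcc_fit(A, b, sigma=1.0, max_iter=50, tol=1e-10):
    """Hypothetical helper: estimate x in A @ x ~= b under the maximum
    correntropy criterion, i.e. maximize sum_i exp(-r_i**2 / (2 * sigma**2))
    with r = A @ x - b, using half-quadratic (reweighted least-squares) steps.
    """
    # Start from ordinary least squares (the Gaussian-noise solution).
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(max_iter):
        r = A @ x - b
        # Half-quadratic step 1: with x fixed, the optimal auxiliary
        # weight for each residual is its Gaussian kernel value.
        w = np.exp(-r**2 / (2.0 * sigma**2))
        # Half-quadratic step 2: with the weights fixed, solve
        # min_x sum_i w_i * r_i**2 by scaling each row by sqrt(w_i).
        sw = np.sqrt(w)
        x_new = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x


# Synthetic example: fit y = 2t + 1 when every fifth sample has a +30 offset.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 100)
y = 2.0 * t + 1.0 + 0.05 * rng.standard_normal(t.size)
y[::5] += 30.0
A = np.column_stack([t, np.ones_like(t)])
x_ols = np.linalg.lstsq(A, y, rcond=None)[0]
x_mcc = mcc_fit(A, y, sigma=2.0)
print("OLS (slope, intercept):", x_ols)
print("MCC (slope, intercept):", x_mcc)
```

In this synthetic example the outliers pull the ordinary least-squares intercept far from 1, while the correntropy fit recovers the line. The kernel width `sigma` sets the residual scale beyond which samples are treated as outliers; if it is far smaller than the initial residuals, every weight can underflow to zero, so it is normally chosen relative to the expected inlier noise.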
Abstract (English)	With the rapid development of depth sensors such as Kinect, it has become possible to acquire RGB and depth images simultaneously. Owing to its low cost and easy installation, Kinect has already become a standard tool for traditional vision problems such as 3-D object tracking and scene understanding. However, the depth maps acquired by Kinect often contain holes around object boundaries, as well as other types of noise. Moreover, because the sensor shakes, the acquired RGB images are commonly blurred. This thesis presents a set of new techniques for tracking 3-D objects from RGB-D video. First, new depth-inpainting methods are developed to refill the holes in depth maps, and new motion-deblurring methods are developed to remove blur from RGB images. Second, we present a new 3-D head pose tracking algorithm for RGB-D video and develop two homography estimation approaches for planar object tracking. More specifically, the main contributions of the thesis are as follows. For depth-map preprocessing, we propose two depth restoration approaches: one based on a novel TV21 regularization, and one that fuses a graph Laplacian with the TV21 regularization. The first approach assumes that each RGB patch is linearly correlated with its corresponding depth patch and, on this basis, uses an energy minimization framework to refill missing depth values; TV21 is integrated into the framework to enhance object boundaries and textured details. With the same TV21 regularization, the second approach integrates a graph Laplacian to refill depth more accurately. An efficient coordinate gradient approach is developed to solve the optimization problems of both approaches. For RGB-image preprocessing, we propose two kernel-size estimation approaches: one computes the autocorrelation map (automap) of the blurry input image, while the other is learning-based.
In the first approach, a modified automap is defined that eliminates the ray-like artifacts caused by long straight-line structures, so that only the correlation values that truly reflect motion-kernel information are preserved. The second approach is inspired by the fact that a blurry image has gradient distributions similar to those of its sharp counterpart at the low-resolution levels of their pyramids, but the similarity disappears at higher-resolution levels. By learning models that associate low- and high-level gradient distributions, this approach can accurately estimate the blur-kernel size o...
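The graph-Laplacian refilling idea behind the second depth restoration approach can be sketched as a single sparse linear solve. This minimal sketch assumes a 4-connected pixel grid, Gaussian color-similarity edge weights, and a quadratic data term on observed pixels; it omits the TV21 term entirely, and the function name `laplacian_depth_fill` and the parameters `sigma_c` and `lam` are hypothetical. Minimizing lam * sum over observed i of (d_i - depth_i)^2 plus sum over neighbors i~j of w_ij (d_i - d_j)^2 and setting the gradient to zero gives (M + L) d = M d0, where L is the graph Laplacian and M is diagonal with lam on observed pixels and 0 on holes.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def laplacian_depth_fill(depth, valid, color, sigma_c=0.1, lam=100.0):
    """Hypothetical helper: refill missing depth by minimizing
        lam * sum_{i valid} (d_i - depth_i)**2 + sum_{i~j} w_ij * (d_i - d_j)**2
    on a 4-connected grid, with w_ij = exp(-||c_i - c_j||**2 / (2 * sigma_c**2)).
    `color` is (H, W) or (H, W, C) with values roughly in [0, 1].
    """
    height, width = depth.shape
    n = height * width
    idx = np.arange(n).reshape(height, width)
    feats = color.reshape(n, -1).astype(float)
    rows, cols, vals = [], [], []
    # Horizontal and vertical neighbor pairs (p, q).
    for p, q in ((idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])):
        p, q = p.ravel(), q.ravel()
        # Weights shrink across color edges, so depth is not smoothed across them.
        wgt = np.exp(-np.sum((feats[p] - feats[q]) ** 2, axis=1) / (2.0 * sigma_c**2))
        rows += [p, q]
        cols += [q, p]
        vals += [wgt, wgt]
    W = sp.coo_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(n, n),
    ).tocsr()
    L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W  # graph Laplacian: degree - W
    M = sp.diags(lam * valid.ravel().astype(float))     # data weights, 0 on holes
    d0 = np.where(valid, depth, 0.0).ravel()            # hole values never enter the data term
    # Zero gradient of the energy gives the sparse system (M + L) d = M d0.
    d = spla.spsolve((M + L).tocsc(), M @ d0)
    return d.reshape(height, width)


# Toy scene: two surfaces (depth 1.0 and 3.0) whose boundary matches a color edge.
color = np.zeros((20, 20))
color[:, 10:] = 1.0
truth = np.where(color > 0.5, 3.0, 1.0)
valid = np.ones((20, 20), dtype=bool)
valid[5:15, 3:8] = False   # hole inside the near surface
valid[8:12, 8:12] = False  # hole straddling the boundary
depth = np.where(valid, truth, np.nan)
filled = laplacian_depth_fill(depth, valid, color)
print("max abs error:", np.abs(filled - truth).max())
```

Because edge weights are vanishingly small across the color boundary, the hole that straddles it is filled with the near surface's depth on one side and the far surface's depth on the other, rather than with a blurred average.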
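Automap-based kernel-size estimation rests on the observation that the autocorrelation of a blurry image's derivative encodes the blur extent. The sketch below shows the classic, unmodified form of this idea for one simple case, not the thesis's modified automap: it assumes a purely horizontal, uniform (box) motion blur and richly textured content, and the function name `estimate_horizontal_blur_length` and the `max_lag` parameter are hypothetical. For a box blur of length L, the horizontal derivative of the blurred image behaves like (s[x+L] - s[x]) / L, so its row-wise autocorrelation has a pronounced negative minimum at lag L.

```python
import numpy as np


def estimate_horizontal_blur_length(blurred, max_lag=40):
    """Hypothetical helper: estimate the length of a horizontal box blur.

    The horizontal derivative of a box-blurred image behaves like
    (s[x + L] - s[x]) / L, so its autocorrelation is most negative at lag L.
    """
    d = np.diff(blurred.astype(float), axis=1)
    d -= d.mean()
    lags = np.arange(1, max_lag + 1)
    # Row-wise autocorrelation of the derivative, averaged over all rows.
    acf = np.array([np.mean(d[:, :-k] * d[:, k:]) for k in lags])
    return int(lags[np.argmin(acf)])


# Synthetic test: white-noise texture blurred horizontally by a 9-pixel box.
rng = np.random.default_rng(1)
sharp = rng.standard_normal((64, 300))
L_true = 9
kernel = np.ones(L_true) / L_true
blurred = np.array([np.convolve(row, kernel, mode="valid") for row in sharp])
print("estimated blur length:", estimate_horizontal_blur_length(blurred))
```

On natural images, long straight edges add elongated correlation that is unrelated to the kernel; suppressing that kind of artifact is what the modified automap described above targets.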
Keywords	RGB-D video processing; depth image restoration; blur-kernel size estimation; graph matching; homography estimation; 3-D head pose estimation
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6620
Collection	Graduates_Doctoral Dissertations
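Recommended citation (GB/T 7714)	Liu Shaoguo. RGB-D Video Processing and 3-D Object Tracking [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2014.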

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20111801462804 (4894 KB)			Not yet open	CC BY-NC-SA