English Abstract | With the rapid development of depth sensors such as Kinect, it has become possible to acquire RGB and depth images simultaneously. Owing to its low cost and easy installation, Kinect has become a standard tool for traditional vision problems such as 3-D object tracking and scene understanding. However, the RGB-D images acquired from Kinect often contain data holes around object boundaries, along with other types of data noise. Moreover, sensor shake commonly introduces blur into the acquired RGB image. This thesis presents a set of new techniques for tracking 3-D objects from RGB-D video. First, new depth-inpainting methods are developed for refilling the holes in the depth map, and new motion-deblurring methods are developed for removing blur from the RGB image. Second, we present a new 3-D head pose tracking algorithm for RGB-D video and develop two homography estimation approaches for planar object tracking. More specifically, the main contributions of the thesis are as follows. For preprocessing the depth map, we propose two depth restoration approaches: one is based on a novel TV21 regularization, while the other fuses a graph Laplacian with the TV21 regularization. The first approach assumes that an RGB patch is linearly correlated with its depth counterpart, and on this basis an energy minimization framework is proposed for refilling missing depth values. TV21 is integrated into the approach to enhance object boundaries and textured details. With the same TV21 regularization, the second approach integrates a graph Laplacian to obtain better depth refilling. An efficient coordinate gradient approach is developed to solve the optimization problems of both approaches. For preprocessing the RGB image, we propose two kernel-size estimation approaches. One is based on calculating the autocorrelation map (automap) of the blurry input image, while the other is a learning-based approach. 
In the first approach, a modified automap is defined that eliminates the ray effect caused by broad-line structures. With the modified automap, only the correlation values that truly reflect motion-kernel information are preserved. The second approach is inspired by the fact that a blurry image has gradient distributions similar to those of its sharp counterpart at the low levels of their pyramids, but the similarity disappears at higher levels. By learning models that associate low- and high-level gradient distributions, this approach could accurately estimate the blur-kernel size o... |
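
The abstract names the autocorrelation map (automap) of a blurry image as the starting point for kernel-size estimation. As a rough illustration only, the sketch below computes a plain circular autocorrelation map with NumPy via the Wiener-Khinchin relation. It is not the thesis's modified automap, whose definition is not given in the abstract. The function name `autocorrelation_map`, the synthetic input, and the horizontal box blur of length 9 are all assumptions made for the example.

```python
import numpy as np

def autocorrelation_map(image):
    """Circular autocorrelation of a 2-D array (plain version, not the modified automap)."""
    x = np.asarray(image, dtype=np.float64)
    x = x - x.mean()                      # remove the mean so brightness does not dominate
    power = np.abs(np.fft.fft2(x)) ** 2   # power spectrum of the image
    acorr = np.fft.ifft2(power).real      # inverse transform of the power spectrum
    return np.fft.fftshift(acorr)         # move the zero-shift term to the centre

# Synthetic stand-in for a sharp image (assumption: random texture).
rng = np.random.default_rng(0)
sharp = rng.standard_normal((64, 64))

# Hypothetical horizontal box blur of length 9, applied circularly.
kernel_len = 9
blurred = sum(np.roll(sharp, s, axis=1) for s in range(kernel_len)) / kernel_len

amap = autocorrelation_map(blurred)
```

The map peaks at the zero-shift position in the centre and is symmetric about it. An estimator along the lines described in the abstract would then inspect how the correlation values fall off away from that peak.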