Visual estimation of head pose is essential for many applications, such as face recognition and human-computer interaction. Driven by these applications, head pose estimation has drawn great attention from academia, and a variety of techniques have been reported in the literature. However, accurate and efficient estimation of head pose remains a major challenge in computer vision. It is difficult to establish a stable relationship between complex, variable face patterns and head poses because of both internal factors, such as expression variations, and external factors, such as illumination changes. In this thesis, we use a Kinect sensor, which simultaneously captures a depth image and an RGB image, to build a real-time head pose estimation system. The key problems of head pose estimation covered by this thesis include image preprocessing and feature extraction. To deliver robust and distinctive features for head pose estimation, we design a general feature design framework for depth image representation. The main contributions of this thesis are summarized as follows:

① This thesis designs a novel face detection and segmentation method that fuses the characteristics of the depth images and RGB images captured by a Kinect sensor. Compared with traditional methods, this method detects faces with large pose variations quickly and accurately.

② This thesis proposes a novel and general model for depth image representation, the Depth Slice Pose Perception Model (DSPP), which is inspired by the powerful human ability to process depth information. DSPP divides an entire depth image into a series of slices to describe the pose of an object. In this model, the mechanism for processing depth information is close to that of the human visual system, which makes the model robust against noise. Furthermore, the model is flexible for designing new features: any feature that describes a single slice of a depth image can be extended to describe the entire depth image. The model thus provides a general framework for the representation of depth images and a theoretical method for designing new features for estimating the pose of an object.

③ This thesis proposes three sets of novel features, namely the integral slice center (ISC) descriptor, the local slice depth (LSD) descriptor and the local slice orientation (LSO) descriptor, based on depth slice pose perce...
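
The slicing idea behind DSPP in contribution ② can be illustrated with a minimal sketch. The abstract does not specify how slices are defined, so the code below assumes equal-width depth intervals, treats a depth value of 0 as an invalid reading, and uses the pixel centroid of each slice as a stand-in per-slice feature. The function names `slice_depth_image` and `slice_centroids` are hypothetical, and this is not the thesis's ISC, LSD or LSO descriptor.

```python
import numpy as np


def slice_depth_image(depth, num_slices, invalid=0):
    """Split a depth image into boolean masks, one per depth interval.

    Assumption: slices are equal-width intervals between the smallest
    and largest valid depth; the thesis may define slices differently.
    """
    valid = depth != invalid
    if not valid.any():
        return []
    d_min = depth[valid].min()
    d_max = depth[valid].max()
    edges = np.linspace(d_min, d_max, num_slices + 1)
    # Interior edges only, so bin indices run from 0 to num_slices - 1
    # and the maximum depth falls into the last slice.
    idx = np.digitize(depth, edges[1:-1])
    return [(idx == k) & valid for k in range(num_slices)]


def slice_centroids(masks):
    """Illustrative per-slice feature: (row, col) centroid of each slice.

    Empty slices yield None. Concatenating per-slice features is one
    way a slice-level feature could describe the whole image.
    """
    features = []
    for mask in masks:
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            features.append(None)
        else:
            features.append((rows.mean(), cols.mean()))
    return features


if __name__ == "__main__":
    # Tiny synthetic depth map; 0 marks a missing measurement.
    depth = np.array([[0, 10, 10],
                      [20, 20, 30]], dtype=float)
    masks = slice_depth_image(depth, num_slices=2)
    print(slice_centroids(masks))
```

Under these assumptions, any other per-slice measurement could replace the centroid without changing the slicing step, which mirrors the flexibility the abstract attributes to the model.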