Compared with still images, video sequences consist of multiple frames and are rich in temporal and spatial information. These properties have led most researchers to believe that face detection and recognition from video are more promising than from still images. However, video poses a number of difficulties, e.g., poor image quality, partial occlusion, and large variations in image resolution, illumination and head pose. A number of theoretical and practical problems remain open. In this thesis, we focus on how to make full use of both the temporal and the spatial information in video, and exploit this information to facilitate liveness detection, face detection and face recognition. The contributions of this thesis are as follows: (1) Investigate the current approaches to liveness detection, and propose a novel method based on the analysis of eye blinks. The method builds on the observation that, for live faces, the edges at certain scales and orientations vary consistently with eye blinks. Compared with other approaches, our method is more reliable because it fuses multiple detectors for eye blink detection; (2) Propose a framework for face recognition in video that fuses the recognition results of selected key frames. In particular, an algorithm named 2DBayes is introduced. Experimental results show that, as long as good frames and recognition algorithms are selected, the framework achieves good recognition performance even though it does not take the temporal information in video into account; (3) Present a detailed discussion of video-to-video algorithms. These algorithms can typically be divided into sequential approaches and batch approaches. The main problems of the batch methods are their heavy computational load and the coarseness of the models they build.
To tackle these problems, a batch model named video signature is proposed, which uses dimensionality reduction techniques to represent video data in feature space; the similarity between two video signatures is measured by the Earth Mover’s Distance; (4) Propose a novel approach that uses Gaussian mixture model (GMM) updating to address face tracking and recognition in video. First, considering the differences between face tracking and recognition, two initial GMMs are designed, one for tracking and one for recognition. Then, both models are updated with an online incremental learning algorithm so as to improve the tracking capability and to obtain class-specific GMMs. Finally, Bayesian inference is introduced into the recognition framework to accumulate the temporal information in video. Experimental results demonstrate the effectiveness of this approach.
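
As an illustration of the last step only, the following minimal sketch shows one common way to accumulate temporal evidence with Bayesian inference: the posterior over identities is updated recursively as each frame arrives. It is not the implementation used in the thesis. It assumes that frames are conditionally independent given the identity, that the identity does not change within a sequence, that the prior is strictly positive, and that per-frame log-likelihoods (for instance from class-specific GMMs) are already available. The function name `accumulate_posterior` and the example numbers are hypothetical.

```python
import math


def accumulate_posterior(frame_log_likelihoods, prior=None):
    """Recursively fuse per-frame log-likelihoods into identity posteriors.

    frame_log_likelihoods: list of frames; each frame is a list holding
        log p(x_t | identity k) for every candidate identity k.
    prior: optional list of strictly positive prior probabilities over
        identities (uniform if None). This is an assumption of the sketch.
    Returns a list with the posterior over identities after each frame.
    """
    num_ids = len(frame_log_likelihoods[0])
    if prior is None:
        prior = [1.0 / num_ids] * num_ids
    log_post = [math.log(p) for p in prior]

    history = []
    for frame in frame_log_likelihoods:
        # Bayes' rule in log space: add the new frame's log-likelihood
        # to the log-posterior carried over from the previous frames.
        log_post = [lp + ll for lp, ll in zip(log_post, frame)]
        # Normalise with log-sum-exp for numerical stability.
        peak = max(log_post)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_post))
        log_post = [v - log_norm for v in log_post]
        history.append([math.exp(v) for v in log_post])
    return history


# Hypothetical likelihoods for three identities over three frames.
example_frames = [
    [math.log(0.5), math.log(0.3), math.log(0.2)],
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.7), math.log(0.2), math.log(0.1)],
]
for t, posterior in enumerate(accumulate_posterior(example_frames), start=1):
    print(t, [round(p, 3) for p in posterior])
```

Working in log space avoids numerical underflow on long sequences. In this example the posterior of the first identity grows with each frame, which is the sense in which evidence accumulates over time.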