Research on Problems Related to Face Image Alignment (人脸图像对齐相关问题研究)


	Research on Problems Related to Face Image Alignment
	Li Qi (李琦)1,2
	2016-05-29
Degree type	Doctor of Engineering
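Chinese abstract (translated)	With the progress of science and technology in modern society, contact and movement among people are increasingly frequent, and security issues receive more and more attention. Among these issues, identity authentication and recognition have become the core problem of modern security: people need to verify the identity of others and prove their own almost constantly. Traditional identification relies mainly on identity-related items (such as passports and ID cards) and identity-related knowledge (such as user names and passwords). Compared with these traditional methods, biometric identification offers stability and portability. Among biometric modalities, face recognition is direct, friendly, and convenient, so it has become a hot topic in identification research in recent years.
A typical face recognition system comprises detection, tracking, alignment, and recognition. Face detection determines whether a face is present in an image or video and, if so, returns its position. After detection the face usually needs to be tracked: the inter-frame alignment transformation parameters are determined so that the face position in the previous frame is mapped to its position in the current frame, which locates the face more robustly and quickly. Face alignment means transforming and rectifying a face image to a frontal face image. In this thesis, face alignment is divided into unsupervised and supervised alignment. Unsupervised alignment automatically aligns a batch of images without specified reference points. Supervised alignment centers on detecting facial landmarks, computing the corresponding transformation parameters from those landmarks, and then aligning the face image. Face recognition means extracting features from the rectified image and choosing an appropriate distance metric for comparison, in order to determine the person's identity and attributes.
As the number of real-world face images grows, managing a batch of face images often requires dividing them automatically into categories to ease later search. Face image clustering can pre-assign labels to face images, reducing the workload of manual annotation and identification. It can also partition a recognition database to speed up retrieval, and it can be used for face video retrieval and face video content analysis. The problems studied in this thesis center on face alignment and involve image alignment and clustering, landmark detection, and video tracking. The main contributions are as follows.
We propose a simultaneous alignment and clustering model based on a low-rank constraint. Related studies show that image alignment and clustering are two interrelated tasks, and that solving them together helps improve the performance of each. Building on subspace segmentation, we propose a simultaneous alignment and clustering algorithm based on a low-rank constraint. It introduces image alignment into the traditional subspace clustering algorithm and solves both alignment and clustering with a single objective function. The objective is solved with an iterative augmented Lagrangian method, and comparisons with other methods on standard image sets demonstrate the superiority of the proposed method.
We propose a transformation-invariant clustering method. It learns the subspace representation while aligning the samples, which makes the clustering robust to planar image transformations. After alignment the transformed images become highly correlated, so a better affinity matrix can be obtained. The joint learning problem reduces to a series of least-squares problems that are easy to solve. We also prove that the classical Least Squares Congealing face alignment method is a special case of our method. Results on real datasets captured in uncontrolled environments show that our method outperforms several existing subspace clustering methods and other simultaneous alignment and clustering methods.
We propose a two-stage facial landmark detection model based on multi-task auto-encoders. Going beyond traditional single-task landmark detection, it performs landmark detection and pose estimation simultaneously. In the first stage a multi-task auto-encoder coarsely locates the landmarks; in the second stage a multi-task auto-encoder refines them. The two-stage auto-encoders preserve facial shape information and refine the landmarks from coarse to fine. The method has lower complexity and a shorter running time than other deep-learning-based landmark detection algorithms. Experiments on challenging databases show its effectiveness.
We propose a landmark detection method based on global cascaded convolutional neural networks. The algorithm accounts for the interactions among landmarks: regions extracted around the landmarks of different facial parts are fed into a single convolutional neural network, which automatically learns the relationships among them. By exploiting shape constraints, the algorithm achieves good localization with a small number of convolutional neural networks. Evaluation on databases captured in uncontrolled environments shows that it suits challenging landmark detection tasks and is robust to illumination, occlusion, and similar factors.
We propose a correntropy-based robust video tracking algorithm. Because traditional online subspace tracking algorithms are sensitive to noise, we model non-Gaussian noise with correntropy, which yields a tracker that is robust to illumination changes, occlusion, motion-induced corruption, and similar factors. Because the objective function is non-convex, it is solved with an iterative half-quadratic optimization method. In addition, drawing on information theory, we design a novel online template update. Evaluation on public databases verifies that our tracker performs better than several mainstream tracking algorithms.
In summary, with face alignment as the main thread, we analyze in depth the open problems in image alignment and clustering, landmark detection, and video tracking, and propose corresponding solutions. These improve the performance of face image alignment and clustering and the accuracy of landmark detection and video tracking, and they help improve the performance of face recognition systems.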
English abstract	With the development of science and technology in modern society, communication and movement among people have become increasingly frequent, and security issues have become more and more important. Among all security issues, identity authentication and recognition are the core ones: people need to verify the identity of others and prove their own almost all the time. Traditional identity authentication and recognition methods rely on identity-related items (such as passports and ID cards) or identity-related knowledge (such as user names and passwords). Compared with these traditional methods, biometric recognition offers stability and portability. Compared with other biometric recognition methods, face recognition is direct, friendly, and convenient, so it has become a hot topic in recent years. A general face recognition pipeline usually consists of several parts: face detection, face tracking, face alignment, and face recognition. Face detection determines whether there is a face in an image or video sequence and, if so, returns its position. Face tracking follows the detected face across frames: inter-frame transformation parameters are calculated to predict the current facial position robustly and rapidly. Face alignment refers to aligning face images to frontal ones. In this thesis, face alignment is divided into supervised and unsupervised face alignment. Unsupervised face alignment aligns a batch of face images automatically without specific reference points. The core idea of supervised face alignment is to find the facial landmark locations, calculate the transformation parameters from those locations, and then align the face images to frontal ones. Face recognition refers to extracting features from the aligned face images and selecting an appropriate distance measure for final classification.
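The supervised alignment step described above (landmarks, then transformation parameters, then alignment) can be illustrated with a minimal sketch. It assumes a 2D similarity transform (rotation, uniform scale, translation) fitted by least squares, which is a common choice but is not stated in the abstract. The function names, the template coordinates, and the simulated detection are all hypothetical.

```python
import cmath
import math


def fit_similarity(src, dst):
    """Least-squares 2D similarity transform w = a*z + b mapping src to dst.

    Points are (x, y) pairs treated as complex numbers. The complex factor
    `a` encodes rotation and uniform scale, `b` encodes translation.
    Reflections are not modeled.
    """
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    zc = [p - z_mean for p in z]  # centered source landmarks
    wc = [p - w_mean for p in w]  # centered target landmarks
    denom = sum(abs(p) ** 2 for p in zc)
    if denom == 0:
        raise ValueError("source landmarks must not all coincide")
    a = sum(p.conjugate() * q for p, q in zip(zc, wc)) / denom
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, points):
    """Apply w = a*z + b to a list of (x, y) points."""
    out = []
    for x, y in points:
        p = a * complex(x, y) + b
        out.append((p.real, p.imag))
    return out


# Hypothetical frontal template: two eyes, nose tip, mouth center.
template = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (50.0, 80.0)]
# Simulated detection: the template rotated 20 degrees, scaled 1.5x, shifted.
true_a = cmath.rect(1.5, math.radians(20.0))
true_b = complex(12.0, -7.0)
detected = apply_similarity(true_a, true_b, template)

# Estimate the transform that maps the detected landmarks onto the template.
a, b = fit_similarity(detected, template)
aligned = apply_similarity(a, b, detected)
```

In a real pipeline the same `a` and `b` would be used to warp the image pixels, not only the landmark coordinates; that step is omitted here.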
As face images become increasingly abundant, they often need to be divided automatically into different categories. Face clustering is a technique that automatically assigns labels to face images, which reduces the manual effort of annotating and identifying them. Face clustering can also partition a face recognition database automatically and thus speed up face video retrieval and face video content analysis.
The main contributions of the thesis are summarized as follows.
We propose a simultaneous image alignment and clustering algorithm using Low-Rank Representation. Related studies have shown that image alignment and clustering are two correlated tasks, and addressing them together helps to improve the performance of both. Building on subspace clustering algorithms, we propose a novel joint alignment and clustering algorithm that integrates spatial transformation parameters and subspace clustering parameters into a unified objective function. We solve the problem by linearizing the objective function and then iteratively solving a sequence of linear problems via the Augmented Lagrange Multipliers method. Experimental results on various data sets validate the effectiveness of our method.
We propose a transformation-invariant subspace clustering framework that jointly aligns data samples and learns the subspace representation, which makes the algorithm robust to image transformations. After alignment, the transformed data samples become highly correlated and a better affinity matrix can be obtained. The joint problem can be reduced to a sequence of Least Squares Regression problems, which can be solved efficiently. In addition, we show that the Least Squares Congealing algorithm is a special case of our framework.
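The Least Squares Regression step mentioned above can be sketched in isolation. The sketch assumes the standard ridge-regularized self-representation objective, min_Z ||X - XZ||_F^2 + lam * ||Z||_F^2, and leaves out the alignment (transformation) variables of the thesis's joint model. The function name `lsr_affinity`, the parameter `lam`, and the toy data are illustrative assumptions.

```python
import numpy as np


def lsr_affinity(X, lam=0.1):
    """Affinity matrix from ridge-regularized self-representation.

    Solves min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2, whose closed form is
    Z = (X^T X + lam * I)^{-1} X^T X. Columns of X are data samples.
    """
    gram = X.T @ X
    n = gram.shape[0]
    Z = np.linalg.solve(gram + lam * np.eye(n), gram)
    # Symmetrize the coefficient magnitudes to obtain an affinity matrix.
    return 0.5 * (np.abs(Z) + np.abs(Z.T))


# Toy data: three samples in span{e1, e2} and three in span{e3, e4}.
X = np.array([
    [1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 3.0, 1.0],
    [0.0, 0.0, 0.0, 2.0, 1.0, 1.0],
])
W = lsr_affinity(X)
cross_block = W[:3, 3:]   # affinity between the two subspaces
within_block = W[:3, :3]  # affinity inside the first subspace
```

Because the two toy subspaces are orthogonal, the Gram matrix is block diagonal and the cross-subspace affinities vanish. In subspace clustering, an affinity matrix like `W` is typically passed to spectral clustering to obtain the final labels.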
We verify the effectiveness of the proposed method with extensive experiments on unaligned real data, where it achieves higher clustering accuracy than state-of-the-art subspace clustering algorithms and other transformation-invariant clustering algorithms.
We propose a two-stage multi-task auto-encoder framework for fast face alignment that incorporates head pose information to handle large view variations. In the first stage, a multi-task auto-encoder roughly locates the facial landmarks together with the related pose information; in the second stage, a multi-task auto-encoder refines them. The shape constraint is naturally encoded into the two-stage framework to preserve facial structure, and a coarse-to-fine strategy refines the landmark results under this constraint. Furthermore, the computational cost of our method is much lower than that of its deep learning competitors. Experimental results on various challenging datasets show the effectiveness of the proposed method.
We propose global cascaded Convolutional Neural Networks for face alignment. The proposed algorithm considers the interaction between different facial landmarks: patches around the different landmarks are fed into a single Convolutional Neural Network, which learns the relationships between the landmarks automatically. The algorithm takes advantage of shape constraint information and achieves good performance with a relatively small number of Convolutional Neural Networks. Experimental results show that it is suitable for challenging facial landmark detection tasks and is robust to illumination, occlusion, and similar factors.
We propose a correntropy-based robust holistic tracking algorithm to deal with various kinds of noise. A half-quadratic algorithm is then employed to minimize the correntropy-based objective function.
Based on the proposed information-theoretic algorithm, we design a simple and effective template update scheme for object tracking. Experimental results on publicly available videos demonstrate that the proposed tracker outperforms other popular tracking algorithms.
In summary, face alignment is the main thread of the thesis. We have analyzed the problems of simultaneous alignment and clustering, facial landmark detection, and visual tracking, and proposed corresponding solutions. These solutions improve the performance of simultaneous alignment and clustering, the accuracy of facial landmark detection, and the performance of visual tracking, and they help improve the performance of face recognition systems.
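The half-quadratic optimization of a correntropy objective mentioned for the tracker can be illustrated on a deliberately small problem. The sketch below is not the thesis's tracker: it estimates a single robust location value from scalar samples, assuming a Gaussian kernel of width `sigma`. The function name, parameters, and data are hypothetical.

```python
import math


def correntropy_location(samples, sigma=0.5, iters=50, tol=1e-10):
    """Robust location maximizing sum_i exp(-(x_i - m)^2 / (2 * sigma^2)).

    Half-quadratic optimization alternates two steps:
    (1) with m fixed, compute auxiliary weights from the Gaussian kernel;
    (2) with the weights fixed, solve a weighted least-squares problem,
        which for a location estimate is a weighted mean.
    """
    ordered = sorted(samples)
    m = ordered[len(ordered) // 2]  # start from a middle order statistic
    for _ in range(iters):
        weights = [math.exp(-((x - m) ** 2) / (2.0 * sigma ** 2)) for x in samples]
        new_m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        converged = abs(new_m - m) < tol
        m = new_m
        if converged:
            break
    return m


# Five inliers near 1.0 and one gross outlier (non-Gaussian noise).
data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
plain_mean = sum(data) / len(data)
robust = correntropy_location(data)
```

The outlier receives a weight close to zero after the first iteration, so `robust` stays near the inliers while `plain_mean` is pulled toward the outlier. A tracker would apply the same reweighting idea to reconstruction residuals rather than to scalar samples.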
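Keywords	face image alignment; image alignment and clustering; facial landmark detection; multi-task auto-encoder; global cascaded convolutional neural network; video tracking
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/11677
Collection	Graduates_Doctoral Dissertations
Author affiliations	1. Institute of Automation, Chinese Academy of Sciences 2. University of Chinese Academy of Sciences
First author affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	李琦. 人脸图像对齐相关问题研究[D]. 北京. 中国科学院大学,2016.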

Files in this item
File name/size	Document type	Version type	Access type	License
最终版本.pdf (23004KB)	Thesis		Restricted access	CC BY-NC-SA