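3D-Aided Face Recognition (基于三维辅助的人脸识别)
Xiangyu Zhu (朱翔昱)
Degree type: Doctor of Engineering
Supervisor: Stan Z. Li (李子青)
2017-05-26
Degree-granting institution: Graduate University of the Chinese Academy of Sciences
Place of conferral: Beijing
Keywords: face recognition; 3D face model; face alignment; pose and expression normalization; 3D-aided face recognition
Abstract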
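Face recognition is the technology by which a machine automatically confirms a person's identity from facial information. It is one of the core research areas of computer vision and pattern recognition, with broad application prospects in surveillance, finance, criminal investigation, human-computer interaction and other fields. Over years of development, the focus of face recognition has gradually shifted from controlled scenarios in which the user cooperates to uncontrolled scenarios in which the user does not cooperate or is not even aware of being captured. In these new scenarios, face recognition faces many challenges, such as pose variation, expression variation, illumination, occlusion and low resolution. All of these factors can make the intra-class variation of facial appearance larger than the inter-class variation, degrading the performance of face recognition systems. Among them, pose and expression occur frequently and interfere strongly, so they have long been the two major obstacles to the practical deployment of face recognition.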
 
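This thesis studies pose and expression variations in face recognition in depth and argues that 3D techniques are a powerful way to address them. Because pose variation arises from the rotation of the face in 3D space and expression variation arises from non-rigid 3D deformation of the face, explicitly modeling and correcting pose and expression with 3D techniques is more intuitive and effective than traditional 2D methods. This thesis uses the term 3D-aided face recognition for the following approach: estimating 3D information from a 2D face image, then using that information to correct these nuisance factors and thereby assist the face recognition system. 3D-aided face recognition involves two preprocessing steps: 3D face model fitting and 3D face normalization. This thesis addresses both. For each, it identifies shortcomings of existing methods and proposes corresponding solutions. The main work and contributions are as follows: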
 
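1. Existing 3D face model fitting methods adopt the analysis-by-synthesis framework, which uses only image pixels as fitting features. As a result, in unconstrained scenarios the fitting process easily falls into local minima and yields sub-optimal solutions. This thesis proposes a fitting method based on sparse SIFT flow, which uses the more robust SIFT feature as the fitting feature. It also adapts the SIFT matching algorithm SIFT flow to the specific case of faces by restricting it to sparse, texture-rich facial regions, which greatly reduces computation. When embedded in the analysis-by-synthesis framework as an additional fitting constraint, sparse SIFT flow adds almost no fitting time.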
 
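2. A major drawback of existing fitting methods is their slowness: processing one face image usually takes more than a minute, which severely limits their range of application. Inspired by results in facial landmark localization, this thesis proposes replacing the traditional analysis-by-synthesis framework with a regression-based method. HOG features are extracted at the 3D landmarks of the face model, and the model parameters are updated step by step through cascaded regression until convergence. Replacing the "synthesize image, optimize loss" fitting process with an "extract image features, cascaded regression" process greatly reduces both the number of iterations required and the time per iteration.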
 
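3. 3D face model fitting under large poses (yaw angle greater than 45 degrees) has long been a difficult challenge. First, self-occlusion makes some landmarks invisible, so fitting under large poses cannot rely on 2D landmark constraints or on image features extracted at 3D landmarks. Second, under large poses the appearance of the face varies in more complex ways, even changing structurally from frontal to profile, which places higher demands on the fitting method. Finally, labeling training samples under large poses is very difficult, so fitting algorithms lack training data. This thesis proposes a new cascaded convolutional neural network structure that regresses 3D face model parameters directly from the face image. It introduces two new fitting features that allow several convolutional neural networks to be cascaded into a powerful regressor. It also introduces a new loss function that describes the priority of the model parameters, so that fitting favors the more important parameters. To address the lack of data, this thesis proposes a face profiling algorithm. With the help of the 3D face model, this algorithm rotates medium-pose training samples out of plane, synthesizing a large number of realistic large-pose training samples.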
 
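4. The fitted 3D face model provides additional 3D information for normalizing the pose and expression of face images. Existing normalization methods lose facial identity information. To address this, the thesis proposes a high-fidelity pose and expression normalization method that converts a face image with any pose and expression into a frontal, neutral-expression image. Specifically, given detected landmarks, the method first fits the 3D face model with a new landmark marching algorithm. It then uses 3D meshing to turn the whole face image into a 3D object and applies out-of-plane rotation and non-rigid deformation to normalize pose and expression. Finally, exploiting facial symmetry, it fills the self-occluded regions seamlessly with illumination adaptation, producing the complete normalization result. Unlike traditional methods, this method keeps the regions surrounding the face and the self-occluded regions. It therefore preserves much of the original identity information and increases the fidelity of the normalization.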
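The regression-based fitting in item 2 can be sketched as a cascade of learned updates. At each stage, features are extracted at the current parameter estimate, and a regressor trained for that stage maps them to a parameter update. The sketch below is a deliberately tiny stand-in and is not the thesis implementation. Its simplifications are assumptions for illustration only: the "image" is reduced to one hidden scalar parameter, `sin(true - estimate)` plays the role of HOG features sampled at projected 3D landmarks, and each stage is an ordinary least-squares linear regressor (a common choice in cascaded regression). All function names are hypothetical.

```python
import math
import random

# Hypothetical toy stand-in for cascaded-regression model fitting (not the
# thesis code). The "image" is summarized by one hidden parameter; the fitter
# only sees features extracted at its current estimate.


def feature(image_param, estimate):
    """Stand-in for features (e.g. HOG) sampled at landmarks placed by `estimate`."""
    return math.sin(image_param - estimate)


def fit_stage(images, estimates):
    """Least-squares fit of update = w * feature + b for one cascade stage."""
    phis = [feature(img, est) for img, est in zip(images, estimates)]
    deltas = [img - est for img, est in zip(images, estimates)]  # ideal updates
    n = len(phis)
    mean_phi, mean_delta = sum(phis) / n, sum(deltas) / n
    cov = sum((f - mean_phi) * (d - mean_delta) for f, d in zip(phis, deltas))
    var = sum((f - mean_phi) ** 2 for f in phis)
    w = cov / var
    return w, mean_delta - w * mean_phi


def train_cascade(images, init, num_stages):
    """Train stages in order; each stage sees the estimates left by earlier stages."""
    estimates = [init] * len(images)
    stages = []
    for _ in range(num_stages):
        w, b = fit_stage(images, estimates)
        stages.append((w, b))
        estimates = [est + w * feature(img, est) + b for img, est in zip(images, estimates)]
    return stages


def apply_cascade(stages, image_param, init):
    """Fit one image: extract features at the current estimate, regress an update, repeat."""
    estimate = init
    for w, b in stages:
        estimate += w * feature(image_param, estimate) + b
    return estimate


def make_dataset(n, seed, spread=1.2):
    """Hidden ground-truth parameters drawn uniformly from [-spread, spread]."""
    rng = random.Random(seed)
    return [rng.uniform(-spread, spread) for _ in range(n)]


if __name__ == "__main__":
    train, held_out = make_dataset(200, seed=0), make_dataset(50, seed=1)
    stages = train_cascade(train, init=0.0, num_stages=3)
    for k in range(len(stages) + 1):
        err = sum(abs(apply_cascade(stages[:k], t, 0.0) - t) for t in held_out) / len(held_out)
        print(f"stages={k} mean abs error={err:.6f}")
```

Each stage is trained on the estimates produced by the stages before it. This is why a few cheap linear stages can remove error that one linear regressor leaves behind on a nonlinear feature. Fitting an image then costs only a handful of feature evaluations, with no image synthesis inside the loop.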
 
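In summary, this thesis studies 3D face model fitting and 3D face normalization in depth. It proposes several effective methods that improve algorithm performance, advancing 3D-aided face recognition and making face recognition more robust in uncontrolled scenarios.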
Other Abstract
The objective of face recognition is to enable a computer to identify a person from his or her face data. It is a central problem in computer vision and pattern recognition, and it has been widely applied in surveillance, authentication, security checks and human-computer interaction. Over the past decade, the focus of face recognition has progressively shifted from user-cooperative, constrained environments to non-cooperative, contactless, unconstrained environments. In this challenging new setting, adverse factors such as pose, expression, illumination, occlusion and low resolution dramatically increase the intra-class variations of faces until they exceed the inter-class variations. As a result, existing face recognition systems no longer work well. Among these factors, pose and expression have long been the central problems because they occur frequently and interfere strongly.
 
We argue that 3D techniques are well suited to handling pose and expression variations. Pose comes from face rotation in 3D space, and expression comes from 3D face deformation. Using 3D information to model and normalize pose and expression is therefore inherently more intuitive and accurate. In this thesis, we concentrate on estimating 3D information from a single face image and normalizing pose and expression through 3D techniques to improve face recognition accuracy. We call this approach 3D-aided face recognition.
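The argument in the paragraph above treats pose as a 3D rotation and expression as a 3D deformation. A linear 3D morphable model under weak perspective projection makes this concrete. The abstract does not give the thesis's exact parameterization, so the common formulation below is an assumption for illustration. Identity and expression are separate linear offsets from a mean shape, and pose enters only through rotation, scale and 2D translation. The three-vertex model and all function names (`rotation_yaw`, `shape`, `project`) are hypothetical; practical models are far larger.

```python
import math

# Assumed, commonly used formulation (not taken from the thesis text):
#   S = S_mean + A_id * alpha_id + A_exp * alpha_exp        (3D shape)
#   x2d = scale * Pr * R * S + t2d                            (weak perspective)
# Shapes are lists of [x, y, z] vertices; each basis is a list of per-vertex offsets.


def rotation_yaw(yaw_deg):
    """3x3 rotation about the vertical (y) axis, i.e. a pure yaw rotation."""
    a = math.radians(yaw_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def shape(mean, id_basis, exp_basis, alpha_id, alpha_exp):
    """Mean shape plus identity offsets plus expression offsets, vertex by vertex."""
    verts = []
    for v, base in enumerate(mean):
        p = list(base)
        # Identity offsets: stay fixed while the expression changes.
        for a, basis in zip(alpha_id, id_basis):
            p = [p[k] + a * basis[v][k] for k in range(3)]
        # Expression offsets: non-rigid deformation of the same identity.
        for a, basis in zip(alpha_exp, exp_basis):
            p = [p[k] + a * basis[v][k] for k in range(3)]
        verts.append(p)
    return verts


def project(verts, scale, R, t2d):
    """Weak perspective: rotate, keep the x and y rows, scale, then translate in 2D."""
    out = []
    for p in verts:
        rx = sum(R[0][k] * p[k] for k in range(3))
        ry = sum(R[1][k] * p[k] for k in range(3))
        out.append((scale * rx + t2d[0], scale * ry + t2d[1]))
    return out


if __name__ == "__main__":
    # Hypothetical toy face: two eye corners and one mouth point.
    mean = [[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, -1.0, 1.0]]
    id_basis = [[[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]]  # face width
    exp_basis = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, -0.5, 0.0]]]  # open mouth
    observed = shape(mean, id_basis, exp_basis, alpha_id=[0.2], alpha_exp=[1.0])
    print(project(observed, 1.0, rotation_yaw(60), (0.0, 0.0)))
    # "Normalized" rendering: same identity, frontal pose, neutral expression.
    neutral = shape(mean, id_basis, exp_basis, alpha_id=[0.2], alpha_exp=[0.0])
    print(project(neutral, 1.0, rotation_yaw(0), (0.0, 0.0)))
```

Under this model, normalization means re-projecting the same identity coefficients with an identity rotation and zero expression coefficients, as the last two lines of the demo do. Recovering those coefficients from a single image is the fitting problem.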
There are two steps in 3D-aided face recognition: 3D face model fitting and 3D face normalization. For both tasks, we analyze the drawbacks of existing methods and propose several methods to overcome them. The main contributions are as follows.
 
1. Traditional 3D face model fitting methods adopt the analysis-by-synthesis framework, which tends to fall into local minima and produce sub-optimal results in unconstrained environments. The main reason is that it uses only image pixels as the fitting feature. In this thesis, we adopt SIFT as a more robust fitting feature and propose a new fitting method named sparse SIFT flow. Specifically, salient points are labelled on the texture-rich regions of the face model, and their positions in the image are found with SIFT flow. The resulting 2D-3D correspondences serve as an additional fitting constraint that improves fitting accuracy. Since SIFT flow runs only on the sparse texture-rich regions, it adds little computational cost when embedded into the analysis-by-synthesis framework.
 
2. A main drawback of the traditional analysis-by-synthesis framework is its poor efficiency. It usually needs more than one minute to fit a single face image, which seriously limits its application. Inspired by advances in face alignment, we propose replacing the analysis-by-synthesis framework with a regression framework. Instead of the traditional "face simulation and minimization" fitting process, the new framework adopts a "feature extraction and regression" process to estimate model parameters. This process is much more efficient and converges in less than one second.
 
 
3. 3D face model fitting in large poses (where the yaw angle exceeds 45 degrees) is very challenging, especially in unconstrained environments. Firstly, in large poses, self-occlusion prevents model fitting from relying on landmark constraints or landmark-based image features. Secondly, large poses require the fitting method to handle dramatic appearance variations from frontal to profile views. Finally, manually labelling large-pose faces is very tedious, which leads to a lack of training data. To address the large-pose challenge, we propose a cascaded convolutional neural network that regresses model parameters from raw image pixels. We propose two novel input features to connect multiple convolutional neural networks, and a novel cost function to model the priority of model parameters during training. In addition, we propose a face profiling method to synthesize large numbers of training samples across large poses.
 
4. The fitted 3D face model provides additional 3D information for normalizing pose and expression in face recognition. However, most existing normalization methods do not preserve identity information well and introduce many artifacts, which seriously degrades recognition performance. In this thesis, we propose a high-fidelity pose and expression normalization method that automatically generates a natural face image in frontal pose with neutral expression. Given a set of detected landmarks, the method first fits the 3D morphable model with a novel landmark marching algorithm. It then transforms the whole face image into a 3D object and rotates it to a frontal view through 3D meshing and frontalization. Finally, it fills the self-occluded region seamlessly with an illumination-adaptive filling algorithm, producing the final normalization result. Unlike traditional normalization methods, our method preserves identity information by keeping the self-occluded region and the external face region, improving face recognition accuracy by a large margin.
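Item 3 mentions a cost function that models the priority of model parameters, but this page does not specify it. The following formulation is an assumption for illustration. The cost is a weighted squared parameter error. The weight of each parameter is the reconstruction error caused by replacing only that ground-truth parameter with its predicted value. Parameters that move the reconstructed shape more therefore receive larger weights. The linear toy reconstruction and all function names below are hypothetical.

```python
import math

# Hypothetical priority-aware parameter loss on a linear toy model
# (assumed formulation for illustration, not the thesis's cost function).


def reconstruct(params, bases):
    """Toy linear reconstruction: vertex k = sum_i params[i] * bases[i][k] (2D points)."""
    n_verts = len(bases[0])
    verts = []
    for k in range(n_verts):
        x = sum(p * b[k][0] for p, b in zip(params, bases))
        y = sum(p * b[k][1] for p, b in zip(params, bases))
        verts.append((x, y))
    return verts


def vertex_distance(a, b):
    """Euclidean norm of the stacked per-vertex differences."""
    return math.sqrt(sum((xa - xb) ** 2 + (ya - yb) ** 2 for (xa, ya), (xb, yb) in zip(a, b)))


def priority_weights(pred, gt, bases):
    """Weight i = shape error from replacing only gt[i] with pred[i], normalized to sum 1."""
    ref = reconstruct(gt, bases)
    raw = []
    for i in range(len(gt)):
        mixed = list(gt)
        mixed[i] = pred[i]
        raw.append(vertex_distance(reconstruct(mixed, bases), ref))
    total = sum(raw)
    if total == 0.0:
        return [0.0] * len(raw)  # perfect prediction: nothing to prioritize
    return [r / total for r in raw]


def weighted_parameter_loss(pred, gt, bases):
    """Sum over i of w_i * (pred_i - gt_i)^2, using the weights above."""
    weights = priority_weights(pred, gt, bases)
    return sum(w * (p - g) ** 2 for w, p, g in zip(weights, pred, gt))


if __name__ == "__main__":
    # Parameter 0 moves every vertex (pose-like); parameter 1 nudges one vertex.
    bases = [[(1.0, 0.0)] * 4, [(0.0, 0.1), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]]
    pred, gt = [1.1, 1.1], [1.0, 1.0]
    print(priority_weights(pred, gt, bases))  # parameter 0 gets most of the weight
    print(weighted_parameter_loss(pred, gt, bases))
```

Both parameters above are off by the same amount, yet the pose-like parameter receives about twenty times the weight. A training loop built on this idea would presumably recompute the weights per sample and treat them as constants when differentiating. That is one design choice among several.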
 
In summary, this thesis makes significant progress on 3D face model fitting and 3D face normalization, improving the performance of face recognition in unconstrained environments.
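Document type: Doctoral thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/14786
Collection: Graduates_Doctoral Dissertations
Author affiliation: Institute of Automation, Chinese Academy of Sciences; National Laboratory of Pattern Recognition
Recommended citation
GB/T 7714
Zhu Xiangyu (朱翔昱). 3D-Aided Face Recognition (基于三维辅助的人脸识别) [D]. Beijing: Graduate University of the Chinese Academy of Sciences, 2017.
Files in this item
File name/size | Document type | Version type | Access type | License
mythesis-submit.pdf (33589KB) | Thesis | (not specified) | Not yet open | CC BY-NC-SA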
 

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.