Research on Dense Registration Algorithms for 3D Faces and Their Applications

	Research on Dense Registration Algorithms for 3D Faces and Their Applications
	Fan Zhenfeng (范振峰)
	2020-09
Pages	128
Degree type	Doctoral
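Chinese abstract (translated)	Dense registration of 3D faces aims at fine, dense matching of the spatial data that represent different 3D faces. Good dense registration is a prerequisite for 3D face analysis; its downstream applications include 3D face reconstruction, 3D face recognition, and 3D face modeling and simulation. Dense registration of 3D faces also offers new solutions to many tasks on 2D face images. Compared with 2D face images, 3D faces carry additional geometric information that is robust to imaging under different pose angles and to ambient illumination, and that also helps with several problems caused by expression changes. In general, dense registration of 3D faces serves two main functions: (1) it allows different 3D faces to be expressed in a unified vector form, which facilitates further data analysis; (2) compared with sparse landmark correspondence, dense registration captures not only the overall structure of the face but also its fine structure. Dense registration of 3D faces belongs to the class of non-rigid point cloud registration and is an important and challenging problem. The difficulties are: (1) mathematically, unlike rigid registration, the problem has no explicit mathematical expression to optimize; (2) physically, unlike the small number of facial landmarks, most dense points have no exact anatomical definition. The latter also adds uncertainty to the evaluation of registration results. Dense registration of 3D faces has many downstream applications. On the one hand, it is a prerequisite for 3D face data analysis: densely registered 3D faces can be analyzed statistically and give good sparse representations of new 3D face data, which can be applied directly to 3D face recognition. On the other hand, densely registered 3D faces help guide 2D face image tasks from the underlying imaging principles and inspire new approaches to these problems.
Starting from the fundamental problem of dense registration of 3D faces, this dissertation analyzes the process of building 3D face models and studies related applications to 3D faces and 2D face images. The main contents and contributions are as follows:
(1) An automatic dense registration method for 3D faces that does not depend on landmarks. Landmarks usually require manual annotation and are hard to locate precisely on 3D data with missing parts. The dissertation proposes a framework that defines dense registration in two respects: semantic registration, so that registered data points are semantically consistent; and topological registration, so that registered data have locally compatible representations. It also proposes replacing manually annotated landmarks with automatically detected high-entropy points, which makes the registration process automatic.
(2) A dense registration method for 3D faces based on incremental deformation of local shapes. The method addresses the inconsistency of local structures caused by landmark-dependent deformation. It models non-rigid registration as a combination of many local rigid registrations with an explicit mathematical expression. The weights of the rigid registrations are adjusted in inverse proportion to the distance to the key points, which ensures smooth deformation with consistent local structure. The key points are initialized with a set of landmarks and are added iteratively in regions with large registration error. As key points are added, the registration error decreases gradually until convergence.
(3) A study of applications of 3D face models to face reconstruction and recognition. First, a 3D face model better suited to Asian faces is built, so that it fits 3D face reconstruction from Asian face images. Second, a 3D face model makes 3D face registration more robust, which can be used to register 3D data with noise, occlusion, and large expressions. Finally, the registration results can be applied directly to 3D face recognition, and recognition performance is verified on both global and local facial structures.
(4) A method that uses 3D face depth information to guide super-resolution of 2D faces; the work uses neural networks for face image super-resolution. 2D convolution is translation invariant and considers only 2D neighborhood information within the receptive field, without deriving neighborhood relations from the true geometric structure. The dissertation proposes a network that recovers depth from a face image with a U-shaped network and, through several feature extraction layers, modulates the features of the super-resolution network. Applied to face super-resolution, the network produces clearer and sharper facial edges.
(5) A data augmentation method that perturbs the low-dimensional space of face images. Starting from 2D face images, the dissertation first perturbs shape and appearance in a reasonable way and applies the result to face image super-resolution. Without changing the structure of the convolutional neural network, the method improves reconstruction results very significantly. The dissertation then revisits the method from a 3D perspective, perturbing face pose and shape to generate different appearance representations of the same face image under pose and shape perturbation. Compared with the 2D method, the 3D method, again without changing the network structure, yields a larger improvement in super-resolution results.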
English abstract	Dense registration of 3D faces seeks accurate matching and a canonical representation of 3D facial data. It is fundamental to a number of downstream applications in 3D facial analysis, such as 3D face reconstruction, 3D face recognition, and 3D face animation. Dense registration of 3D faces also provides clues for many vision tasks on 2D facial images. Compared with its 2D counterpart, a 3D face contains extra geometric information that is stable under different poses and illumination conditions and can also be used to handle expression variations. The benefits of dense correspondence are generally two-fold: 1) the one-to-one correspondence of points between different faces allows them to be organized in the same vector space, which facilitates further data analysis; 2) compared with sparse representations such as landmarks alone, dense representations capture local as well as global structures of faces and so provide more detailed information. Dense registration of 3D faces remains a challenging problem that belongs to the class of non-rigid point cloud registration. From a mathematical view, unlike the rigid case, the non-rigid registration problem has no explicit formulation. From a physical view, while locating landmarks on 3D faces can be guided by common knowledge of anatomical structures, the correspondence of points on smooth regions has no solid definition. This also makes the correspondence results difficult to assess. Dense registration of 3D faces contributes to many applications in both the 3D and 2D cases. On the one hand, statistical analysis of 3D faces depends heavily on the dense registration results, and sparse representation of 3D faces can be applied directly to 3D face recognition. On the other hand, the study of 3D faces provides new physical insights for solving problems on 2D facial images.
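For contrast with the non-rigid case described above, rigid registration with known point correspondences does have an explicit least-squares solution, commonly computed with the SVD-based Kabsch (orthogonal Procrustes) procedure. The following is a minimal NumPy sketch of that standard technique, not code from the dissertation; the function name `rigid_align` and its arguments are illustrative assumptions.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rotation R and translation t so that R @ source[i] + t ~ target[i].

    source, target: (N, 3) arrays of corresponding points (illustrative names).
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that det(R) = +1 (a rotation, not a reflection)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t
```

No comparable closed form exists once every point may move independently, which is why non-rigid methods need extra structure such as smoothness or locally rigid pieces.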
This dissertation starts from establishing accurate correspondence of 3D faces, based on which the author elaborates the process of building 3D face models and studies some applications for 3D faces and 2D facial images. The main contributions of this dissertation are as follows:
(1) The author proposes an automatic method for dense registration of 3D faces without landmarks. Generally, landmarks require manual annotation and are hard to define consistently across different faces with partial data. The author proposes a general framework that revisits the dense registration problem from two perspectives. One is semantic correspondence, which guarantees that corresponding points share the same semantic meaning. The other is topological correspondence, which guarantees that corresponding points lie in the same local context. Automatically detected high-entropy points replace the landmarks to make the correspondence automatic.
(2) The author proposes to boost local shape matching for dense registration of 3D faces. The proposed method alleviates the negative effect of incoherent local deformations caused by landmark guidance. The dense registration of a 3D face is treated as many locally rigid motions with an explicit formulation. More specifically, the weight of each rigid motion is adjusted according to the distance to the key points. The key points are initialized with a few landmarks and are augmented adaptively in regions with large registration errors. The registration converges as the number of key points increases.
(3) The author studies some practical issues in 3D face reconstruction and recognition based on 3D face models. First, a 3D face model that is better adapted to Asian groups is established and applied to 3D face reconstruction. Then, a well-established 3D face model is shown to benefit robust registration of 3D faces, which can deal with data with noise, occlusions, and large expressions. Finally, the correspondence results can be applied directly to 3D face recognition, and their effectiveness is demonstrated using both the holistic and regional structures of 3D faces.
(4) The author proposes effective ways to incorporate 3D depth information into the super-resolution of 2D facial images. Convolutional neural networks are employed for 2D facial image super-resolution. Convolution is a translation-invariant operation that considers only the 2D local receptive field, so one likely limitation is that neighborhood relations in 3D are ignored. The author proposes a network architecture with a U-Net structure that learns the depth map from a facial image. The learned depth map is then used to modulate the features of the main super-resolution task. The proposed network leads to both quantitative and qualitative improvements in face super-resolution, especially sharper facial edges.
(5) The author proposes an effective way of data augmentation based on perturbations in the low-dimensional space of facial images. First, the author studies 2D facial images and applies reasonable perturbations to shape and appearance. This is applied to the super-resolution of facial images, and the results show notable improvements without altering the basic structure of the convolutional neural networks. Then, the author revisits 2D facial images from a 3D perspective: 3D facial pose and shape are perturbed to generate novel appearances of a single 2D facial image. The 3D method gains some further improvement over the 2D method, again without altering the network structure used for face super-resolution.
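Contribution (2) describes non-rigid registration as a combination of locally rigid motions whose weights depend on the distance to key points. One simple way to picture this is an inverse-distance-weighted blend of per-key-point rigid transforms, sketched below. This is only an illustrative reading of the abstract under stated assumptions: the normalized `1 / (distance + eps)` weighting, the function name `blend_local_rigid`, and all parameter names are hypothetical, and the dissertation's actual formulation may differ.

```python
import numpy as np

def blend_local_rigid(points, key_points, rotations, translations, eps=1e-8):
    """Deform points by blending one rigid motion per key point.

    points:       (N, 3) vertices to deform
    key_points:   (K, 3) key point positions
    rotations:    (K, 3, 3) rotation matrix attached to each key point
    translations: (K, 3) translation attached to each key point
    eps:          small constant avoiding division by zero (assumed value)
    """
    # Distance from every point to every key point: shape (N, K)
    dists = np.linalg.norm(points[:, None, :] - key_points[None, :, :], axis=2)
    # Assumed weighting: inversely proportional to distance, normalized per point
    weights = 1.0 / (dists + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    # Apply each rigid motion to every point: shape (K, N, 3)
    moved = np.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    # Weighted average over key points: shape (N, 3)
    return np.einsum('nk,kni->ni', weights, moved)
```

With this weighting, a vertex that coincides with a key point follows that key point's motion almost exactly, while vertices in between move by a smooth mixture. Adding key points where the residual error is large, as the abstract describes, would refine the deformation locally.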
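Keywords	3D face model; non-rigid registration; dense correspondence; face image super-resolution; convolutional neural network; low-dimensional space perturbation
Language	Chinese
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/40564
Collection	Graduates_Doctoral Dissertations
Corresponding author	Fan Zhenfeng (范振峰)
Recommended citation (GB/T 7714)	Fan Zhenfeng. Research on Dense Registration Algorithms for 3D Faces and Their Applications[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.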

Files in this item
File name/size	Document type	Version type	Access type	License
范振峰毕业论文提交版.pdf (52320 KB)	Dissertation		Restricted access	CC BY-NC-SA