3D Face Reconstruction and Pose Estimation Based on Perspective Projection

	3D Face Reconstruction and Pose Estimation Based on Perspective Projection
	徐淼
	2024
Pages	68
Degree type	Master's
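Chinese abstract (translated)	With the development of deep learning, technologies such as virtual try-on, virtual makeup, video editing, animation production, and fatigue recognition have steadily improved, and demand for them in mobile and desktop applications has grown accordingly. Face reconstruction and six-degree-of-freedom (6-DoF) face pose estimation have therefore attracted wide attention in computer vision and computer graphics. Researchers have proposed many methods for reconstructing a 3D face model from a single RGB image. Most of these methods approximate the true perspective projection with an orthographic projection, ignoring the fact that, in the camera coordinate system, a face appears larger when near and smaller when far. The approximation works well when the face is small relative to its distance from the camera. However, with the popularity of selfies, virtual eyeglass try-on, and makeup applications, face capture scenarios have become more complex, and faces reconstructed under orthographic projection show obvious distortion, so estimating the perspective projection becomes important. Face pose estimation often appears as a subtask in these methods, but the pose they produce is only the orientation of the face and lacks the translation of the face relative to the camera, which further limits their applicability and convenience. This thesis studies perspective-projection-based face reconstruction and 6-DoF face pose estimation, aiming to remove the distortion of earlier reconstruction methods while estimating face pose accurately enough for flexible use in complex scenes. The main contributions are as follows. First, to address the obvious distortion of faces reconstructed under orthographic projection and the fact that earlier face pose estimation considered only rotation angles, the thesis introduces a reconstruction scheme based on the 6-DoF face pose and explores 3D face reconstruction under perspective projection from a single RGB image; a single network reconstructs the 3D face in the world coordinate system and estimates the 6-DoF face pose at the same time, improving both reconstruction accuracy and pose accuracy. Second, to address the limited accuracy of earlier face pose estimation, which restricts its applicability, the thesis designs a multi-level pixel-wise correspondence learning network that uses a self-attention mechanism to robustly learn correspondences between 2D pixels in the input image and 3D points of the face in the world coordinate system, enabling more accurate 6-DoF face pose estimation. Comparative and analytical experiments on multiple datasets show that the 3D faces reconstructed by the proposed method resolve the distortion problem well and that the estimated poses are highly accurate, which benefits applications such as virtual reality.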
English abstract	With the development of deep learning, technologies such as virtual wearables, virtual makeup, video editing, animation production, and fatigue recognition are continuously improving, and the demand for applications on smartphones and computers is rising with them. Face reconstruction and six-degree-of-freedom (6-DoF) face pose estimation have recently attracted widespread attention in computer vision and computer graphics. Researchers have proposed numerous methods for reconstructing 3D face models from a single RGB image. Most of these methods use orthographic projection to approximate the real perspective projection, neglecting the fact that a face appears larger when it is closer to the camera. They perform well when the face is small relative to its distance from the camera. However, with the popularity of selfies, virtual eyeglass try-on, and makeup applications, facial capture scenes have become more complex. Faces reconstructed using orthographic projection exhibit noticeable distortion, making the estimation of perspective projection crucial. Face pose estimation often appears as a subtask in these methods, but the pose they obtain typically represents only the orientation of the face and lacks the face's offset from the camera, which further restricts the versatility and convenience of their application. The main contributions of this thesis can be summarized as follows. To address the significant distortion in face reconstruction based on orthographic projection, and the earlier focus on rotation angles alone in face pose estimation, this thesis introduces a reconstruction method based on the 6-DoF face pose. It explores perspective-projection-based 3D face reconstruction from a single RGB image. Using a single network, it simultaneously reconstructs the 3D face in the world coordinate system and estimates the 6-DoF face pose, improving the accuracy of both face reconstruction and pose estimation. To address the low accuracy of previous face pose estimation methods, which limits their applicability, this thesis proposes a multi-level pixel-wise correspondence learning network. Through a self-attention mechanism, the network robustly learns the correspondence between 2D pixels in the input image and 3D points of the face in the world coordinate system, enabling more accurate 6-DoF face pose estimation. Comparative and analytical experiments are conducted on multiple datasets. The results show that the proposed method effectively addresses the distortion problem in 3D face reconstruction and that the estimated pose is highly accurate, which makes the approach beneficial for applications such as virtual reality.
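Keywords	face reconstruction, pose estimation, deep learning, neural networks, six degrees of freedom
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/58542
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	徐淼. 基于透视投影的三维人脸重建及姿态估计[D],2024.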

Files in this item
File name/size	Document type	Version type	Access type	License
毕业论文_签字版(1).pdf (12191KB)	Thesis		Restricted access	CC BY-NC-SA