Research on Sparse-View 3D Human Reconstruction Methods Based on Deep Implicit Function Representation
刘彭鹏
2023-11
Pages: 140
Subtype: Doctoral
Abstract

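3D human reconstruction is one of the most active research problems in computer vision and computer graphics. Traditional 3D human reconstruction often relies on dense input views and expensive capture equipment, which limits its application scenarios.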

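High-precision 3D human reconstruction with simple equipment, by contrast, has broad application prospects in communications, medical care, education, culture, and other fields, but it is also far more challenging. This thesis therefore aims to achieve efficient, high-precision 3D human reconstruction from sparse-view images captured by simple devices, namely RGB or RGB-D sensors.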

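To describe 3D humans whose body shapes and clothing are complex and highly variable, 3D human representations based on deep implicit functions have attracted much attention because they are not limited by human topology or output resolution. However, sparse-view input images provide very limited observable information and often contain large invisible or occluded regions, so mainstream methods based on deep implicit function representations still have many limitations, mainly: 1) With a limited set of high-fidelity 3D training data and limited input observations, these methods lack sufficient priors, especially for invisible or occluded regions, and their reconstructions are often incomplete, deformed, or lacking in detail. In addition, they often assume an idealized training setup, such as orthographic, horizontal projection; because this differs substantially from real data, they often generalize poorly to real data. 2) Current multi-view 3D human reconstruction methods and systems are extremely sensitive to inaccurate camera parameters. They depend so heavily on accurate camera extrinsics that even a tiny camera perturbation can cause performance to collapse. 3) Existing methods that incorporate other prior knowledge either rely on complex network architectures and intermediate representations, which makes it hard to balance accuracy and performance, or depend too heavily on parametric model features (such as SMPL features), which introduces new reconstruction artifacts.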
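As a toy illustration of why an implicit representation is not tied to an output resolution, the sketch below queries one occupancy function on two different grids. The sphere-shaped `occupancy` function, the grid sizes, and all names are assumptions made for illustration; a learned model would replace the closed form with a network conditioned on image features, and this is not the thesis's implementation.

```python
import math


def occupancy(x, y, z, radius=0.5):
    """Toy implicit function: 1.0 inside a sphere, 0.0 outside.

    A learned model would replace this closed form with a neural network
    f(point, image_features) -> occupancy probability.
    """
    return 1.0 if math.sqrt(x * x + y * y + z * z) <= radius else 0.0


def occupied_fraction(resolution, extent=1.0):
    """Query the same function on a regular grid of any resolution."""
    step = 2.0 * extent / resolution
    inside = 0
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                # Sample at voxel centres inside the cube [-extent, extent]^3.
                x = -extent + (i + 0.5) * step
                y = -extent + (j + 0.5) * step
                z = -extent + (k + 0.5) * step
                inside += occupancy(x, y, z) > 0.5
    return inside / resolution ** 3


# The function is defined once; the grid resolution is chosen at query time.
coarse = occupied_fraction(16)
fine = occupied_fraction(48)
```

Both grids approximate the same underlying shape; only the sampling density changes, which is what lets such methods extract a surface at whatever resolution the application needs.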

 

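To address the above problems of existing methods, namely incomplete reconstruction, missing details, poor generalization, lack of robustness, and low efficiency, this thesis builds on the deep implicit function representation of the 3D human body. Its focus moves progressively from sparse multi-view RGB-D image input to single-view RGB image input, exploring different solutions to these problems as the input becomes increasingly "simplified".

The main work and contributions of this thesis are summarized as follows: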

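1. A 3D human reconstruction algorithm based on a geometry-enhanced implicit function representation is proposed. It achieves efficient, high-precision 3D human reconstruction from sparse-view RGB-D images. To address incomplete reconstruction and missing details, a relative depth feature and a geometry enhancement module are proposed, which effectively preserve local geometric details from the depth image while strengthening the network's ability to infer the global shape of invisible regions; a depth-aware attention mechanism is also proposed to better fuse multi-view features. To address low efficiency, depth priors are used to compress the model and reduce unnecessary query points, balancing performance and efficiency. To address poor generalization, priors are used judiciously to reduce dependence on global features, and real-world noise is simulated during training. Experiments show that the algorithm performs well on real images captured by Kinect and that, compared with algorithms of the same type, it improves accuracy while also greatly improving runtime efficiency.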

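2. A relative camera pose estimation algorithm based on deep implicit function representation is proposed. To address the lack of robustness of the above 3D human reconstruction algorithm to inaccurate camera parameters, a loop optimization module is proposed. It models the correlation between relative pose and 3D human reconstruction and uses differentiable rendering to build an iterative optimization network that improves the quality of the 3D human reconstruction while optimizing the camera parameters. In addition, to address the problem that relative camera pose cannot be estimated accurately in extreme input scenarios, an initialization estimation module is proposed. It uses human structure priors to complete invisible regions, compensating for the lack of direct visual correspondence between the input image pair. Experiments show that the algorithm can self-calibrate the extrinsic parameters of a multi-camera system under extreme input conditions. Experiments also show that it can extend the above sparse-view RGB-D human reconstruction work to achieve high-precision 3D human reconstruction when camera extrinsics are inaccurate or missing, further relaxing the constraints on the input.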

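3. A 3D human reconstruction algorithm based on a holistic-prior implicit function representation is proposed. It achieves high-precision, realistic 3D human reconstruction from single-view RGB images. Focusing on the many invisible or occluded regions that arise under minimal input, the thesis explores a reasonable way of incorporating SMPL and proposes holistic-prior implicit features, comprising a skeleton-based structural prior, a relative depth prior, and a normal-map-based feature prior. To address incomplete or deformed reconstructions caused by occlusion, the skeleton-based structural prior adopts two strategies, skeleton-based sampling and skeleton-based feature embedding, which improve the network's spatial awareness of the skeleton; the relative depth prior and the normal-map-based feature prior further strengthen inference for invisible regions. Compared with existing methods that incorporate SMPL, experiments show that the proposed skeleton-based structural prior is more robust to inaccurate SMPL estimates and yields good reconstructions in scenarios such as humans wearing loose clothing. The effectiveness of the algorithm is verified on both single-view RGB and RGB-D experimental data. Experiments also show that extending the skeleton-based sampling strategy to face reconstruction improves facial details.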
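The skeleton-based sampling strategy mentioned in item 3 can be pictured with a small sketch: draw query points close to bone segments so that training samples concentrate around the body structure. The two-bone skeleton, the Gaussian offset `sigma`, and the function names are hypothetical, and the thesis's actual sampling scheme may differ.

```python
import random

# Hypothetical two-bone skeleton: each bone is a pair of 3D joint positions (metres).
BONES = [
    ((0.0, 0.0, 0.0), (0.0, 0.5, 0.0)),
    ((0.0, 0.5, 0.0), (0.3, 0.5, 0.0)),
]


def sample_near_skeleton(bones, n_points, sigma=0.05, seed=0):
    """Draw points on randomly chosen bones, then jitter them with Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        a, b = rng.choice(bones)
        t = rng.random()  # position along the bone, 0 = first joint, 1 = second
        points.append(tuple(
            a[i] + t * (b[i] - a[i]) + rng.gauss(0.0, sigma) for i in range(3)
        ))
    return points


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    t = sum(ap[i] * ab[i] for i in range(3)) / sum(c * c for c in ab)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    closest = [a[i] + t * ab[i] for i in range(3)]
    return sum((p[i] - closest[i]) ** 2 for i in range(3)) ** 0.5


samples = sample_near_skeleton(BONES, 1000)
mean_dist = sum(
    min(distance_to_segment(p, a, b) for a, b in BONES) for p in samples
) / len(samples)
```

In this sketch `sigma` controls how tightly the samples hug the skeleton; a real pipeline would presumably mix such samples with surface-based or uniform samples, which is a design choice the abstract does not specify.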

Other Abstract

3D human reconstruction is one of the most active research topics in computer vision and computer graphics. Traditional 3D human reconstruction often relies on dense-view inputs and expensive acquisition equipment, which limits its applications. High-quality 3D human reconstruction with simple equipment has wide applications in communications, medical care, education, culture, and other fields, but it is even more challenging. Therefore, this thesis aims to achieve efficient, high-precision 3D human reconstruction from sparse-view images captured by consumer-level RGB or RGB-D sensors.

To represent a full 3D human with varied body shapes and clothing, implicit function representations have attracted much attention because they are not limited by human topology or output resolution. However, because sparse-view image inputs offer very limited observations and contain many invisible or occluded areas, existing implicit-function-based methods still have many limitations, mainly as follows: 1) With limited high-fidelity 3D training data and limited 2D observations from sparse-view inputs, these methods often lack sufficient priors, which leads to reconstruction artifacts such as broken or disembodied parts and missing details, especially in invisible or occluded areas. In addition, these methods often assume an ideal training environment, such as orthographic, horizontal projection, which differs greatly from real data and leads to poor generalization on real data. 2) Existing multi-view 3D human reconstruction methods are extremely sensitive to inaccurate camera parameters. They rely so heavily on accurate camera extrinsic parameters that even small camera perturbations may cause performance to collapse. 3) Existing methods that incorporate other prior knowledge either rely on complex network architectures and proxy representations, making it difficult to balance accuracy and performance, or rely too heavily on parametric model features (such as SMPL features), which introduces new reconstruction artifacts.
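Limitation 2 can be made concrete with simple pinhole arithmetic. Assuming image features are looked up at the pixel where a 3D query point projects (a common design in image-conditioned implicit methods, not confirmed here as the thesis's setup), an extrinsic error shifts that lookup. The focal length of 1000 pixels, the 2 m distance, and the 1 degree yaw error below are illustrative assumptions.

```python
import math


def project(point, focal=1000.0, yaw_deg=0.0):
    """Project a 3D point (camera coordinates, metres) to a horizontal pixel offset.

    yaw_deg rotates the point about the camera's vertical axis, standing in
    for an error in the camera extrinsics.
    """
    x, _, z = point
    yaw = math.radians(yaw_deg)
    x_rot = x * math.cos(yaw) + z * math.sin(yaw)
    z_rot = -x * math.sin(yaw) + z * math.cos(yaw)
    return focal * x_rot / z_rot  # pinhole model: u = f * X / Z


point = (0.0, 0.0, 2.0)  # a point 2 m in front of the camera, on the optical axis
shift_px = abs(project(point, yaw_deg=1.0) - project(point))
```

For a point on the optical axis the shift equals `focal * tan(yaw)`, roughly 17 pixels in this setting, which can be enough to move the feature lookup off a thin structure.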

 

To address the above problems of broken or disembodied parts, missing details, poor generalization, lack of robustness, and low efficiency, this thesis builds on implicit function representations and moves from sparse-view RGB-D image input to single-view RGB image input, so that the input is gradually "simplified". We explore different solutions to these problems, progressing step by step. The main contributions of this thesis are summarized as follows:

1. This thesis proposes an algorithm for 3D human reconstruction from sparse-view RGB-D inputs based on a geometry-enhanced implicit function. The algorithm aims at efficient, high-precision 3D human reconstruction. To reduce reconstruction artifacts such as incomplete bodies and missing details, the proposed relative depth feature and geometry-enhanced module retain the local geometric details of the depth image while reasoning about the global shape of invisible areas; in addition, the proposed depth-aware attention mechanism aggregates multi-view features more effectively. To deal with low efficiency, we use depth priors to compress the model and reduce unnecessary query points, balancing performance and efficiency. To address poor generalization, we use priors judiciously to reduce dependence on global features and simulate real noise during training. Experiments show that this algorithm performs well on real data captured by Kinect. Compared with similar algorithms, it improves accuracy and also greatly improves runtime efficiency.

2. This thesis proposes a relative pose estimation algorithm based on implicit functions. To address the lack of robustness of the above 3D human reconstruction algorithm to inaccurate camera parameters, a loop optimization module is proposed to model the correlation between the relative pose and the 3D human reconstruction. It uses differentiable rendering to build an iterative loop that alternates between human reconstruction and relative pose estimation. In addition, to handle extreme input scenes in which the relative camera pose cannot be estimated accurately, the proposed initialization estimation module uses a human structure prior to complete the invisible areas, compensating for the lack of direct visual correspondence between the input image pair. Experiments show that this algorithm can self-calibrate the extrinsic parameters of multi-camera systems under extreme input conditions. Experiments also show that, combined with the above human reconstruction work, it achieves high-precision 3D human reconstruction when camera extrinsic parameters are inaccurate or missing, further relaxing the input constraints.

3. This thesis proposes an algorithm for 3D human reconstruction from single-view RGB images based on a holistic-prior implicit function. The algorithm aims at high-precision, realistic 3D human reconstruction. Focusing mainly on the occlusion problem under minimal input, we explore a reasonable way of incorporating SMPL. The proposed holistic priors include skeleton-based structural priors, relative depth priors, and normal-based feature priors. To reduce incomplete or deformed reconstructions caused by occlusion, the proposed skeleton-based structural prior adopts two strategies, skeleton-based sampling and skeleton-based feature embedding, which improve the network's spatial awareness of the skeleton; the proposed relative depth priors and normal-based feature priors further enhance the ability to infer invisible areas. Compared with existing methods that incorporate SMPL, experiments show that the proposed skeleton-based structural prior is more robust to inaccurate SMPL estimates and also produces good reconstructions for cases such as people wearing loose clothes. The effectiveness of this algorithm has been verified on single-view RGB and RGB-D datasets. Experiments also show that extending the skeleton-based sampling strategy to face reconstruction improves facial details.
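The relative depth feature and the reduction of query points in item 1 can be sketched as follows. This is one plausible reading under stated assumptions, not the thesis's definition: the feature is taken to be the query depth minus the sensor depth at the projected pixel, and points clearly in front of the observed surface are skipped because the sensor has already seen through them. The `margin` value and all function names are hypothetical.

```python
def relative_depth(query_z, observed_depth):
    """Signed offset of a query point from the observed surface along the ray.

    Negative: the point lies between the camera and the surface (free space).
    Positive: the point lies behind the visible surface (inside or occluded).
    """
    return query_z - observed_depth


def prune_queries(queries, depth_map, margin=0.02):
    """Keep only points that are not clearly in observed free space.

    queries: list of (u, v, z) with integer pixel coordinates and depth in metres.
    depth_map: 2D list indexed as depth_map[v][u]; 0.0 marks a missing reading.
    """
    kept = []
    for u, v, z in queries:
        d = depth_map[v][u]
        if d <= 0.0:
            # Simplification: pixels without valid depth are treated as background.
            # Real sensor holes on the body would need separate handling.
            continue
        if relative_depth(z, d) >= -margin:
            kept.append((u, v, z))
    return kept


depth_map = [[0.0, 2.0],
             [2.1, 2.2]]
queries = [(1, 0, 1.5), (1, 0, 2.0), (0, 1, 2.3), (0, 0, 2.0), (1, 1, 2.19)]
kept = prune_queries(queries, depth_map)
```

Here the point at depth 1.5 m is dropped as free space and the point over the missing pixel is dropped as background, so only three of the five queries would reach the network.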

Keyword: 3D human reconstruction, deep implicit function, sparse view, parametric model, RGB-D image
Language: Chinese
Sub direction classification: 3D vision
Planning direction of the State Key Laboratory: Other
Paper associated data
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/54527
Collection: Graduates_Doctoral Dissertations
Recommended Citation
GB/T 7714
刘彭鹏. 基于深度隐函数表示的稀疏视角人体三维重建方法研究[D],2023.
Files in This Item:
File Name/Size: lpp博士论文最终稿.pdf (95063KB)
DocType: Thesis
Access: Restricted
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.