CASIA OpenIR > Graduates > Doctoral Dissertations
基于参数化网格约束的三维人体和手物交互重建 (3D Human and Hand-Object Interaction Reconstruction Based on Parametric Mesh Constraints)
胡俊星 (Hu Junxing)
2024-05-14
Pages: 134
Subtype: Doctoral
Abstract

As a research hotspot in computer vision and computer graphics, 3D human and hand-object interaction reconstruction aims to recover virtual 3D human bodies, hands, and objects by extracting effective information from 2D images. With advances in algorithms and hardware computing power, concepts such as the metaverse and digital humans have begun to enter people's daily lives. As a core technology, 3D human and hand-object reconstruction has broad application prospects in education, sports, medical care, transportation, entertainment, and many other fields. Since the human body constantly interacts with its surroundings, and the hands are the body parts that contact external objects most frequently, hand-object interaction reconstruction enables computers to understand human behavior and perceive the objects being manipulated, with great potential in virtual reality, robot control, and human-computer interaction.

 

Given the ubiquity of monocular devices such as mobile phones and cameras in daily life, this thesis focuses on 3D human and hand-object interaction reconstruction from monocular RGB images. However, due to the lack of depth information, directly recovering 3D structure from a single image is ambiguous. In addition, the poses of the human body and hands are complex and diverse, and mutual occlusion occurs during hand-object interaction, which challenges existing algorithms. To improve reconstruction performance, this thesis follows a route from independent to interactive reconstruction, centered on parametric mesh constraints. First, structural priors are introduced via the human parametric model mesh, and a personalized graph is constructed for 3D human pose and shape estimation. Second, since existing hand-object reconstruction methods do not fully exploit contact information and thus yield unsatisfactory results, contact is modeled explicitly and the contact state between the parametric hand mesh and the interacting object is estimated as an important cue for hand-object interaction. Finally, the estimated contact information is structurally encoded to assist realistic and plausible implicit reconstruction of hand-held objects whose templates are unknown.

 

The research work and contributions of this thesis are summarized as follows:

 

1. 3D human mesh reconstruction with personalized graph constraints.

In monocular 3D human mesh reconstruction, the recovered mesh often misaligns with the input image. To address this, this thesis proposes a reconstruction method that builds a personalized graph from the parametric model. Existing methods based on convolutional neural networks reshape the extracted features during training and inference, losing the spatial structure of the human body; methods based on graph neural networks effectively exploit the graph-like nature of the human model, but usually build graphs for different samples from the same template mesh, ignoring individual geometric perception. This thesis therefore combines the two and proposes an end-to-end method named Personalized Graph Generation, which constructs an individually geometry-aware graph from an intermediately predicted coarse mesh to introduce structural constraints. First, an initial convolutional module regresses a coarse parametric human mesh for each sample. Then, local features are extracted from the 2D feature map under the guidance of the predicted 3D mesh and combined with the coarse human model to form the graph. In addition, a body-oriented graph adjacency matrix is adaptively generated from the initial mesh, accounting for full-body relations between mesh vertices and further enhancing the perception of body geometry. Finally, a graph attention mechanism makes the model focus on error-prone regions such as body contours and limbs, yielding more accurate reconstructions. Quantitative experiments on four human reconstruction datasets and further qualitative comparisons show significant improvements over previous methods under complex real-world conditions, including body occlusion, illumination changes, and visual ambiguity in indoor and outdoor scenes.

 

2. 3D hand-object contact state estimation with multi-level mesh-based graph constraints.

In monocular 3D hand-object contact state estimation, contact prediction typically depends on the 3D geometric template of the object. To address this, this thesis explicitly models contact on the parametric hand mesh and designs a contact estimation method that requires no object template. Existing methods usually judge contact from the distance between the hand and object meshes, so they cannot handle unfamiliar objects without templates, which greatly limits their applicability. However, 3D contact information exists on both the hand and the object; even when the object template is unknown, the parametric hand mesh surface still carries contact information. This thesis therefore proposes to directly predict the probability that each hand vertex is in contact with the interacting object, using visual information from the monocular image and structural constraints from the hand mesh. A multi-level mesh graph is designed: part-level and vertex-level graphs are constructed from the parametric hand mesh to provide multi-scale structural constraints. To predict accurate contact probabilities, a coarse-to-fine learning framework jointly learns part- and vertex-level contact states with multi-level graph transformers. Experiments show that the proposed method achieves excellent results on multiple hand-object interaction datasets, is highly robust to occlusion from the hand or the object, and lays a solid foundation for contact-aware hand-object interaction reconstruction.

 

3. 3D hand-object interaction reconstruction combining explicit contact constraints and implicit representations.

In monocular 3D hand-object interaction reconstruction, the reconstruction quality at the interaction region is often unsatisfactory. To address this, this thesis proposes structured contact codes built on the parametric hand mesh to assist the implicit reconstruction of hand-held objects. Existing methods usually reconstruct template-free hand-held objects with implicit functions, but either ignore contact information or only constrain the interaction with attraction and repulsion losses, leading to poor interaction reconstruction. This thesis therefore combines explicit hand-object contact prediction with implicit object representation to promote interaction reconstruction. Specifically, the hand mesh is first reconstructed with an existing method, and the estimated contact information is anchored to the corresponding hand mesh vertices as structured contact codes. Then, since the implicit function is continuous over 3D space, sparse convolution diffuses the discrete contact states from the parametric hand mesh surface into the 3D space around the object, and trilinear interpolation enables querying contact features at arbitrary locations to build the implicit neural representation of the object, effectively guiding its implicit reconstruction. Next, a contact-aware interaction loss built from the estimated contact probabilities realizes more flexible and accurate contact constraints. Experiments show that the proposed method achieves excellent metrics while greatly improving reconstruction at the hand-object contact region, producing visually more plausible interaction results.

Other Abstract

As one of the hot topics in computer vision and computer graphics, 3D human and hand-object reconstruction aims to reconstruct virtual 3D human bodies, hands, and objects by extracting effective information from 2D images. With the development of algorithms and improvements in hardware performance, the concepts and related applications of the metaverse and digital humans are gradually appearing in people's daily lives. As a core technology, 3D human and hand-object reconstruction has broad application prospects in many fields such as education, sports, medical care, transportation, and entertainment. Since the human body does not exist in isolation but interacts with the surrounding environment, and the hands are the parts of the body that contact objects most frequently, reconstructing hand-object interaction enables the computer to understand human behavior and perceive the existence of hand-held objects, with applications in virtual reality, robot control, and human-computer interaction. At the same time, these tasks place higher requirements on the accuracy and robustness of algorithms.

 

Due to the popularity of monocular devices such as mobile phones and cameras, this thesis focuses on the reconstruction of 3D humans and hand-object interaction from monocular RGB images. However, directly recovering 3D structure from a single image is ambiguous due to the lack of depth information. In addition, the poses of the human body and hands are complex and diverse, and mutual occlusion occurs during hand-object interaction, both of which challenge existing algorithms. To improve performance, this thesis studies 3D human and hand-object interaction reconstruction based on parametric mesh constraints, following a research route from independence to interaction. First, the human parametric model mesh is exploited as a structural prior, and a personalized graph is generated for 3D human pose and shape estimation. Next, since existing hand-object reconstruction methods do not fully utilize contact information, their results are suboptimal; this thesis therefore models and estimates the contact state between the hand and the interacting object as an important cue for hand-object reconstruction. Furthermore, structured contact codes are built from the estimated contact information, facilitating realistic and plausible implicit reconstruction of hand-held objects without 3D templates.

 

In summary, the main contributions of this thesis are listed as follows:

 

1. 3D human mesh reconstruction with personalized graph constraints.

In monocular 3D human mesh reconstruction, there is often significant misalignment between the reconstructed body mesh and the image. To address this issue, Personalized Graph Generation (PGG) is proposed, an end-to-end method built on the human parametric model. Existing methods based on convolutional neural networks reshape the extracted features, losing spatial structure information during training. Although several methods based on graph neural networks effectively utilize the graph-like nature of the human mesh model, they build graphs for different instances from the same template mesh, neglecting the geometric properties of individuals. PGG therefore combines the two types of methods. First, a convolutional module regresses a coarse parametric mesh tailored to each sample. Guided by the 3D structure of this personalized mesh, PGG extracts local features from the 2D feature map; these geometry-aware features are then integrated with the coarse human model as graph vertex features. Furthermore, a body-oriented adjacency matrix is adaptively generated from the coarse mesh; it captures individual full-body relations between vertices, enhancing the perception of body geometry. Finally, a graph attentional module makes PGG pay more attention to limbs and body contours, which are prone to misalignment between the reconstructed mesh and the image. Quantitative experiments across four benchmarks and qualitative comparisons on further datasets show that the proposed method significantly outperforms previous methods in complex situations such as occlusion, illumination changes, and visual ambiguity in real indoor and outdoor scenes.
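The per-sample graph construction described above can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the toy 6-vertex "mesh", the nearest-pixel feature sampling, and the k-nearest-neighbour adjacency are simplifying assumptions standing in for the parametric body mesh, the learned feature extractor, and the adaptively generated body-oriented adjacency.

```python
import numpy as np

def sample_local_features(feat_map, verts_2d):
    """Nearest-pixel sampling of 2D features at projected vertex locations."""
    H, W, _ = feat_map.shape
    xs = np.clip(np.round(verts_2d[:, 0]).astype(int), 0, W - 1)
    ys = np.clip(np.round(verts_2d[:, 1]).astype(int), 0, H - 1)
    return feat_map[ys, xs]                       # (V, C) geometry-aware features

def personalized_adjacency(coarse_verts, k=4):
    """Per-sample adjacency built from the coarse mesh geometry (k-NN stand-in)."""
    d = np.linalg.norm(coarse_verts[:, None] - coarse_verts[None], axis=-1)
    V = len(coarse_verts)
    A = np.zeros((V, V))
    nn = np.argsort(d, axis=1)[:, 1:k + 1]        # skip self (distance 0)
    for i, js in enumerate(nn):
        A[i, js] = A[js, i] = 1.0                 # symmetric neighbourhood
    A += np.eye(V)                                # self-loops
    return A / A.sum(1, keepdims=True)            # row-normalise for propagation

# Toy example: 6 "mesh vertices" and a random 16x16x8 feature map.
rng = np.random.default_rng(0)
coarse = rng.normal(size=(6, 3))                  # coarse 3D mesh (illustrative)
verts_2d = rng.uniform(0, 16, size=(6, 2))        # projected vertex positions
feat_map = rng.normal(size=(16, 16, 8))
node_feat = np.concatenate(
    [sample_local_features(feat_map, verts_2d), coarse], axis=1)  # (6, 11)
A = personalized_adjacency(coarse)
out = A @ node_feat                               # one graph-propagation step
```

Because the adjacency is recomputed from each sample's coarse mesh, two different bodies yield two different graphs — the "personalized" aspect the method relies on.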

 

2. 3D hand-object contact estimation with multi-level mesh-based graph constraints.

In monocular 3D hand-object contact estimation, previous methods rely on 3D object templates to determine contact states. To remove this dependency, this work explicitly models contact on the parametric hand mesh and designs a contact estimation method that requires no 3D object template. Previous methods judge hand-object contact from the distance between the two meshes, so they cannot handle objects without geometric templates, and their application scenarios are limited. However, 3D contact exists on both the hand and the object; when the object template is unknown, the parametric hand mesh surface can still provide contact information independently. This work therefore directly predicts the contact probability between each hand vertex and the hand-held object, using visual features extracted from the image and structural constraints provided by the hand mesh model. A multi-level mesh-based graph is designed: part-level and vertex-level graphs are constructed from the parametric hand mesh to provide multi-scale structural constraints. To estimate precise contact probabilities, a coarse-to-fine framework cascades part-level and vertex-level graph-based transformers for joint learning. Experimental results show that the proposed method achieves excellent results on multiple hand-object interaction datasets, is highly robust to occlusion from hands or objects, and lays a solid foundation for contact-aware hand-object interaction reconstruction.
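The coarse-to-fine structure of the contact prediction can be sketched as follows. This is an illustrative sketch, not the learned graph transformers: the 16-part partition, the random logits, and the residual refinement are assumptions; the 778-vertex count assumes a MANO-style parametric hand mesh.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coarse_to_fine_contact(part_logits, vertex_residual, vert2part):
    """Part-level contact first (coarse), then per-vertex refinement (fine)."""
    part_prob = sigmoid(part_logits)                  # (P,) coarse contact per part
    coarse = part_logits[vert2part]                   # broadcast part logit to vertices
    vertex_prob = sigmoid(coarse + vertex_residual)   # (V,) refined per-vertex prob.
    return part_prob, vertex_prob

rng = np.random.default_rng(1)
P, V = 16, 778                        # assumed: 16 hand parts, MANO-style 778 vertices
vert2part = rng.integers(0, P, size=V)               # vertex-to-part assignment
part_logits = rng.normal(size=P)                     # stand-in for part-level outputs
vertex_residual = rng.normal(scale=0.5, size=V)      # stand-in for vertex-level outputs
part_prob, vertex_prob = coarse_to_fine_contact(part_logits, vertex_residual, vert2part)
in_contact = vertex_prob > 0.5                       # binary contact state per vertex
```

The key property is that no object mesh appears anywhere: contact lives entirely on the hand surface, which is what lets the method handle objects with unknown templates.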

 

3. 3D hand-object reconstruction by combining explicit contact constraints and implicit representation.

In monocular 3D hand-object reconstruction, the reconstructed meshes of hands and objects, especially at the regions where they interact, are often unsatisfactory. To address this issue, structured contact codes based on the hand model are proposed to facilitate the implicit reconstruction of hand-held objects. Although recent works employ implicit reconstruction for objects without 3D geometric templates, they either ignore contact entirely or model it only as an additional loss function, producing less realistic object meshes. This work therefore combines explicit hand contact prediction with implicit object reconstruction to facilitate the mesh recovery of hand-object interaction. First, the hand mesh is reconstructed with an existing method, and the structured contact codes are generated by anchoring the estimated contact probabilities to the hand mesh surface. Then, since the implicit function is continuous over 3D space, sparse convolution diffuses the discrete contact states from the parametric hand mesh surface into the nearby 3D space; via trilinear interpolation, the contact features can be queried anywhere to build the implicit neural representation, effectively guiding the implicit reconstruction of the object. Furthermore, a contact-aware interaction loss built from the estimated contact probabilities achieves more flexible and accurate contact constraints. Experimental results show that the proposed method improves accuracy over existing approaches and markedly improves object reconstruction, especially for parts in contact with the hand, yielding visually more plausible meshes.
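The anchor-diffuse-query pipeline above can be illustrated with a dense NumPy sketch. Everything here is an assumption for illustration: a dense 16^3 grid and a 7-point averaging stencil stand in for the sparse convolution, random points stand in for hand vertices and contact probabilities, and the function names are hypothetical.

```python
import numpy as np

def splat_contact(verts, probs, res=16):
    """Anchor per-vertex contact probabilities into a voxel grid (structured codes)."""
    grid = np.zeros((res, res, res))
    cnt = np.zeros_like(grid)
    idx = np.clip(((verts + 1) / 2 * (res - 1)).astype(int), 0, res - 1)
    for (i, j, k), p in zip(idx, probs):
        grid[i, j, k] += p
        cnt[i, j, k] += 1
    return np.where(cnt > 0, grid / np.maximum(cnt, 1), 0.0)

def diffuse(grid, steps=2):
    """7-point averaging stencil: a dense stand-in for the sparse convolution."""
    for _ in range(steps):
        acc = grid.copy()
        for ax in (0, 1, 2):
            acc = acc + np.roll(grid, 1, ax) + np.roll(grid, -1, ax)
        grid = acc / 7.0
    return grid

def trilinear_query(grid, pts):
    """Query the diffused contact feature at arbitrary 3D points in [-1, 1]^3."""
    res = grid.shape[0]
    x = (pts + 1) / 2 * (res - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, res - 2)
    f = x - x0                                        # fractional offsets
    out = np.zeros(len(pts))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (f[:, 0] if dx else 1 - f[:, 0]) * \
                    (f[:, 1] if dy else 1 - f[:, 1]) * \
                    (f[:, 2] if dz else 1 - f[:, 2])
                out += w * grid[x0[:, 0] + dx, x0[:, 1] + dy, x0[:, 2] + dz]
    return out

# Toy data: hand surface points with contact probabilities, then 5 query points.
rng = np.random.default_rng(2)
hand_verts = rng.uniform(-0.5, 0.5, size=(778, 3))   # hand surface in [-1, 1]^3
contact_prob = rng.uniform(size=778)                  # per-vertex contact probabilities
grid = diffuse(splat_contact(hand_verts, contact_prob))
query_pts = rng.uniform(-0.8, 0.8, size=(5, 3))
contact_feat = trilinear_query(grid, query_pts)       # continuous feature per query
```

Because `trilinear_query` is defined everywhere in the volume, the diffused contact signal can be concatenated with point features at any sample location of an implicit function, which is the role the structured contact codes play in the reconstruction.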

Keywords: monocular image; parametric mesh constraints; human mesh reconstruction; hand-object contact estimation; hand-object interaction reconstruction
Subject Area: Computer Graphics; Computer Image Processing
MOST Discipline Catalogue: Engineering :: Computer Science and Technology
Language: Chinese
Sub direction classification: 3D Vision
Document Type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/56679
Collection: Graduates > Doctoral Dissertations
Recommended Citation (GB/T 7714):
胡俊星. 基于参数化网格约束的三维人体和手物交互重建[D], 2024.
Files in This Item:
胡俊星博士学位论文-最终版-明版-sig (28026 KB) · Thesis · Restricted Access · CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.