Real-Time Human Pose Estimation and Shape Reconstruction and Its Application to Virtual Try-On

	蒋利国 (Jiang Liguo)
	2021-05-25
Pages	122
Degree type	Doctoral (PhD)
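Chinese abstract	3D human body and garment modeling is widely used in film and animation, 3D games, apparel design, and virtual try-on. At present, 3D human body and garment models are obtained mainly through interactive modeling in commercial software or through 3D scanning, but complex interaction, high cost, and special scene requirements make these approaches hard to popularize. This thesis studies how to use a low-cost RGB camera to reconstruct a 3D human body model, including both pose and shape, accurately, quickly, and automatically, and explores its applications in virtual try-on. First, for 3D human pose estimation, the thesis studies the ambiguity of mapping a 2D human pose to a 3D pose and proposes adding an auxiliary feature to remove this ambiguity. Second, it studies a method for reconstructing a 3D human body, including both the 3D geometric shape and the pose, from a single in-the-wild portrait image. Finally, building on the reconstructed 3D body, it studies how to apply it to virtual try-on, achieving size-preserving fitting of garments across different bodies, and proposes a try-on framework together with an evaluation of the try-on results. Specifically, the main contents and contributions of this thesis are as follows: 1. A 3D human pose estimation method based on part ordinal depth. To address the ambiguity of lifting a 2D human pose to 3D, the thesis introduces a human-part ordinal depth as an additional input feature that removes this ambiguity. The ordinal information represents the depth-ordering relationship between adjacent parent and child joints. The thesis predicts this part ordinal depth category from the image in the form of a category map, which effectively ties the network prediction to the image appearance; experiments show that this formulation improves part ordinal category prediction. In addition, the thesis uses a temporal convolutional network to lift the pose to 3D, which further reduces the ambiguity of the 2D-to-3D mapping and also reduces the negative effect of single-image classification errors and pose errors on 3D pose estimation. Experiments show that adding part ordinal depth to 2D-to-3D pose lifting effectively alleviates the ambiguity, and the method achieved the best performance at the time on mainstream benchmark datasets. 2. A human shape and pose reconstruction method based on a multi-task network. The thesis reconstructs human shape and pose in real time from in-the-wild portrait images. First, a lightweight multi-task learning network, HMT-Net, quickly predicts five kinds of 2D/3D human cues from a single image. By fusing features from different tasks, HMT-Net guides the network to learn the correlations between tasks and obtains more accurate predictions than single-task learning. Then, taking the human cues predicted by HMT-Net as observations, the thesis introduces a parametric human body model and solves for the pose and body-shape parameters by minimizing the registration error between the synthesized body and the observations. Besides the sparse registration error on the skeleton, a dense reprojection registration error on the body mesh further constrains body shape and pose. To train HMT-Net, the thesis builds a simple multi-view green-screen capture setup and collects a dataset of more than 300 people with labels for the five human tasks. Experiments show that the method reconstructs body shape from a single in-the-wild portrait image and reconstructs 3D pose in real time (>20 Hz); its results outperform the best real-time methods at the time and even most non-real-time methods. 3. A size-preserving virtual garment try-on system for bodies of arbitrary shape and pose. Virtual try-on is one of the most important applications of 3D human modeling. Efficiently fitting garment models onto bodies of different shapes and poses while keeping the garment simulation realistic has long been an active research topic in apparel design and virtual try-on. To address it, the thesis designs a size-preserving system that transfers garments from a reference body onto bodies of arbitrary pose and shape. A skeleton-alignment-driven mesh alignment algorithm aligns the target body to the reference body; skeleton alignment comprises both skeleton pose alignment and bone length alignment. The thesis decomposes each bone's pose into swing and twist components, and pose alignment aligns these two components separately. After the aligned target body is "dressed" in the garment, cloth physics simulation resolves the penetrations between the aligned target body and the garment caused by the imperfect "dressing". Finally, the aligned target body is inversely transformed back to its original state, while cloth simulation keeps the garment fitted on the target body without collisions. Experiments show that the method transfers garments effectively and helps evaluate fitting comfort, and it completes fitting automatically for bodies whose poses are not too difficult. Overall, from a practical standpoint, this thesis further studies RGB-camera-based 3D human pose and shape reconstruction and actively explores its application to virtual try-on.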
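The part ordinal depth category in contribution 1 labels each pair of linked (parent and child) joints by their relative depth. The abstract does not give the exact class definitions, so the following is only a minimal Python sketch of how such labels could be derived from ground-truth 3D joints. It assumes three classes (child closer than, at roughly the same depth as, or farther than its parent) and a hypothetical depth tolerance `tol`; the function name and class ids are illustrative, not the thesis's code.

```python
from typing import List, Sequence, Tuple

# Hypothetical class ids: the abstract only states that three
# depth-ordering relationships are used for each pair of linked joints.
CLOSER, SIMILAR, FARTHER = 0, 1, 2


def ordinal_depth_categories(
    joints_z: Sequence[float],
    parents: Sequence[int],
    tol: float = 0.05,
) -> List[Tuple[int, int, int]]:
    """Label each parent-child bone by the child's depth relative to its parent.

    joints_z: camera-space depth of each joint (larger = farther from camera).
    parents:  parents[j] is the parent index of joint j, or -1 for the root.
    tol:      assumed tolerance (same units as joints_z) within which the
              two joints are treated as lying at similar depth.
    Returns a list of (parent, child, category) triples.
    """
    labels = []
    for child, parent in enumerate(parents):
        if parent < 0:  # the root joint has no parent bone
            continue
        dz = joints_z[child] - joints_z[parent]
        if dz < -tol:
            cat = CLOSER
        elif dz > tol:
            cat = FARTHER
        else:
            cat = SIMILAR
        labels.append((parent, child, cat))
    return labels


# Toy skeleton: root 0 with children 1 and 3; joint 2 hangs off joint 1.
print(ordinal_depth_categories([3.0, 2.8, 2.82, 3.3], [-1, 0, 1, 0]))
# -> [(0, 1, 0), (1, 2, 1), (0, 3, 2)]
```

In the thesis these categories are predicted from the image as a category map and fed, together with the 2D pose, to a temporal convolutional network; a routine like this one would presumably only supply training labels.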
English abstract	Three-dimensional (3D) human body and virtual garment modeling are widely used in movies, video games, apparel design, and virtual try-on systems. Today, 3D human bodies and virtual garments are created largely through interactive modeling with commercial software or with 3D scanners, which end users find difficult to adopt because of the interaction complexity, high cost, and scene restrictions. The goal of this thesis is to reconstruct the 3D human body, including posture and geometric shape, with a monocular RGB camera, and to explore its application in virtual try-on systems. We first focus on reconstructing 3D human posture and study the ambiguity that arises when lifting a 2D pose to 3D; we introduce an auxiliary cue as input to mitigate this ambiguity. Second, we study a method that reconstructs a 3D full-body geometric model from a single image. Finally, with the reconstructed human bodies, we investigate how to transfer and fit fixed-size garments to bodies of various shapes and postures, and propose an effective virtual try-on system. Specifically, the main contents and contributions of this thesis are: 1. A 3D pose estimation method based on human-part ordinal depth. To resolve the uncertainty in lifting a 2D pose to 3D, we introduce the ordinal depth category, which describes three depth-ordering relationships between linked joints. Used as an additional input, this category is encoded as a map, called a category map, which better associates the prediction with image appearance; compared with the alternative formulation, the category map yields higher classification accuracy. Taking the predicted 2D human pose and ordinal depth categories as input, we then use a temporal convolutional network to regress the 3D human pose, which both alleviates the 2D-to-3D uncertainty and further reduces the negative effect of prediction errors. Experimental results show that adding the ordinal depth category effectively alleviates the ambiguity, and our method achieved state-of-the-art results at the time on several benchmarks. 2. A real-time system for 3D human geometric body reconstruction based on a multi-task network. We propose a real-time system that reconstructs a 3D human, including body geometry and 3D posture, from a single image. First, a lightweight multi-task learning network, HMT-Net, predicts five 2D/3D human cues from a single image. By fusing the predictions of different tasks, which forces the network to learn the correlations among them, HMT-Net makes each individual prediction more accurate than single-task learning. The five cues then guide the generation of a full-body model by minimizing the registration error between the synthetic body and the cues. In addition to a sparse registration constraint on the human skeleton, we introduce a dense re-projection registration constraint on the body mesh, which further constrains body shape and posture. To train HMT-Net, we set up a multi-view capture environment with a green-screen background and collected data from over 300 actors with ground-truth labels for all five tasks. Our method reconstructs the 3D human body from in-the-wild video in real time (>20 Hz) and outperforms existing real-time methods as well as most offline methods. 3. A pipeline for virtually fitting garments to bodies of arbitrary postures and shapes. 3D human reconstruction is one of the most important components of a virtual try-on system. Efficiently transferring and fitting garments to bodies of various shapes and postures, with realistic garment simulation, is a challenging problem. We propose a framework for transferring and fitting fixed-size garments onto bodies of various dimensions and postures. First, the target body, with an embedded skeleton, is deformed toward a reference body by aligning its skeleton with that of the reference; both posture and bone lengths are aligned. For posture alignment, we decouple the orientation of each joint into two components, swing and twist, and align them separately. The deformed target body then "fits" into the garments with as few penetrations as possible. The garment is treated as elastic and physically simulated, which resolves all existing body/garment penetrations. The deformed target body then gradually restores its original shape, producing a series of intermediate shapes, and the simulated garments are kept collision-free with each intermediate shape. Experimental results show that the framework is effective for garment transfer and fit (comfort) evaluation, and it transfers garments automatically for bodies in easy postures. In general, this thesis further studies 3D body posture and geometric shape reconstruction with a monocular RGB camera from a practical perspective, and actively explores its application to virtual try-on.
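Contribution 3 splits each joint's orientation into swing and twist before aligning them. The standard swing-twist decomposition of a unit quaternion can be sketched in Python as below. It assumes quaternions stored as (w, x, y, z) and a twist axis taken along the bone direction; this is the textbook decomposition, not necessarily the exact formulation used in the thesis, and the helper names are hypothetical.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length
Vec3 = Tuple[float, float, float]


def q_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q: Quat) -> Quat:
    """Conjugate, i.e. the inverse of a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def swing_twist(q: Quat, axis: Vec3) -> Tuple[Quat, Quat]:
    """Decompose unit quaternion q as q = swing * twist about unit `axis`.

    twist rotates purely about `axis` (e.g. the bone direction);
    swing is the remaining rotation, about an axis perpendicular to `axis`.
    """
    w, x, y, z = q
    ax, ay, az = axis
    # Project the vector part of q onto the twist axis.
    d = x * ax + y * ay + z * az
    tw = (w, d * ax, d * ay, d * az)
    n = math.sqrt(sum(c * c for c in tw))
    if n < 1e-12:
        # 180-degree swing: twist is undefined, so pick the identity.
        twist = (1.0, 0.0, 0.0, 0.0)
    else:
        twist = (tw[0] / n, tw[1] / n, tw[2] / n, tw[3] / n)
    swing = q_mul(q, q_conj(twist))  # swing = q * twist^-1
    return swing, twist
```

With both the target and the reference joint orientations decomposed this way, the swing parts and the twist parts can be compared and aligned independently, which matches the separate alignment the abstract describes; the specific alignment rule is not stated there.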
Keywords	3D human pose and shape reconstruction; multi-task learning; part ordinal depth; skeleton alignment driven; virtual try-on
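Language	Chinese
Research area (seven major directions / sub-direction)	3D vision
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/44897
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems / Robot Vision
Recommended citation (GB/T 7714)	蒋利国 (Jiang Liguo). Real-Time Human Pose Estimation and Shape Reconstruction and Its Application to Virtual Try-On [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2021.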

Files in this item
File name/size	Document type	Version type	Access type	License
论文提交.pdf (58250 KB)	Thesis		Open access	CC BY-NC-SA