Research on Monocular 3D Human Pose Estimation

	Research on Monocular 3D Human Pose Estimation (单视角三维人体姿态估计研究)
	陈泽睿
	2021-05
Pages	86
Degree type	Master's
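Abstract (translated from the Chinese)	With the development of intelligent video analysis and human behavior understanding, 3D human pose estimation has received wide attention in computer vision. As an emerging research topic, 3D human pose analysis has many practical applications, such as controlling characters in motion-sensing games and producing difficult martial arts effects in film and television. Because the RGB camera is currently the most widespread sensor, recovering 3D human poses from images is important for analyzing human behavior in the real world. Compared with methods based on multi-view images, recovering 3D human poses from a single-view image is faster to run, easier to deploy, and more adaptable to different scenes. Although related research has made some progress, recovering human poses in 3D space more conveniently and robustly still faces many challenges. First, the same 3D pose often corresponds to multiple 2D projections, which makes recovering 3D poses from single-view images very difficult. In addition, the high variability of human actions and poses in environments that are not strictly controlled is an unavoidable difficulty in designing 3D human pose estimation models. Building on progress in convolutional neural networks, this thesis studies single-view 3D human pose estimation in response to these challenges. Its results consist mainly of the following two contributions. First, to evaluate the robustness of existing 3D human pose estimation models, this thesis explores how their performance changes under adversarial example attacks. To this end, it constructs four types of 3D human pose estimation models, such that most existing methods fall into one of the four types. It is also the first to propose an adversarial attack scheme for 3D human pose estimation and the objective function used during the attack. On this basis, a series of experiments comprehensively evaluates the robustness of each type of model and provides a valuable reference for designing more robust 3D human pose estimation models. Second, most previous single-view 3D human pose estimation methods use a single neural network architecture to predict the 3D coordinates of all body joints. Because different body parts differ in shape, movement patterns, and degrees of freedom, this approach often does not achieve the best results. This thesis is the first to propose using neural architecture search to find suitable network architectures for different body parts and to use those architectures to estimate the different parts more accurately. To this end, it introduces a fusion cell into the differentiable neural architecture search framework. The fusion cell automatically divides all body parts into several groups and uses a different network architecture for each group to generate 3D heatmaps of the corresponding parts, which yields more accurate estimates. The proposed method is validated on the Human3.6M and MuPoTS-3D datasets and achieves competitive results.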
English abstract	With the development of intelligent video analysis and human behavior understanding, 3D human pose estimation has received extensive attention in computer vision. As an emerging topic in this field, it is in demand in various scenarios, such as controlling characters in motion-sensing games and producing complex martial arts effects in films. Since the RGB camera is currently the most widely used sensor, recovering 3D human poses from images is of great significance for analyzing human behavior in the real world. Compared with estimating 3D human poses from multi-view images, monocular 3D human pose estimation offers fast running speed, convenient deployment, and strong scene adaptability. Although some progress has been made in this direction, estimating 3D human poses from monocular images more conveniently and robustly still faces many challenges. First, the same 3D human pose may correspond to many 2D projections, which makes it challenging to recover 3D human poses from monocular images. In addition, the flexibility of human motions in outdoor environments makes 3D human pose estimators difficult to design. This thesis focuses on recovering 3D human poses from monocular images, building on recent progress in convolutional neural networks. Its main contributions can be summarized as follows. First, to evaluate the robustness of current 3D human pose estimators, this thesis explores attacking them with adversarial examples. To this end, it constructs four types of 3D human pose estimators, such that most current methods can be classified into one of them. It also proposes an attack method and an objective function for this task. Finally, it conducts extensive experiments to evaluate the robustness of the different types of 3D human pose estimators and provides valuable insights into how to design more robust models. Second, most previous monocular 3D human pose estimation methods employ a single neural network architecture to estimate all joints of the human body. Because different body parts may have different shapes, movement patterns, and degrees of freedom, these methods often fail to achieve satisfactory results. Instead, this thesis proposes to search for suitable neural architectures for different body parts. To this end, it introduces a fusion cell into the framework of differentiable architecture search (DARTS). The fusion cell automatically divides all parts of the human body into several groups and uses different neural network architectures to generate 3D heat maps and make more robust predictions. This thesis validates the effectiveness of the method on the Human3.6M and MuPoTS-3D datasets, where it achieves competitive performance compared with previous methods.
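Keywords	monocular 3D human pose estimation; adversarial attack; neural architecture search
Language	Chinese
Seven major directions: sub-direction classification	Image and video processing and analysis
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/44440
Collection	Pattern Recognition Laboratory (模式识别实验室)
Corresponding author	陈泽睿
Recommended citation (GB/T 7714)	陈泽睿. 单视角三维人体姿态估计研究[D]. 中国科学院自动化研究所. 中国科学院大学,2021.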
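
The abstract names differentiable architecture search (DARTS) as the framework into which the fusion cell is introduced. As a rough illustration only, the sketch below shows the generic DARTS idea of a continuous relaxation: each edge computes a softmax-weighted mixture of candidate operations, and a discrete operation is chosen afterwards by taking the largest architecture weight. The candidate operations, the names `CANDIDATE_OPS`, `mixed_op`, and `derive_op`, and all numbers are assumptions made for this example; the thesis's fusion cell and its grouping of body parts are not described in this record and are not reproduced here.

```python
import numpy as np

# Hypothetical candidate operations for one edge of a searched cell.
# Real search spaces use convolutions, pooling, skip connections, etc.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "scale": lambda x: 2.0 * x,
    "negate": lambda x: -x,
}


def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - np.max(v))
    return e / e.sum()


def mixed_op(x, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def derive_op(alpha):
    """Discretization after search: keep the op with the largest weight."""
    names = list(CANDIDATE_OPS)
    return names[int(np.argmax(alpha))]


x = np.array([1.0, -2.0, 3.0])

# With equal architecture weights every op contributes 1/3:
# (x + 2x - x) / 3 = 2x / 3.
uniform_out = mixed_op(x, np.zeros(3))

# After (hypothetical) training, the largest alpha selects the final op.
chosen = derive_op(np.array([0.1, 1.5, -0.3]))
```

In an actual search the architecture weights `alpha` would be learned by gradient descent together with the network weights; here they are fixed by hand to keep the example self-contained.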

Files in this item
File name/size	Document type	Version type	Access type	License
单视角三维人体姿态估计研究.pdf (13774KB)	Thesis		Open access	CC BY-NC-SA