Knowledge Commons of Institute of Automation, CAS
Research on Human Pose Estimation Methods Based on Message Passing (基于信息传递的人体姿态估计方法研究) | |
Zhou Lu (周鲁) | |
2021-05-29 | |
Pages | 138 |
Degree type | Doctoral |
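Chinese abstract (translated) | With the development of imaging and storage technology, image and video resources are growing explosively. Extracting useful structured information from massive data is essential for understanding images and videos. People are the core element of image and video data, and are also the main target and expressive subject of visual content. Structured analysis of people in complex application scenarios helps accomplish high-level tasks such as action recognition and scene understanding, and has therefore attracted wide attention. Among these tasks, human pose estimation aims to estimate the locations of human keypoints given an image. It is one of the effective means of understanding human semantics and analyzing human body structure, and has wide applications in action recognition, virtual reality, smart healthcare, public security, and other fields. Human pose estimation therefore has significant academic value and practical importance, and has become a popular topic in computer vision. In recent years, deep-learning-based human pose estimation methods have achieved great success and effectively improved performance. However, human pose estimation is still far from ideal. First, human images exhibit scale variation. Second, the human body is a non-rigid structure, and different keypoints have different degrees of freedom of motion, which makes human poses complex and diverse. In addition, cluttered backgrounds, and the crowding and occlusion that occur in dense crowds, pose great challenges. Therefore, building on deep neural networks, this dissertation addresses these difficulties through message passing algorithms at different levels and well-designed network structures, improving human pose estimation. The main work and contributions are: • Human pose estimation based on bidirectional message passing and spatial and channel-wise attention. To address the shortcomings that pose estimation networks cannot make full use of semantic and spatial detail information and that features contain substantial redundancy and noise, a method based on bidirectional message passing and spatial and channel-wise attention is proposed. First, a multi-scale bidirectional message passing mechanism is introduced to promote message passing among features at multiple scales; the interaction between high-scale and low-scale features enriches the semantic and detail information of the features at each scale, and the fusion of multi-scale features further improves the scale robustness of the network. Second, to address feature redundancy and noise interference, the method introduces a semantics-enhanced channel-wise attention mechanism and a sharp spatial attention mechanism, which suppress feature noise along different dimensions and yield cleaner feature representations. Experimental results on public datasets show that the method effectively improves model accuracy and achieves leading performance among contemporaneous methods on multiple datasets. • Human pose estimation based on a spatial transformation network. To address false positive predictions in the heatmaps produced by pose estimation networks, a method based on a spatial transformation network is proposed. First, a spatial transformation network is introduced to promote message passing among the heatmaps of different keypoints. Second, to strengthen the transformation capability of the spatial transformation network, a limb guidance mechanism is introduced to provide explicit directional guidance for the message passing process. Adversarial learning is also used to improve the quality of the limb predictions, providing more accurate directional guidance and improving the performance of the spatial transformation network. In addition, to eliminate the empty transformation phenomenon, the spatial transformation network adopts a weighted mean square error loss that reduces the weight of the background loss, and a convolutional random walk is introduced to suppress prediction noise. Experimental results on public datasets show that the method effectively reduces false positive predictions in heatmaps and achieves a significant performance improvement over the baseline model. |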
English abstract | With the development of imaging and storage technology, image and video resources are growing explosively. Extracting useful structured information from massive data is essential for understanding images and videos. Humans are not only the core element of image and video data, but also the main target and expressive subject. In recent years, human pose estimation methods based on deep learning have achieved great success and greatly improved the performance of human pose estimation. However, human pose estimation is still far from ideal. First, natural images exhibit scale variation. Second, the human body is a non-rigid structure: different keypoints have different degrees of freedom, which makes human poses complex. In addition, cluttered backgrounds, crowding, and occlusion in dense crowds pose great challenges to human pose estimation. Hence, based on the deep learning framework, this dissertation exploits message passing algorithms at different levels and designs suitable network structures to address these challenges. • A bidirectional message passing based spatial and channel-wise attention network is proposed to address two issues: human pose estimation networks cannot make full use of semantic and spatial details, and features contain substantial redundancy and noise. First, a multi-scale bidirectional message passing mechanism promotes message passing among multi-scale features. The information interaction between high-scale and low-scale features enriches the semantic and detail information of the features at each scale. In addition, the fusion of multi-scale features further improves the scale robustness of the network. 
Second, to address feature redundancy and noise interference, the method introduces a semantics-enhanced channel-wise attention mechanism and a sharp spatial-wise attention mechanism to suppress feature noise in different dimensions and obtain a cleaner feature representation. Experimental results on public datasets show that the proposed method effectively enhances the generalization ability of the model under scale variation, complex background interference, crowding, etc., and that performance is also significantly improved. • A progressive pose grammar based human pose estimation method is proposed to solve the problem that convolutional neural networks cannot explicitly learn human body structure information. First, a pose grammar module is built to encode relationships among human keypoints, thus promoting message passing among the features of different keypoints. Second, 3D convolution is introduced to improve the effect of ... • A spatial transformation network based human pose estimation method is proposed to address the problem of false positive predictions in heatmaps. First, a spatial transformation network is introduced to promote message passing among the heatmaps of different human keypoints. Second, to enhance the transformation ability of the spatial transformation network, a limb guidance mechanism is introduced to provide explicit directional guidance for the network. Meanwhile, an adversarial learning mechanism is introduced to improve the quality of limb predictions, thus providing more accurate directional guidance and improving the performance of the spatial transformation. In addition, the spatial transformation network adopts a weighted mean square error loss that reduces the weight of the background loss and addresses the empty transformation problem. A convolutional random walk is also introduced to suppress prediction noise. False positive predictions in the rectified heatmaps are greatly reduced, and network performance is substantially improved. 
Experimental results on public datasets show that the proposed method effectively reduces false positive predictions in heatmaps and achieves a significant performance improvement over the baseline model. • A siamese network for occlusion scenes is proposed to address self-occlusion and external occlusion. First, the network is made occlusion-aware by predicting occlusions. Features are then erased and reconstructed by an erasing and reconstruction module, which refines the occluded features. Second, a mimicking learning mechanism based on the siamese network is introduced so that the features of the occluded branch approach those of the unoccluded branch, increasing feature robustness under occlusion. In addition, the mimicking loss based on the optimal transport divergence promotes information interaction between the features of the two branches of the siamese network, while adding few parameters and little computation. Experimental results on public datasets show that the improvement of the proposed method on occluded human keypoints reaches up to 1.72%, and that the method achieves leading performance among contemporaneous algorithms. |
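Keywords | human pose estimation; message passing; pose grammar; spatial transformation; occlusion awareness |
Language | Chinese |
Seven major directions: sub-direction classification | Image and video processing and analysis |
Document type | Dissertation |
Item identifier | http://ir.ia.ac.cn/handle/173211/44916 |
Collection | Zidong Taichu Large Model Research Center_Image and Video Analysis |
Recommended citation (GB/T 7714) | Zhou Lu. Research on Human Pose Estimation Methods Based on Message Passing [D]. Institute of Automation, Chinese Academy of Sciences. Institute of Automation, Chinese Academy of Sciences, 2021. |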
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
基于信息传递的人体姿态估计方法研究.pd(28429KB) | Dissertation | | Open access | CC BY-NC-SA |