Research on Pose Estimation Methods Based on a Dynamic Attention Mechanism | ||||
邹嘉钰 | ||||
2023-05 | ||||
Pages | 90 | |||
Degree type | Master's | |||
Chinese abstract |
| |||
English abstract | Pose estimation is a fundamental and challenging task in computer vision, with broad applications in action recognition, pedestrian detection, autonomous driving, person re-identification, and human-computer interaction. In real scenes, keypoints of the same individual can occlude one another (self-occlusion), and keypoints of different individuals can occlude each other (mutual occlusion), which makes keypoint detection and localization difficult. As application scenarios grow more diverse and complex, the accuracy demanded of pose estimation algorithms keeps rising, so exploring higher-precision algorithms is particularly important. Building on existing research, this thesis studies pose estimation based on dynamic attention networks and proposes new methods for keypoint semantic encoding, keypoint spatial interaction, and keypoint feature fusion. The main work and contributions are as follows.
(1) To address the difficulty of making local detail semantics and global abstract semantics fully complement each other, a semantic encoding method based on dynamic attention is proposed, which supplies high-quality, comprehensive semantic features to the subsequent decoding network. Existing semantic networks based on convolutional neural networks (CNNs) struggle to model long-range dependencies, while Transformer-based semantic networks rely heavily on large-scale annotated datasets and carry a heavy computational burden. The proposed method designs three dynamic semantic encoding structures that couple the local and global features of each stage, introduces a mutual learning loss in the parallel interaction structure, and provides better prior features for the subsequent decoding head. Experiments on multiple datasets verify the effectiveness of the method and show a significant improvement in pose estimation accuracy.
(2) To address insufficient spatial information interaction among keypoints, a dual-branch spatial interaction mechanism based on dynamic attention is proposed, which effectively promotes the exchange of spatial information between keypoints. Existing work lacks a mechanism for interaction between CNN-based and Transformer-based network structures, so it is difficult to combine the strengths of the two. The proposed dual-branch method inherits the translation invariance and local correlation of CNN feature extraction as well as the long-range modeling ability of Transformer feature extraction, which improves spatial interaction. Comparative experiments and visual analysis on multiple public datasets show that the method effectively improves pose estimation performance in crowded scenes and occluded regions.
(3) To address insufficient multi-scale feature fusion and insufficient semantic association, a feature fusion method based on dynamic attention is proposed, which promotes the fusion of features at different granularities and the semantic association among keypoints of multiple categories. Existing methods either ignore feature fusion altogether or adopt only multi-scale feature pre-fusion, which brings a large computational burden and makes it difficult to fully explore the correlations between keypoints of different body parts. This thesis proposes a pre-fusion and post-fusion method based on a dynamic attention network. The pre-fusion module fuses low-level features rich in detail with high-level features rich in semantic information, and the post-fusion module uses attention scores to focus adaptively on important features and improve detection of the different keypoints. Compared with the baseline model, the proposed feature fusion method improves performance, which verifies the effectiveness of the proposed module.
In summary, by studying the structural framework of pose estimation algorithms, this thesis proposes a network framework that can efficiently classify and localize keypoints. Dynamic attention is used to optimize semantic encoding and spatial interaction, and the features of keypoints in different body parts are fused both before and after, so that keypoints can be classified and localized accurately in complex scenarios such as self-occlusion and mutual occlusion. Experiments on multiple public datasets show that the proposed method performs well, and the work offers a useful reference for research in pose estimation. | |||
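Keywords | pose estimation; dynamic attention mechanism; spatial interaction; feature fusion | |||
Language | Chinese | |||
Seven major directions: sub-direction classification | Image and video processing and analysis | |||
State Key Laboratory planned direction classification | Visual information processing | |||
Associated dataset requiring deposit | No | |||
Document type | Thesis | |||
Item identifier | http://ir.ia.ac.cn/handle/173211/51671 | |||
Collection | Graduates_Master's theses; 中科院工业视觉智能装备工程实验室_Precision Sensing and Control | |||
Recommended citation (GB/T 7714) | 邹嘉钰. 基于动态注意力机制的姿态估计方法研究[D],2023. |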
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
基于动态注意力机制的姿态估计方法研究.p(14145KB) | Thesis | Restricted access | CC BY-NC-SA |