Knowledge Commons of Institute of Automation,CAS
Research on Human Parsing Based on Structured Modeling (基于结构化建模的人体解析研究) | |
张小梅 (Zhang Xiaomei) | |
2021 | |
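Pages | 138 |
Degree type | Doctoral |
Chinese abstract | With the rapid development of the Internet and multimedia technology and the continuing improvement of information infrastructure, image data has grown explosively. How to put image data to use in human production and daily life has become an increasingly important research topic. Parsing the human body in image data is a basic and indispensable step in data intelligence applications, with broad application value and prospects in areas such as virtual fitting, pose recognition, person re-identification, and action recognition. |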
English abstract | The rapid development of the Internet and multimedia technology, together with the improvement of information infrastructure, has led to explosive growth in digital images. How to use these images to serve human production and daily life has become an important research topic. Human parsing is a basic task in data intelligence applications and has broad prospects in fields such as virtual fitting, human pose estimation, person re-identification, and action recognition.
Human parsing is the pixel-level classification of the human body in images, and it aims at the most fine-grained semantic description of the body. Algorithms based on fully convolutional networks are central to this task. They obtain high-level semantic information from a network pre-trained for image classification, then use up-sampling methods such as bilinear interpolation to recover the spatial details of the target and produce a class label for every pixel of the human body parts.
Although these algorithms have achieved good results, they still face challenges. First, because of interference from complex scenes and from backgrounds that resemble the human target, they struggle to extract a complete and accurate foreground, which leads to inaccurate semantic discrimination of human parts. To address this problem, this dissertation models the inherent structure of the human body with deep models that focus on the foreground information of the body and suppress interference from complex scenes and similar backgrounds. Second, because of variation in size, occlusion, deformation, and posture, the appearance of human parts varies greatly, and parts may be confused with one another. The key to solving this problem is to improve the robustness of the feature representation.
The semantic discrimination of pixels or regions in an image usually depends on the contextual information of the target, so capturing context accurately is essential for recognizing them. To this end, this dissertation designs fully convolutional networks and structural modeling strategies that produce context-rich features and thereby improve human parsing performance. The main contributions of this dissertation are summarized as follows:
1. This dissertation proposes a tree hierarchical network to suppress the interference of cluttered natural scenes and to improve on the accuracy of a single classifier. The network follows the idea of a binary tree and partitions the human parts step by step. In each step, the network uses part-aware fusion to generate accurate parsing results and passes them to the next step. The network can automatically focus on the areas of interest and ignore irrelevant information during parsing, thereby reducing background interference. To limit accumulated errors, the network corrects them by merging in the original features. Experimental results show that the proposed approach can effectively parse the human body in cluttered scenes and improve on the parsing results of a single classifier, achieving results competitive with contemporaneous methods on several public object parsing datasets.
2. This dissertation designs a blended grammar network to address the problem of effectively extracting the whole foreground from similar or complex backgrounds. The network exploits the inherent hierarchical structure of the human body and the relationships between human parts by designing grammar rules over the parts. In each grammar rule, conspicuous parts, which are easily distinguished from the background, …
3. This dissertation proposes a part-aware context network to address the problem of generating adaptive contextual features for the various sizes and shapes of human parts. The proposed network uses the feature …
4. This dissertation proposes a high- and low-level feature fusion network to address the semantic-spatial gap between low-level and high-level features. The proposed network introduces semantic information into low-level features and high-resolution details into high-level features, achieving more effective fusion. The network also expands the receptive field and generates multi-scale contexts by fusing features from different levels. The experimental results demonstrate that this method narrows the gap between features of different levels, and that its accuracy is far higher than that of other human parsing methods on several human parsing datasets. It achieves the best single-model performance among contemporaneous methods. |
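Keywords | human parsing; structured modeling; multi-scale context; fully convolutional neural network |
Language | Chinese |
Seven major directions: sub-direction classification | Image and video processing and analysis |
Document type | Thesis |
Item identifier | http://ir.ia.ac.cn/handle/173211/44890 |
Collection | Zidong Taichu Large Model Research Center_Image and Video Analysis (紫东太初大模型研究中心_图像与视频分析) |
Recommended citation (GB/T 7714) | 张小梅. 基于结构化建模的人体解析研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2021. |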
Files in this item |
File name/size | Document type | Version type | Access type | License |
Thesis-zxm-电子签字版.pdf (9315KB) | Thesis | | Open access | CC BY-NC-SA |
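
As a rough illustration of the baseline pipeline the abstract describes (coarse per-class scores from a fully convolutional network, bilinear up-sampling to recover spatial detail, then per-pixel classification), here is a minimal Python sketch. The function names, the corner-aligned sampling convention, and the toy score maps are assumptions made for illustration; they are not taken from the dissertation.

```python
def bilinear_upsample(channel, out_h, out_w):
    """Resize one 2-D score map (a list of rows) with bilinear interpolation.

    Assumption: corner-aligned sampling, so the corners of the input map
    land exactly on the corners of the output map.
    """
    h, w = len(channel), len(channel[0])
    out = []
    for i in range(out_h):
        # Fractional source row that output row i samples from.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column that output column j samples from.
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Interpolate horizontally on the two nearest rows, then vertically.
            top = channel[y0][x0] * (1 - wx) + channel[y0][x1] * wx
            bottom = channel[y1][x0] * (1 - wx) + channel[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def parse_pixels(scores, out_h, out_w):
    """Upsample each class's score map and give every pixel its best-scoring class."""
    upsampled = [bilinear_upsample(ch, out_h, out_w) for ch in scores]
    return [
        [max(range(len(upsampled)), key=lambda c: upsampled[c][i][j])
         for j in range(out_w)]
        for i in range(out_h)
    ]


# Toy 2x2 score maps for two hypothetical classes (0 = background, 1 = a body part).
coarse_scores = [
    [[1.0, 0.0], [1.0, 0.0]],  # class 0: high on the left
    [[0.0, 1.0], [0.0, 1.0]],  # class 1: high on the right
]
labels = parse_pixels(coarse_scores, 4, 4)
for row in labels:
    print(row)
```

A real parsing network would produce the coarse scores from learned features and typically up-sample with a framework routine; the sketch only shows how interpolated scores turn into a dense label map.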