Visual Graph Representation Methods Based on Graph Neural Networks (基于图神经网络的视觉图表达方法) | |
卢毅 | |
2020-12-03 | |
Pages | 130 |
Degree type | Doctoral |
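Chinese abstract | Computer vision aims to let an intelligent agent perceive the world from image input and is an important research direction in artificial intelligence. Deep learning, and deep convolutional neural networks in particular, is currently among the most widely used approaches in computer vision. The approach still has limitations: it lacks interpretability and makes it hard to incorporate prior knowledge; it emphasizes the binary relation between features and class labels while ignoring the multi-way relations among data; and convolution kernels of fixed shape limit further performance gains. This thesis targets computer vision: it builds graph representation models of environments and images, performs learning and inference with graph neural networks, compares the results with convolutional neural network methods, and analyzes the roles of graph representation and graph neural networks. In order of increasing autonomy in obtaining the graph structure, the thesis studies three classes of graph models: known prior structure, generated graph structure, and learnable graph structure. The graph nodes, from largest to smallest, are objects, keypoints, and pixels, and the three representative models correspond to three classic computer vision problems: visual navigation, image classification, and semantic segmentation. On the one hand, theoretical and experimental analysis verifies the advantages of graph representation models in vision tasks: they express the topological information in image data explicitly, the model's semantic information is richer and more concise, and interpretability improves compared with vector features. On the other hand, the analysis explores the ability of graph neural networks to process graph data: they can extract features on large-scale graphs, learn features under structural constraints, and serve as a computational method for learning and inference in probabilistic graphical models. The main work and contributions of the thesis are as follows:
For semantic segmentation, a deep learning method that incorporates graph neural networks is proposed. Combined with a deep convolutional neural network, the method builds a graph representation model with pixels as nodes and with image distance and semantic relations as the graph structure, and gives a generalized formulation of the semantic segmentation task under this representation. A graph structure based on the self-attention mechanism is introduced to learn the semantic graph structure in the image. Solving the graph node classification problem with a graph neural network eases the tension between enlarging the receptive field of a deep convolutional network and preserving local features, which optimizes existing semantic segmentation methods based on deep convolutional neural networks. The method is shown to be equivalent to adding a Laplacian constraint to the loss function of the deep convolutional network, which offers a new explanation of how graph neural networks work. |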
English abstract | Computer vision, one of the most important fields in artificial intelligence, aims to solve the problems that robots encounter in perceiving their environment. Deep learning, especially the deep convolutional network, is one of the outstanding methods in this field. This thesis focuses on the relationships among data in computer vision and builds graph representation models, based on deep convolutional neural networks, of the environment or of images, on which a graph neural network performs inference. We compare the proposed models with previous deep convolutional networks and analyze the effectiveness of graph representation and of the graph neural network. The thesis studies graph models with increasing autonomy of structure generation: a fixed graph structure, a graph structure obtained from the images, and a learnable structure. In these three typical graph models, the graph nodes are, from large to small, objects, keypoints, and pixels, and the models correspond to three important computer vision tasks, namely visual navigation, image classification, and semantic segmentation. We then use the graph neural network to solve these models. We verify the effectiveness of the graph representation model in computer vision tasks: it explicitly expresses topological information, carries more semantic information, and increases the interpretability of the model. The effectiveness of the proposed methods also demonstrates the strengths of the graph neural network in processing graph data: its inference ability on large-scale graphs and its ability to extract features under structural constraints. The main contributions of this thesis are as follows. To address visual navigation, we propose a navigation algorithm that uses a Markov network as the environment model.
To account for the uncertainty of object distribution across environments, the algorithm builds a Markov network as the environment model, with objects as nodes and the joint probabilities of adjacent objects as edges, which improves the generalization of the environment representation. The algorithm combines probabilistic inference and structure learning through reinforcement learning, further improving the ability of the baseline method to adapt to different environments. We show by derivation that the proposed Markov network is a formulation of maximum entropy reinforcement learning, so the proposed algorithm is an optimization of reinforcement learning for the navigation task. |
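Keywords | graph neural network, graph representation, image classification, semantic segmentation, visual navigation, probabilistic graph, deep reinforcement learning |
Language | Chinese |
Seven major directions: sub-direction classification | Reinforcement and evolutionary learning |
Document type | Thesis |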
Item identifier | http://ir.ia.ac.cn/handle/173211/41619 |
Collection | Graduates_Doctoral dissertations |
Recommended citation (GB/T 7714) | 卢毅. 基于图神经网络的视觉图表达方法[D]. 中国科学院自动化所. 中国科学院大学,2020. |
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
1229博士学位论文-卢毅.pdf(32080KB) | Thesis | | Restricted access | CC BY-NC-SA |