Visual Graph Representation Methods Based on Graph Neural Networks


Author	卢毅 (Lu Yi)
Date	2020-12-03
Pages	130
Degree type	Doctoral (Ph.D.)
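Chinese abstract (translated)	Computer vision aims to let intelligent agents perceive the world through image input and is an important research direction in artificial intelligence. Deep learning, and deep convolutional neural networks in particular, is currently one of the most widely used approaches in computer vision. The approach still has limitations, however: it lacks interpretability and makes it difficult to incorporate prior knowledge; it focuses on learning the binary relationship between features and class labels while ignoring the multivariate relationships among the data; and convolution with fixed-shape kernels limits further gains in performance. For computer vision, this thesis builds graph representation models of environments and images, performs learning and inference on them with graph neural networks, and compares them with convolutional neural network methods in order to analyze the roles of graph representation and graph neural networks. Ordered by how autonomously the graph structure is obtained, the thesis studies three classes of graph models: a known prior structure, a generated graph structure and a learnable graph structure. From largest to smallest, their nodes are objects, keypoints and pixels, and these three typical graph representation models correspond to three classic computer vision problems: visual navigation, image classification and semantic segmentation. On the one hand, theoretical and experimental analysis verifies the advantages of graph representation models in vision tasks: they explicitly express the topological information in image data, carry richer and more concise semantic information, and are more interpretable than vector features. On the other hand, it explores the ability of graph neural networks to process graph data: they can extract features on large-scale graphs, learn features under structural constraints, and serve as a computational method for learning and inference in probabilistic graphical models. The main work and contributions of the thesis are as follows.
For visual navigation, a visual navigation algorithm based on a Markov network is proposed. Taking full account of the uncertainty caused by observation, it builds a Markov network environment model with objects as nodes and a knowledge graph as the prior structure, which improves the generalization of the environment representation. The method integrates GNN-based probabilistic inference and structure learning into the reinforcement learning process, providing a new graph-model approach, with inference performed by graph neural networks, for partially observable Markov decision problems. Experiments verify that the method generalizes well, and a theoretical argument shows that the Markov network in the method is a maximum entropy reinforcement learning algorithm, making the method an improvement on existing reinforcement-learning-based navigation algorithms.
For image classification, a similarity metric learning method based on graph neural networks is proposed; it is a pattern recognition method oriented toward image structure. It builds a graph representation model from keypoints and the relations between their image positions, giving a concise and effective graph representation of an image. A graph neural network with a Siamese architecture learns a graph distance function contrastively in order to classify images, a step that makes full use of the topological prior of each class. Experiments verify that graph representations containing topological information improve image classification performance and are concise and effective image features, and they also verify the effectiveness of graph neural networks for inference with a prior structure.
For semantic segmentation, a deep learning method incorporating graph neural networks is proposed. Combined with a deep convolutional neural network, it builds a graph representation model with pixels as nodes and image distance and semantic relations as the graph structure, and gives a general problem formulation of semantic segmentation under this representation. A graph structure based on the self-attention mechanism is introduced to learn the semantic graph structure of the image. Solving the resulting node classification problem with a graph neural network alleviates the conflict in deep convolutional networks between enlarging the receptive field and preserving local features, improving on existing semantic segmentation methods based on deep convolutional neural networks. The method is proven equivalent to adding a Laplacian constraint to the loss function learned by the deep convolutional network, which offers a new explanation of how graph neural networks work.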
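The navigation contribution rests on the claim that the Markov network is a maximum entropy reinforcement learning algorithm. As a reminder of what maximum entropy RL optimizes, here is the textbook single-state case: the soft value V = α log Σ exp(Q/α) and the Boltzmann policy that attains it. This is a minimal sketch, not the thesis's derivation or navigation model; the Q-values, the temperature `alpha` and all function names are illustrative assumptions.

```python
import numpy as np


def soft_value(q, alpha):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha), computed stably."""
    z = q / alpha
    m = np.max(z)
    return float(alpha * (m + np.log(np.sum(np.exp(z - m)))))


def boltzmann_policy(q, alpha):
    """Optimal max-entropy policy pi(a|s) = exp((Q(s, a) - V(s)) / alpha)."""
    return np.exp((q - soft_value(q, alpha)) / alpha)


def entropy_regularized_objective(pi, q, alpha):
    """Per-state objective E_pi[Q] + alpha * H(pi) that maximum entropy RL maximizes."""
    p = np.clip(pi, 1e-12, 1.0)  # avoid log(0) for (near-)deterministic policies
    return float(np.dot(pi, q) - alpha * np.sum(p * np.log(p)))


# Hypothetical Q-values for three actions in one state.
q = np.array([1.0, 2.0, 0.5])
pi = boltzmann_policy(q, alpha=0.5)
```

Substituting log π = (Q − V)/α into the objective shows that the Boltzmann policy attains exactly V, and no other distribution scores higher. As α shrinks, V approaches max Q and the policy becomes greedy.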
English abstract	Computer vision, one of the most important fields in artificial intelligence, aims to solve the problem of environment perception faced by robots. Deep learning, especially the deep convolutional network, is one of the most successful methods in this field. However, deep convolutional networks have several limitations. For example, they are usually not robust, so tiny noise can lead to a wrong decision; they focus on the relationship between features and labels while ignoring the relationships among the features and labels themselves; and the fixed size and shape of convolutional kernels also limit further improvement in performance. This thesis focuses on the relationships among data in computer vision and builds graph representation models, based on deep convolutional neural networks, for environments and images, on which inference is performed by graph neural networks. We compare the proposed models with previous deep convolutional networks and analyze the effectiveness of graph representation and of graph neural networks. The thesis studies graph models with increasing autonomy of structure generation: a fixed graph structure, a graph structure obtained from the images, and a learnable graph structure. In these three typical graph models the nodes are, from largest to smallest, objects, keypoints and pixels, and the models correspond to three important computer vision tasks: visual navigation, image classification and semantic segmentation. We then use graph neural networks to solve these models. We verify the effectiveness of graph representation models in computer vision tasks: they explicitly express topological information, carry more semantic information and increase the interpretability of the model.
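For the keypoint-level model (a graph structure obtained from the image, used for image classification), the abstracts describe a Siamese graph neural network that contrastively learns a graph distance. The sketch below illustrates that idea under loudly stated assumptions: k-nearest-neighbour edges in image space, a single normalized GCN step with mean pooling as the shared encoder, and the classic margin-based contrastive loss. None of these choices, nor the function names, come from the thesis.

```python
import numpy as np


def knn_adjacency(coords, k):
    """Hypothetical keypoint graph: link each keypoint to its k nearest neighbours in image space."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # a keypoint is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest keypoints
    adj = np.zeros_like(dist)
    adj[np.repeat(np.arange(len(coords)), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)            # symmetrize into an undirected graph


def embed_graph(adj, feats, w):
    """Shared encoder: one normalized GCN step with self-loops, ReLU, then mean pooling."""
    a_hat = adj + np.eye(len(adj))
    d = 1.0 / np.sqrt(a_hat.sum(axis=1))
    h = np.maximum((a_hat * d[:, None] * d[None, :]) @ feats @ w, 0.0)
    return h.mean(axis=0)                    # permutation-invariant graph embedding


def graph_distance(graph_a, graph_b, w):
    """Siamese distance: both (adj, feats) graphs pass through the SAME weights w."""
    return float(np.linalg.norm(embed_graph(*graph_a, w) - embed_graph(*graph_b, w)))


def contrastive_loss(dist, same_class, margin=1.0):
    """Contrastive loss: pull same-class pairs together, push other pairs past the margin."""
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2
```

In training, `w` would be updated to reduce `contrastive_loss` over labelled image pairs, and a test image could then be assigned the class of its nearest reference graph. Both steps are assumptions about how such a metric is typically used. Mean pooling makes the distance independent of keypoint ordering, which matters because detectors return keypoints in arbitrary order.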
The effectiveness of the proposed methods also demonstrates the strengths of graph neural networks in processing graph data: inference on large-scale graphs and feature extraction under structural constraints. The main contributions of this thesis are as follows.
To address visual navigation, we propose a navigation algorithm that uses a Markov network as the environment model. To account for the uncertainty of object distributions across environments, the algorithm builds a Markov network with objects as nodes and the joint probabilities of adjacent objects as edges, which improves the generalization of the environment representation. The algorithm integrates probabilistic inference and structure learning into reinforcement learning, further improving the ability of the baseline method to adapt to different environments. Through derivation, we show that the proposed Markov network is a formulation of maximum entropy reinforcement learning, so the proposed algorithm is an optimization of reinforcement learning for the navigation task.
For image classification, we propose a graph neural network-based distance metric learning method. It uses a keypoint-based graph representation model, which offers a concise and effective graph representation of an image. Because it classifies images with a graph distance function, the method is a kind of structural pattern recognition method and offers a new way to introduce topological information into image processing. The experiments verify the effectiveness of graph representations that include topological information, and of graph neural networks working with a prior structure from a knowledge graph.
For image semantic segmentation, we propose a graph-based model initialized by a deep convolutional neural network. Combined with the deep convolutional neural network, a graph model is built with pixels as nodes and distance and semantic relationships as edges, which transforms pixel classification into node classification. Our method uses the self-attention mechanism to build the graph, which extracts features over a flexible receptive field and allows the model to combine structure learning with feature extraction. Introducing the graph neural network into semantic segmentation compensates for the lack of global structural information in deep convolutional networks and resolves the conflict between enlarging the receptive field and preserving location information. We prove that the graph module in our model plays the same role as a Laplacian regularization term in image segmentation, which offers a new interpretation of the function of graph neural networks, especially non-spectral graph neural networks.
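The segmentation contribution builds a pixel graph with self-attention and relates the graph module to a Laplacian regularization term. The sketch below shows a generic version of that kind of equivalence, not the thesis's proof. Take F as the CNN features (one row per pixel), Â = D^{-1/2} A D^{-1/2} and L = I − Â. The gradient of ||X − F||² + λ tr(XᵀLX) at X = F is 2λLF, so one gradient step of size 1/(2λ) gives F − LF = ÂF, which is a parameter-free graph propagation step. Two assumptions keep L symmetric: the query and key projection `w_qk` is shared, and symmetric normalization replaces the usual row softmax. All names are hypothetical.

```python
import numpy as np


def attention_affinity(feats, w_qk):
    """Symmetric self-attention affinity between pixels (shared query/key projection)."""
    q = feats @ w_qk
    scores = q @ q.T / np.sqrt(q.shape[1])
    return np.exp(scores - scores.max())  # global shift for stability keeps symmetry


def sym_normalize(a):
    """D^-1/2 A D^-1/2, the symmetric normalization used by GCN-style layers."""
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]


def propagate(feats, a):
    """One parameter-free graph propagation step X' = A_norm F."""
    return sym_normalize(a) @ feats


def regularized_grad(x, f, a, lam):
    """Gradient of ||X - F||^2 + lam * tr(X^T L X) with L = I - A_norm (symmetric)."""
    lap = np.eye(len(a)) - sym_normalize(a)
    return 2.0 * (x - f) + 2.0 * lam * (lap @ x)


rng = np.random.default_rng(1)
f = rng.normal(size=(6, 4))       # 6 hypothetical pixels, 4 CNN feature channels
w_qk = rng.normal(size=(4, 3))
a = attention_affinity(f, w_qk)
lam = 0.7
x_step = f - regularized_grad(f, f, a, lam) / (2.0 * lam)  # equals propagate(f, a)
```

Because the eigenvalues of Â lie in [−1, 1], the propagated features never have higher Laplacian energy tr(XᵀLX) than F. This is the smoothing effect that a Laplacian term in the loss would encourage.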
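Keywords	Graph neural networks, graph representation, image classification, semantic segmentation, visual navigation, probabilistic graphical models, deep reinforcement learning
Language	Chinese
Research area (seven major directions, sub-direction)	Reinforcement and evolutionary learning
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/41619
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Lu Yi. Visual Graph Representation Methods Based on Graph Neural Networks[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.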

Files in this item
File name/size	Document type	Version type	Access type	License
1229博士学位论文-卢毅.pdf (32080 KB)	Thesis		Restricted access	CC BY-NC-SA