Research on Entity-Relation Learning Methods in Deep Learning and Their Applications
常建龙 (Jianlong Chang)
2020-05-28
Pages: 108
Degree Type: Doctoral
Chinese Abstract (translated)

In recent years, deep neural networks have drawn wide attention from both academia and industry in the field of artificial intelligence. As a core step in deploying neural networks, how to effectively and efficiently train them on specific data with the back-propagation algorithm is a problem worth investigating in depth. In particular, two basic entities are involved in training a neural network: the data and the network architecture. The data defines the task the network must handle, while the architecture defines the functional form used to solve it. In real applications, however, these entities are observed only incompletely for various reasons, for example the similarity relations between samples in unsupervised learning, the local structural relations of data in non-Euclidean spaces, and the topological connection relations arising during network training. To advance and broaden the research and application of neural networks, this thesis studies the relations between the basic entities from three perspectives: outer relations in data, inner relations in data, and relations in network architectures. In summary, the main contributions of this thesis are:

1. A deep self-evolution model is proposed to model and analyze the relations between data samples. To let neural networks better handle unsupervised learning tasks, the task of learning sample-to-sample relations is recast as a binary classification problem over sample pairs, and the network is trained in an unsupervised manner through self-evolving selection of labeled samples. To make the learned feature representations more physically meaningful, the deep self-evolution model introduces clustering constraints into feature learning, guiding the network to perform feature learning and clustering analysis simultaneously, which greatly advances the research and application of deep neural networks in unsupervised learning.

2. A structure-aware convolution operator is proposed to aggregate data according to its local structural relations. To enable deep convolutional neural networks to process data embedded in non-Euclidean spaces, structure-aware convolution generalizes the conventional discrete, finite-dimensional convolution kernel to a continuous, infinite-dimensional univariate function. By function approximation theory, such a filter can be parameterized as a weighted sum of basis functions, with the weighting coefficients shared across the whole spatial domain. Since structure-aware convolution is differentiable, structure-aware convolutional networks can be trained by back-propagation to process data in both Euclidean and non-Euclidean spaces.

3. A local-aggregation graph network model is proposed, which introduces set functions to perform local aggregation over set data and constructs a unified form of set function through a generalized Kolmogorov-Arnold representation theorem. Theoretically, the inner and outer functions in the representation theorem are decomposed into linear combinations of basis functions, and the learned basis coefficients can be shared across the whole space, so that the local-aggregation graph network can learn high-level representations of set data with one shared set of parameters. In practice, Chebyshev, Legendre, Laguerre, and Hermite bases are used to verify the effectiveness of the local-aggregation graph network on set data.

4. A network architecture search method based on the ensemble Gumbel-softmax is proposed, which learns the topological connections between architectural components and makes the discrete decisions of architecture search efficiently in a differentiable manner. The ensemble Gumbel-softmax performs discrete decisions differentiably, which can not only select the function operations between layers but also learn the architecture through back-propagation. Experiments show that the ensemble Gumbel-softmax can effectively and efficiently discover high-performance convolutional and recurrent networks, substantially reducing manual intervention during training while improving model performance.

English Abstract

Recently, deep neural networks, popular models in artificial intelligence, have attracted extensive attention from academia and industry. As a core step in applying deep networks, how to effectively and efficiently train them with the back-propagation algorithm on specific tasks is a significant problem. Specifically, there are two basic entities in the training process of deep networks, i.e., data and network architectures, which define the tasks and the learnable functions of deep networks, respectively. However, attributes of these entities may be unobservable in practice, e.g., similarities between samples in unsupervised learning, local structures in non-Euclidean domains, and connection relationships in networks. In order to improve the generalization of deep networks, three relations between entities are investigated in this work, i.e., the outer relationship in data, the inner relationship in data, and the relationship in network architectures. To sum up, the main contributions of this work are:

 

1. By modeling the outer relations in data, a deep self-evolution model is developed to overcome the deficiency of deep networks in dealing with unsupervised tasks. For this purpose, the outer relationship between data is recast as a binary pairwise-classification problem that estimates whether pairwise patterns are similar. To learn informative representations, clustering constraints are introduced into the deep self-evolution model so that specific concepts are captured by specific representations, expanding the research and application of deep networks in the unsupervised learning field.
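The pairwise recasting above can be sketched as follows: from the current features, high-confidence similar and dissimilar pairs are selected by thresholding cosine similarity and serve as self-generated labels for binary training. The thresholds and the NumPy helper below are illustrative assumptions, not the thesis's exact selection schedule.

```python
import numpy as np

def select_pairwise_labels(features, upper=0.9, lower=0.3):
    """Select high-confidence pseudo-labels for pairwise training.

    Pairs with cosine similarity above `upper` are labeled similar (1),
    pairs below `lower` dissimilar (0); ambiguous pairs stay unlabeled (-1)
    and are skipped in the binary-classification loss.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                              # pairwise cosine similarities
    labels = np.full(sim.shape, -1, dtype=int)
    labels[sim > upper] = 1                    # confident "similar" pairs
    labels[sim < lower] = 0                    # confident "dissimilar" pairs
    np.fill_diagonal(labels, -1)               # ignore trivial self-pairs
    return sim, labels
```

In a self-evolving loop, the thresholds would be tightened or relaxed as the features improve, so the labeled pair set grows with training.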

 

2. To handle non-Euclidean structured data with diverse local topological structures, structure-aware convolution is proposed to establish structure-aware convolutional neural networks. Specifically, filters in the structure-aware convolution are generalized to univariate functions, which are capable of aggregating local inputs with diverse topological structures. By taking advantage of function approximation theory, such filters can be parameterized with shareable parameters. Since all the operations are differentiable, the networks can be trained end-to-end by standard back-propagation, broadening the reach of deep networks from Euclidean to non-Euclidean domains.
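A minimal sketch of the univariate-filter idea, assuming scalar structural relations and a Chebyshev basis (the function name and setup are illustrative): the kernel value at each neighbor is obtained by evaluating a basis expansion at that neighbor's relation to the center, and the expansion coefficients are the shared learnable parameters.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def structure_aware_aggregate(coeffs, neighbor_feats, rel_positions):
    """One structure-aware filter response.

    The discrete kernel is replaced by a univariate function
    w(r) = sum_k c_k * T_k(r), evaluated at each neighbor's scalar
    structural relation r; the coefficients c_k are shared everywhere.
    """
    weights = C.chebval(rel_positions, coeffs)  # w(r_ij) for each neighbor
    return weights @ neighbor_feats             # weighted local aggregation
```

Because `w` is defined for any real-valued relation, the same coefficients handle neighborhoods of arbitrary size and geometry, which is what lets the operator cover both grid and non-grid data.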

 

3. To handle non-Euclidean data without local structures, i.e., set data, a local-aggregation function is developed to establish local-aggregation graph networks in the context of the Kolmogorov-Arnold theorem. Theoretically, the local-aggregation function consists of an inner function and an outer function, which can be parameterized with a set of orthonormal polynomials (e.g., Chebyshev, Legendre, Laguerre, and Hermite basis functions) in an effective and efficient manner. In practice, such a local-aggregation function is able to aggregate permutation-unordered and dimension-unequal local inputs on non-Euclidean domains.
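A toy version of such a set function, assuming scalar set elements and Chebyshev expansions for both the inner and outer functions (a simplification of the thesis's construction), makes the permutation invariance and variable set size explicit:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def local_aggregate(set_elements, inner_coeffs, outer_coeffs):
    """Kolmogorov-Arnold-style set function f(S) = phi(sum_x psi(x)).

    psi and phi are linear combinations of Chebyshev basis functions,
    so one pair of coefficient vectors handles sets of any size, and the
    summation makes the result independent of element order.
    """
    inner_sum = C.chebval(np.asarray(set_elements), inner_coeffs).sum()
    return C.chebval(inner_sum, outer_coeffs)
```

Swapping in Legendre, Laguerre, or Hermite bases (also available under `numpy.polynomial`) changes only the expansion, not the structure of the function.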

 

4. An ensemble Gumbel-softmax estimator is presented to automatically learn relationships between layers and approximate architectures during searching and validation in a differentiable manner. Technically, the ensemble Gumbel-softmax estimator consists of a group of Gumbel-softmax estimators, which is capable of converting probability vectors to binary codes and passing gradients from the binary codes back to the probability vectors. Benefiting from such modeling, architectures and network weights can be jointly optimized with standard back-propagation during training. Extensive experiments show that this method suffices to discover high-performance convolutional and recurrent networks with less manual design.
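A single straight-through Gumbel-softmax draw, the building block such an ensemble groups together, can be sketched as below. This is a NumPy illustration without the gradient machinery (which an autodiff framework would supply), and the names are illustrative.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Relaxed draw from a categorical over candidate operations.

    `soft` is the differentiable probability vector produced by the
    temperature-tau softmax over Gumbel-perturbed logits; `hard` is the
    one-hot discrete decision taken by argmax (straight-through style).
    """
    rng = np.random.default_rng() if rng is None else rng
    gumbel = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=logits.shape)))
    y = (logits + gumbel) / tau
    soft = np.exp(y - y.max())        # stable softmax
    soft /= soft.sum()
    hard = np.zeros_like(soft)
    hard[soft.argmax()] = 1.0         # discrete operation choice
    return soft, hard
```

In the ensemble, several such estimators are combined per decision so that more than one candidate connection can be kept; only the single-estimator draw is sketched here.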

Keywords: deep neural networks, entity relations, unsupervised learning, graph convolutional networks, architecture search
Language: Chinese
Sub-direction Classification: Machine Learning
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/39119
Division: National Laboratory of Pattern Recognition, Advanced Spatio-Temporal Data Analysis and Learning
Recommended Citation:
GB/T 7714
常建龙. 深度学习中的实体关系学习方法及其应用研究[D]. 中国科学院自动化研究所. 中国科学院大学,2020.
Files in This Item:
File Name/Size | Document Type | Access | License
常建龙 毕业论文.pdf (10849 KB) | Thesis | Open Access | CC BY-NC-SA