Research on Domain Cross-Media Knowledge Representation and Reasoning Techniques

	Research on Domain Cross-Media Knowledge Representation and Reasoning Techniques
	Zhang Yingying (张莹莹)
	2022-06
Pages	160
Degree type	Doctoral
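Chinese abstract (English translation)	With the rapid development of information sensing technology, industries have accumulated large volumes of multimedia data such as logs, text, images, music, and video. Cross-media data is characterized both by the coexistence of heterogeneous media objects and by the complex associations and organizational structures that these objects form. Turning unstructured multimodal data into structured knowledge is a highly challenging problem. Meanwhile, integrating knowledge into the reasoning process can improve the reliability, explainability, and usability of reasoning algorithms. Research on domain cross-media knowledge representation and reasoning applications therefore has significant theoretical and practical value. Domain cross-media data is multimodal and heterogeneous across media, which makes its representation and reasoning challenging. Traditional data representation methods struggle to model knowledge elements such as entities, relations, event logic, and common sense, and therefore cannot support a deep understanding of knowledge. Domain knowledge graphs are highly expressive of the relations within large-scale data. They support efficient information retrieval as well as reasoning and the mining of implicit knowledge, enabling applications such as medicine recommendation and intelligent medical guidance. This research centers on how to design effective methods for representing and reasoning over domain cross-media data, and is carried out in the medical and health domain at two levels: representation and reasoning. Domain cross-media knowledge representation is the foundation of reasoning. On the representation side, the knowledge in the data is modeled with respect to multimodal heterogeneity and high-order event structure. On the reasoning side, the thesis studies how to integrate the learned knowledge into reasoning applications to improve their reliability, explainability, and usability. The main work and contributions of the thesis are summarized as follows:
1. Representation learning for multimodal domain knowledge graphs. Representation learning on medical knowledge graphs underpins intelligent medical applications such as disease diagnosis and medical question answering. Entities in a medical knowledge graph can carry unstructured multimodal content such as images and text. At the same time, the number of neighbors varies widely from entity to entity, and the types of entities and relations are highly diverse. To capture the rich information in multimodal knowledge graphs, the thesis introduces a generative adversarial network that makes effective use of the multimodal information of entities and alleviates the missing-modality problem in the knowledge graph. It also designs a multi-relational feature aggregation network that alleviates graph sparsity and exploits the complex structural information of the knowledge graph. This network selectively aggregates neighbor information through a hierarchical attention mechanism, capturing the complex structure and rich semantics of the medical knowledge graph.
2. Modeling of high-order event graphs. Traditional knowledge graphs contain only static factual links between entities, which is not enough for a model to learn rich and comprehensive entity representations. Knowledge comprises not only static facts and attributes but also the dynamic events associated with entities. A triple in a knowledge graph involves only two entities, whereas an event can capture relations among multiple entities at once. To model the temporal relations between high-order events and further strengthen the expressiveness of the knowledge graph, the thesis introduces a gated graph neural network that captures multi-hop dependencies between events in the event graph. A medical event prediction task is defined to verify the effectiveness of the event representations. The thesis also uses cross-modal personal information about patients to further improve personalized event prediction.
3. Reliable medicine recommendation with joint knowledge learning. A medicine recommendation system aims to recommend a specific set of medicines for a given set of symptoms. To recommend medicines on sparse data and improve the reliability of reasoning, the thesis proposes a multi-task learning method that shares implicit knowledge between the knowledge graph representation task and the medicine recommendation task, guiding the task-specific models in modeling the associations between medicines and symptoms. The thesis constructs a medicine attribute graph that mines the internal similarity between medicines at the level of molecular structure, medicine category, and other attributes, further improving recommendation quality. Modeling the medical knowledge graph and the medicine attribute graph yields rich representations of symptoms and medicines and alleviates the data sparsity encountered in medicine recommendation. Under the multi-task framework, the knowledge graph embedding model and the medicine recommendation model are trained iteratively, improving the performance of both tasks.
4. Knowledge-aware explainable medical question answering. Explainability and accuracy are the two main concerns in medical question answering. Existing methods concentrate on accuracy and cannot provide a reasonable explanation for the retrieved answers. To improve explainability while also improving accuracy, the thesis proposes a medical question answering method based on a multimodal knowledge-aware attention mechanism. The method incorporates domain knowledge and mines the latent interactions between question-answer pairs along semantic paths in the medical knowledge graph, improving the accuracy of answer retrieval. A hierarchical attention mechanism is also proposed to compute the importance of candidate paths, simulating the reasoning process of a professional answerer, providing explanations for the retrieved answers, and increasing their credibility.
5. Knowledge-enhanced natural language response generation. Natural language generation has become a fundamental task in dialogue systems. Response generation methods based on recurrent neural networks encode the dialogue context and decode it into a response. Such methods tend to generate monotonous replies. The thesis proposes a co-attention module that computes attention matrices between contexts to model differences in the importance of the dialogue context, providing a basis for generating meaningful, context-relevant responses. The thesis also proposes a knowledge-aware conditional autoencoder that jointly models explicit semantics (the words in the context) and implicit common sense (external knowledge related to the conversation history) to generate rich, non-repetitive responses.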
English abstract	With the rapid development of information perception technology, large volumes of multimedia data such as logs, texts, images, music, and videos have accumulated across industries. Cross-media data is manifested not only as the coexistence of complex media objects, but also as the complex relationships and organizational structures formed among them. How to translate unstructured multimodal data into structured knowledge is a challenging problem. At the same time, integrating knowledge into the reasoning process can enhance the reliability, explainability, and usability of reasoning applications. Therefore, research on representation learning and reasoning applications of domain multimodal knowledge graphs has great theoretical significance and practical value. Domain cross-media data is multimodal and heterogeneous across media, which brings challenges to representation learning and reasoning. Traditional data representation methods have difficulty modeling knowledge elements such as entities, relationships, facts, and common sense in the data, so they cannot achieve a deep understanding of knowledge. Domain knowledge graphs have a strong ability to express relationships within large-scale data. They enable efficient information retrieval, and can also be used to mine implicit knowledge and to support applications such as medicine recommendation and intelligent medical guidance. This thesis studies how to design effective ways to represent and apply domain cross-media data. Research is carried out at two levels: representation learning and reasoning. The representation learning level models the knowledge graph with its multimodal heterogeneity and high-order structure. The reasoning level focuses on how to integrate the learned knowledge into reasoning applications to improve their reliability, explainability, and usability. The contributions of the thesis are summarized as follows:
1. Multimodal domain knowledge graph representation learning. Representation learning on medical knowledge graphs is the basis for intelligent medical applications such as disease diagnosis and medical question answering. Entities in a medical knowledge graph can carry unstructured multimodal content, such as images and text. At the same time, the number of neighbors per entity varies greatly, and the types of entities and relationships are diverse. To capture the multimodality and heterogeneity of the medical knowledge graph, this thesis introduces a generative adversarial network that effectively utilizes the multimodal information of entities and alleviates the missing-modality problem in the knowledge graph. A multi-relational feature aggregation network is also proposed to ease the sparsity of the graph and exploit the heterogeneity of the knowledge graph structure. This network selectively aggregates information from neighbor nodes through a hierarchical attention mechanism, which better characterizes the complex structural information and rich semantic information in the medical knowledge graph.
2. High-order event knowledge graph representation learning. A knowledge graph mainly records static factual connections between entities, which is not enough for learning rich and comprehensive entity representations. When humans understand a real-world entity, they consider not only its static facts and properties, but also the dynamic events associated with it. There is a huge amount of event information in the world, and it conveys dynamic knowledge. A triplet in a knowledge graph involves only two entities, while an event can capture relationships among multiple entities at the same time. To capture the temporal relationships of high-order events and enhance the expressive power of the knowledge graph, this thesis introduces a gated graph neural network, which characterizes the multi-hop dependencies in the event knowledge graph. This thesis defines a medical event prediction task to verify the validity of the learned event graph embeddings, and uses cross-modal personal information about patients to further improve personalized event prediction.
3. Knowledge-enhanced reliable medicine recommendation. A medicine recommendation system is designed to recommend a specific set of medicines based on a set of symptoms. To recommend medicines on sparse data and improve the reliability of reasoning, this thesis proposes a multi-task learning method. By sharing implicit knowledge between the knowledge graph representation task and the medicine recommendation task, it guides the task-specific models in modeling the relationship between medicines and symptoms. By modeling the medical knowledge graph and the medicine attribute graph, rich representations of symptoms and medicines are obtained, and the data sparsity encountered in medicine recommendation is alleviated. This thesis uses a multi-task framework to jointly learn knowledge graph embedding and medicine recommendation, and iterative training improves the performance of both tasks.
4. Explainable medical question answering based on the knowledge graph. Accuracy and explainability are the two main challenges of medical question answering. However, existing methods focus primarily on accuracy and do not provide a good explanation for the retrieved answers. To improve the explainability of the retrieved answers while also improving accuracy, this thesis proposes a multimodal knowledge-aware attention mechanism for medical question answering, which integrates domain knowledge and mines the latent interactions between question-answer pairs based on the medical knowledge graph to improve the accuracy of answer retrieval. A hierarchical attention mechanism is also proposed to calculate the importance of candidate paths, providing explainability and improving the credibility of the retrieved answers. This thesis also constructs two real-world medical question answering datasets.
5. Knowledge-enhanced natural language response generation. Natural language response generation has become a fundamental task in conversational systems. Response generation methods based on recurrent neural networks encode the conversation context and decode it into a response. However, they tend to produce monotonous replies. This thesis proposes a co-attention module that calculates attention matrices to model differences in the importance of the conversation context, providing a basis for generating meaningful, context-sensitive replies. This thesis also proposes a knowledge-aware conditional autoencoder that jointly models explicit semantics (words in the context) and implicit common sense (external knowledge related to the conversation history) to generate rich, non-repetitive replies.
Keywords	knowledge graph; representation learning; multimodal; reasoning
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48864
Collection	Graduates_Doctoral Theses; State Key Laboratory of Multimodal Artificial Intelligence Systems_Multimedia Computing
Recommended citation (GB/T 7714)	张莹莹. 领域跨媒体知识表达与推理技术研究[D]. 中国科学院自动化研究所. 中国科学院自动化研究所,2022.

Files in this item
File name/size	Document type	Version type	Access type	License
10 张莹莹_答辩后修改版论文.pdf (13493KB)	Thesis		Restricted access	CC BY-NC-SA