Research on Question Answering Techniques That Incorporate Physical Environment Information
姚轶群
Subtype: Doctoral
Thesis Advisor: 徐波; 许家铭
2019-06
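Degree Grantor: University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: natural language understanding; physical environment; vision; question answering; reasoning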
Abstract
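Natural language understanding is an important subfield of artificial intelligence. Human language arises from interaction with the external physical environment in multiple ways and at multiple levels. The physical environment comprises four essential elements: entities, attributes, relations, and constraints, all of which are closely tied to how natural language is produced and understood. Exploring how an agent can make use of physical environment information, and thereby simulating the agent's interaction with that environment, is therefore crucial to improving the agent's ability to process language.
This thesis analyzes, in increasing depth, three classes of methods for incorporating physical environment information into language understanding: information fusion, distribution acquisition, and concept acquisition. It proposes a language understanding model for each class. According to how the agent interacts with the physical environment, it simulates that interaction to some degree with training algorithms based on supervised learning, adversarial learning, and reinforcement learning respectively, so that the models can effectively use information from the physical environment to understand language.
Question answering, as one realization of the Turing test, is an important measure of an agent's language ability. By experimentally evaluating the three proposed classes of methods on several forms of question answering tasks, this thesis improves the performance of multiple question answering systems and reveals the roles that different forms of physical environment information play in natural language understanding.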
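The main contributions of this thesis are:
(1) A novel visual question answering framework based on cascaded mutual modulation. The framework performs multi-step fusion and modulation between visual and linguistic information, addressing the lack in existing frameworks of a multi-step "language understanding program" that is regulated by visual information. While answering a question, it locates the right points of focus and clues in the question and the image through multiple steps of explicit and implicit attention shifts, improving accuracy on reasoning-oriented visual question answering tasks.
(2) A novel multimodal encoding method based on language-environment adversarial learning. Through generative adversarial learning, the method constrains the dialog system's encoding of language to the vector space of real images, so that information from the language and image modalities has similar distributions. This allows general, unsupervised knowledge to be shared across modalities and also improves the model's numerical properties. Combined with the attention-based sample selection mechanism also proposed in this thesis, this auxiliary training method improves the accuracy and sentence quality of the responses produced by question-answering-style visual dialog systems, and is robust and efficient.
(3) A novel neural-symbolic system that performs multi-step explicit reasoning. The thesis introduces the image schema theory of cognitive linguistics into natural language understanding models and models the rigid constraints of the physical environment as logical relations. By interacting with a virtual physical environment composed of "variables" and "relations", the system activates reasoning processes close to human cognitive activity while reading text. This addresses the limited interpretability of intermediate steps and the heavy dependence on data volume in existing reasoning question answering models, and improves accuracy on synthetic reasoning question answering tasks that involve complex logical relations.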
Other Abstract
 
Natural language understanding is an important field in artificial intelligence. A key property of human language is that it interacts with the physical environment in multiple ways and at multiple levels. In our definition, the physical environment has four main elements: entities, attributes, relations, and constraints, all of which are closely connected to the emergence and understanding of natural language. Studying how an artificial agent can use information from the physical environment, and how to simulate its interaction with that environment, can therefore potentially improve its language processing ability.
In this thesis, we conduct an in-depth analysis of three methods for incorporating physical environment information into natural language understanding (information fusion, distribution acquisition, and concept acquisition), and propose a corresponding language understanding model for each. We train our models via supervised, adversarial, and reinforcement learning, depending on the type of interaction with the physical environment.
Question answering is an implementation of the Turing test, and thus an important benchmark for language understanding. We test our three proposed models on multiple types of question answering tasks, revealing how different kinds of information from the physical environment influence language understanding and improving the performance of several kinds of question answering systems.
The main contributions of this thesis include:
(1) We propose a novel framework, Cascaded Mutual Modulation, for visual reasoning. The framework enables multi-step fusion and modulation between visual and language information, addressing the lack of a visually modulated "language understanding program" in existing methods. While answering a question, our model can focus on the correct clues in the question and image through multi-step attention shifts, which improves accuracy on visual reasoning tasks.
(2) We propose a novel auxiliary training method for visual dialog tasks, based on language-environment adversarial learning. The method constrains a model's encodings of images and sentences to be vectors with similar distributions, enabling the model to learn unsupervised general knowledge from both modalities. Combined with our proposed attention-based sample selection technique, it improves the correctness and fluency of the responses generated by visual dialog systems.
(3) We propose a novel neural-symbolic system that explicitly performs step-by-step reasoning. We incorporate the theory of image schemas from cognitive linguistics into natural language understanding and model the rigid constraints of the physical environment as logic rules. The proposed system interacts with a virtual physical environment equipped with variables, abstract relations, and logic rules. It activates a human-like reasoning process while reading textual input, addresses the interpretability and data-hunger problems of existing textual question answering systems, and improves answering accuracy on a synthetic textual reasoning benchmark.
Pages: 104
Funding Project: National Natural Science Foundation of China [61602479]
Language: Chinese
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/23874
Collection: Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing
Recommended Citation
GB/T 7714
姚轶群. 引入物理环境信息的问答技术研究[D]. 中国科学院自动化研究所. 中国科学院大学,2019.
Files in This Item:
File Name/Size | DocType | Version | Access | License
Thesis.pdf (2023KB) | Thesis | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.