Research on Text Vector Representation Methods
Author: Shaonan Wang (王少楠) [1, 2]
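Degree type: Doctor of Engineering
Supervisor: Chengqing Zong (宗成庆)
2018-05-24
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Keywords: natural language understanding; semantic representation; word representation; phrase representation; sentence representation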
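Abstract
    Text representation is the encoding of natural language text into a form that computers can process. It is the most fundamental and most important step toward natural language understanding. High-quality text representations enable computers to perform a wide range of language tasks effectively, such as machine translation, question answering, and human-computer dialogue, so research in this area has important theoretical significance and practical value.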
 
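    For a text representation model, effectively fusing different types of information is essential for obtaining high-quality representations. This thesis studies how to design effective information-fusion methods for learning high-quality text representations, focusing on three kinds of fusion: fusing multiple modalities in word representation, fusing lower-level word information in phrase or sentence representation, and fusing character and word information in sentence representation. In addition, drawing on recent findings about semantic representation in the human brain, the thesis examines the semantic interpretability of distributed vector representations.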
    
    The main work and contributions of the thesis are summarized as follows:
 
    1. A multimodal word representation method based on a dynamic fusion mechanism
 
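    Words can be presented in different modalities, such as written strings, speech, and images, and making full use of all modalities to learn better word representations is a challenge. Existing multimodal word representation models treat information from different modalities equally, yet related studies show that different modalities contribute to the meanings of different types of words to different degrees. We therefore propose a dynamic fusion mechanism that fuses information from different modalities by automatically learning modality weights for different types of words. Experiments show that the proposed method assigns different weights to the textual and visual modalities for different types of words and significantly improves the quality of word representations. On sets of abstract and concrete words, the learned weights agree with findings in cognitive science: abstract words rely more on the textual modality, whereas learning the meaning of concrete words relies on both the textual and the perceptual modalities.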
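    The dynamic fusion idea can be illustrated with a small numerical sketch. It is not the thesis's model. It assumes that the textual and visual vectors have already been projected to the same dimension d. It also uses a hypothetical gate, with parameters `w` and `b` that are random here rather than learned. The gate scores each modality from the concatenated vectors and normalizes the scores with a softmax, so each word receives its own pair of modality weights.

```python
import numpy as np


def modality_weights(text_vec, vis_vec, w, b):
    """Word-specific weights for the textual and visual modalities.

    Hypothetical gate: score each modality from the concatenation [text; visual]
    with parameters w (shape (2, 2d)) and b (shape (2,)), then apply a softmax
    so the two weights are positive and sum to 1.
    """
    x = np.concatenate([text_vec, vis_vec])
    scores = w @ x + b
    scores = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # [alpha_text, alpha_visual]


def dynamic_fuse(text_vec, vis_vec, w, b):
    """Weighted sum of the two modality vectors (assumes both have dimension d)."""
    alpha = modality_weights(text_vec, vis_vec, w, b)
    return alpha[0] * text_vec + alpha[1] * vis_vec, alpha


# Toy usage with random vectors; in a trained model w and b would be learned.
rng = np.random.default_rng(0)
d = 4
w = rng.normal(size=(2, 2 * d))
b = np.zeros(2)
text_vec = rng.normal(size=d)
vis_vec = rng.normal(size=d)
fused, alpha = dynamic_fuse(text_vec, vis_vec, w, b)
print("modality weights:", alpha)
```

A trained gate of this kind would be expected to push the textual weight toward 1 for abstract words. Because the parameters here are random, the printed weights carry no meaning.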
    
    2. A systematic comparison of how different factors affect phrase representation learning for Chinese and English
 
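    Phrase representations are usually obtained by composing word representations. Existing phrase representation methods focus mainly on choosing a suitable composition function. They overlook other important components of the composition model, such as the quality of the word representations and the training objective. As a result, there is no clear conclusion about how each component affects model performance, or under what conditions a model learns the best phrase representations. We therefore ran extensive experiments that systematically compare the effects of four factors on phrase representation quality: word representations from different models, composition functions, training corpora, and objective functions. The comparison shows that the quality of the word representations and the method used to fuse word information have the greatest influence on the quality of phrase vectors. Phrase composition should therefore use semantically enhanced word vectors with simple composition functions. When no high-quality paraphrase dataset is available, phrase vectors learned directly from text serve as an effective training target. We also release datasets for Chinese phrase similarity research, which provide an important data resource for work on Chinese phrase representation.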
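    The sketch below makes the compared components concrete. It shows three common composition functions: addition, element-wise multiplication, and a linear map over the concatenated word vectors. It also shows one possible training objective, which fits the linear composition to a target phrase vector under squared error. The assumptions here are ours. The target stands in for a phrase vector learned directly from a corpus, all vectors are random toy data, and the function names are hypothetical.

```python
import numpy as np


def compose_add(u, v):
    """Additive composition: p = u + v."""
    return u + v


def compose_mult(u, v):
    """Element-wise multiplicative composition: p = u * v."""
    return u * v


def compose_linear(W, u, v):
    """Linear composition: p = W [u; v], with W of shape (d, 2d)."""
    return W @ np.concatenate([u, v])


def squared_error(p, target):
    return float(np.sum((p - target) ** 2))


def train_linear(W, u, v, target, lr=0.01, steps=100):
    """Fit W by gradient descent on ||W[u; v] - target||^2.

    Hypothetically, `target` is a phrase vector learned directly from a corpus
    by treating the phrase as a single token.
    """
    x = np.concatenate([u, v])
    for _ in range(steps):
        err = W @ x - target
        W = W - lr * 2.0 * np.outer(err, x)  # gradient of the squared error w.r.t. W
    return W


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage: one two-word phrase with random word vectors and a random target.
rng = np.random.default_rng(1)
d = 4
u, v = rng.normal(size=d), rng.normal(size=d)
target = rng.normal(size=d)
W0 = rng.normal(scale=0.1, size=(d, 2 * d))
W1 = train_linear(W0, u, v, target)
loss_before = squared_error(compose_linear(W0, u, v), target)
loss_after = squared_error(compose_linear(W1, u, v), target)
print("loss before:", loss_before, "loss after:", loss_after)
print("cosine(add, mult):", cosine(compose_add(u, v), compose_mult(u, v)))
```

On a phrase similarity dataset, the same `cosine` function would score each pair of composed phrase vectors, and those scores would then be rank-correlated with human ratings.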
 
    3. Sentence representation methods inspired by human attention and based on character-word fusion
 
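    Existing sentence representation models do not distinguish how important different words are. Cognitive psychology research, however, shows that readers selectively fixate on or skip certain words, and that this attention mechanism makes human reading and comprehension more efficient. Inspired by this, we propose an attention-based method for learning sentence representations. The method automatically assigns higher weights to the important words in a sentence, which lets it fuse word-level information effectively. Experiments show that the method significantly improves sentence representation quality. The word-importance distribution predicted by the model also agrees to some extent with the distribution of human reading times, which lends further support to the method.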
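    Here is a minimal sketch of attention over a single sentence. It assumes a generic formulation rather than the thesis's exact model. Each word vector is scored against a hypothetical learned query vector `query`, the scores are normalized with a softmax, and the sentence vector is the weighted average of the word vectors. The hypothetical helper `weight_time_correlation` shows how predicted weights could be compared with per-word reading times. No real eye-tracking data is included.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()


def attentive_sentence_vector(word_vecs, query):
    """Attention-weighted average of word vectors.

    word_vecs: (n, d) matrix, one row per word.
    query: (d,) vector, a hypothetical learned parameter that scores word importance.
    Returns the sentence vector and the per-word weights alpha (summing to 1).
    """
    scores = word_vecs @ query
    alpha = softmax(scores)
    return alpha @ word_vecs, alpha


def weight_time_correlation(alpha, reading_times):
    """Pearson correlation between predicted word weights and per-word reading times."""
    return float(np.corrcoef(alpha, reading_times)[0, 1])


# Toy usage: a five-word "sentence" of random vectors.
rng = np.random.default_rng(2)
n, d = 5, 8
word_vecs = rng.normal(size=(n, d))
query = rng.normal(size=d)
sent_vec, alpha = attentive_sentence_vector(word_vecs, query)
print("word weights:", np.round(alpha, 3))
```

With a zero query, every word gets the same weight, and the model falls back to the plain averaging baseline that treats all words equally.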
 
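    Moreover, unlike English, Chinese characters carry rich semantic information, and existing sentence representation methods do not fully exploit it. We therefore studied the role of Chinese characters in learning general-purpose sentence representations. We propose a mixed character-word network architecture that fuses character-level and word-level information. Experiments on multiple tasks show a clear advantage for the proposed method over existing Chinese sentence representation methods. We also release a dataset for Chinese sentence similarity research.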
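    The character-word fusion can be sketched as follows. The sketch uses simplifying assumptions that are not taken from the thesis:
    - Each word vector is mixed with the mean of its characters' vectors using a fixed scalar `gate`; a learned network would replace this.
    - An out-of-vocabulary word falls back to its characters alone.
    - The sentence vector is the mean of the mixed word vectors.
    The embeddings are random toy values, and all names are hypothetical.

```python
import numpy as np

# Toy, pre-segmented sentence with random embeddings (hypothetical tables).
rng = np.random.default_rng(0)
d = 8
sentence = ["自然", "语言", "理解"]
word_emb = {w: rng.normal(size=d) for w in sentence}
char_emb = {c: rng.normal(size=d) for w in sentence for c in w}


def mixed_word_vector(word, gate=0.5):
    """Mix a word vector with the mean of its character vectors.

    `gate` is a fixed, hypothetical mixing weight; words missing from the
    word table fall back to their characters alone.
    """
    char_mean = np.mean([char_emb[c] for c in word], axis=0)
    if word not in word_emb:
        return char_mean
    return gate * word_emb[word] + (1 - gate) * char_mean


def sentence_vector(words):
    """Average the mixed vectors of the words in a segmented sentence."""
    return np.mean([mixed_word_vector(w) for w in words], axis=0)


print("sentence vector shape:", sentence_vector(sentence).shape)
```

The fallback branch shows one practical benefit of using characters. A segmented word that is missing from the word vocabulary, such as 语言理解 here, still gets a vector built from known characters.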
 
    4. A method for analyzing the interpretability of distributed semantic vector representations
 
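    Previous research on word vector representations shows that multimodal models learn better word semantic representations than unimodal models. However, several questions still lack clear answers:
    - What information do multimodal word representations actually encode?
    - In which respects do they outperform unimodal models?
    - How does the semantic composition of words differ, and what does it share, across modalities?
    - How do different types of composition models combine word vectors?
    To address these questions, we studied word vectors from different models and their semantic composition processes in depth. We propose a method for analyzing the interpretability of distributed semantic vector representations. The method takes the semantic dimensions and data used in research on brain-based componential semantic representation theory. It then applies representational similarity analysis and spatial mapping to interpret the information encoded in distributed vector representations. Among other conclusions, we find that multimodal representation models encode more sensory and motor attributes than text-based models, and that words undergo similar semantic composition processes across modalities. These explanations and conclusions provide useful guidance for further exploring and building more effective text representation methods.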
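    Representational similarity analysis (RSA) compares two representation spaces through their dissimilarity structure rather than their coordinates. This is what allows model vectors to be compared with brain-based attribute ratings that have a different dimensionality. The sketch below computes a representational dissimilarity matrix (RDM) for each space and correlates their upper triangles with a Spearman correlation. It also adds a ridge-regression mapping as one possible form of spatial mapping. The ridge formulation, the function names, and the random toy data are assumptions for illustration, not details from the thesis.

```python
import numpy as np


def rdm(vectors):
    """Representational dissimilarity matrix: 1 - Pearson correlation between rows."""
    return 1.0 - np.corrcoef(vectors)


def upper_triangle(m):
    """Entries above the diagonal, i.e. each word pair counted once."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def ranks(x):
    """Ranks 0..n-1 (assumes no ties, which holds for continuous toy data)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def rsa_score(model_vecs, attribute_vecs):
    """Spearman correlation between the upper triangles of the two RDMs.

    model_vecs: (n, d) word vectors from a distributed model.
    attribute_vecs: (n, k) brain-based attribute ratings for the same n words.
    """
    a = ranks(upper_triangle(rdm(model_vecs)))
    b = ranks(upper_triangle(rdm(attribute_vecs)))
    return float(np.corrcoef(a, b)[0, 1])


def ridge_map(X, Y, lam=1.0):
    """Hypothetical linear mapping from model space X (n, d) to attribute space Y (n, k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)  # shape (d, k)


# Toy usage: random vectors and random "attribute ratings" for ten words.
rng = np.random.default_rng(3)
n_words, d, k = 10, 20, 6
model_vecs = rng.normal(size=(n_words, d))
attribute_vecs = rng.normal(size=(n_words, k))
score = rsa_score(model_vecs, attribute_vecs)
W = ridge_map(model_vecs, attribute_vecs)
predicted = model_vecs @ W  # predicted attribute ratings for each word
print("RSA score:", score)
```

Because RSA compares only pairwise dissimilarities, the two spaces never need a shared coordinate system. The mapping approach instead learns such a system explicitly and can be evaluated attribute by attribute.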
Other Abstract
    Understanding the meaning of a sentence is a prerequisite for solving many linguistic and non-linguistic problems: translating text into another language, answering a question, talking with a robot, and so on. This requires a good representation of sentence meaning. Studying textual semantic representation therefore has important theoretical significance and practical value.
 
    To obtain high-quality textual representations, textual representation models must fuse different types of information effectively. This thesis therefore investigates how to design effective fusion methods for learning high-quality vector representations. We focus on fusion methods for three types of information: fusing multimodal inputs when learning word representations, fusing word-level information when learning phrase or sentence representations, and fusing character-level and word-level information when learning sentence representations. In addition, this thesis draws on the latest research on semantic representation in the brain to study the interpretability of distributed vector representations.
 
    The main contributions are summarized as follows: 
 
    1. Learning multimodal word representation via dynamic fusion methods
 
    Word meaning draws on information from multiple modalities, such as textual, visual, and auditory, and fusing this information effectively to obtain high-quality word representations is a major challenge. Almost all previous multimodal models treat the representations from different modalities equally. However, previous studies indicate that information from different modalities contributes differently to the meanings of different types of words. This motivates us to build a multimodal model that dynamically fuses the semantic representations from different modalities according to the type of word. To that end, we propose a novel dynamic fusion method that assigns importance weights to each modality, which significantly improves the quality of multimodal word representations. On a set of abstract and concrete words, the weights learned by the model are consistent with findings in cognitive science. The meaning of abstract words depends more on the textual modality, while the meaning of concrete words depends on both the textual and the perceptual modalities.
 
    2. A comparative study of critical components in composition models for phrase representation
 
    In general, phrase representations are composed from word representations. Existing composition models focus on comparing different composition functions. They ignore other components of the composition model, such as the quality of the word representations and the training objective. As a result, the effects of the different components on model performance have not yet been fully investigated. To learn better phrase representations, this thesis presents detailed comparisons of the components used in composition models for Chinese and English phrase representation: word vectors, composition function, training data, and objective function. We reach three findings. First, the quality of the word representations and the method of fusing word-level information have the strongest effects on learning phrase representations. Second, high-quality phrase representations can be obtained by combining semantically enhanced word representations with a simple composition function. Third, representing high-frequency phrases by estimating their surrounding contexts is a good training objective. In addition, we release two Chinese phrase similarity datasets for evaluating how well models measure the similarity of short phrases.
 
    3. Learning sentence representations guided by human attention and aided by word-internal structure
 
    Most existing sentence representation models treat every word in a sentence equally. In contrast, extensive studies have shown that humans read sentences efficiently by making a sequence of fixations and saccades. This motivates us to improve sentence representations by assigning different weights to the vectors of the component words, which can be viewed as an attention mechanism over a single sentence. Extensive experiments demonstrate that the proposed methods significantly improve upon state-of-the-art sentence representation models. Qualitative analyses indicate that the proposed attention models selectively focus on important words. Their predicted word weights also agree to some extent with human reading times.
 
    Unlike English words, Chinese words are composed of characters that carry rich semantic information. However, existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by exploiting the semantic information of word-internal characters. Extensive experiments demonstrate that our method significantly improves over existing methods. In addition, we release a Chinese sentence similarity dataset.
 
    4. Investigating inner properties of multimodal representation and semantic compositionality with brain-based componential semantics
 
    Multimodal models have been shown to outperform text-based approaches in learning semantic representations. However, three questions remain unclear: what properties multimodal representations encode, in what respects they outperform single-modality representations, and what happens during semantic composition in different input modalities. To this end, we conducted in-depth research on various single-modality and multimodal word representations and their semantic compositionality. We then proposed a method for analyzing the semantic information encoded in these distributed representations. The method draws on the fine-grained semantic dimensions and data of brain-based componential semantic representation. It uses representational similarity analysis and a spatial mapping method to interpret the information encoded in the distributed representations. Among other findings, our results show that multimodal representations encode more sensory and motor attributes than textual representations. They also show that semantic composition is a general process that does not depend on the input modality. These findings are significant for further exploring and building more effective textual representation methods.
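Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/20955
Collection: Graduates_Doctoral Dissertations
Author affiliations: 1. Institute of Automation, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences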
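Recommended citation (GB/T 7714):
王少楠. 文本向量表示方法研究[D]. 北京: 中国科学院研究生院, 2018.
(Wang Shaonan. Research on Text Vector Representation Methods[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2018.)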
Files in this item:
File name/size | Document type | Version | Access | License
Thesis-签字.pdf (6895 KB) | Dissertation | | Not yet open | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.