Empirical Exploring Word-Character Relationship for Chinese Sentence Representation
Wang, Shaonan1; Zhang, Jiajun1; Zong, Chengqing2
2018-05-01
Journal: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
Volume: 17; Issue: 3; Pages: 14
Article type: Article
Abstract: This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. Unlike English words, Chinese words are composed of characters, which carry rich semantic information. However, existing methods have not fully exploited this information. In this work, we introduce a novel mixed character-word architecture that improves Chinese sentence representations by utilizing the rich semantic information of inner-word characters. We propose two novel strategies to this end. The first applies a mask gate to characters to learn the relationships among the characters in a word. The second applies a max-pooling operation to words to adaptively find the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models and achieves substantial performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we use the learned sentence representations as features in sentence classification, question classification, and sentence entailment tasks. The results show that the proposed mixed character-word sentence representation models outperform both the character-based and word-based models.
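The abstract describes the two strategies only at a high level. The sketch below illustrates them in plain Python under explicitly hypothetical assumptions, not the paper's actual formulation: the mask gate is taken to be an elementwise g_i = sigmoid(a * c_i + b * w) computed from each character vector c_i and the atomic word vector w, the compositional word vector is the average of the gated characters, the "mixture" is an elementwise max over the atomic and compositional vectors, and the sentence composition model is simple averaging. The function names and the parameters a and b are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function used for the (hypothetical) mask gate."""
    return 1.0 / (1.0 + math.exp(-x))


def mask_gate_compose(chars: List[Vector], word: Vector,
                      a: Vector, b: Vector) -> Vector:
    """Compose a word vector from its characters with an elementwise gate.

    Assumed gate: g_i[k] = sigmoid(a[k] * c_i[k] + b[k] * word[k]).
    The compositional vector is the average of the gated character vectors.
    """
    dim = len(word)
    out = [0.0] * dim
    for c in chars:
        for k in range(dim):
            g = sigmoid(a[k] * c[k] + b[k] * word[k])
            out[k] += g * c[k]
    return [v / len(chars) for v in out]


def mix_word(atomic: Vector, compositional: Vector) -> Vector:
    """Elementwise max pooling over the atomic and compositional vectors."""
    return [max(x, y) for x, y in zip(atomic, compositional)]


def sentence_vector(words: List[Vector], chars_per_word: List[List[Vector]],
                    a: Vector, b: Vector) -> Vector:
    """Average the mixed word vectors (assumed, simplest composition model)."""
    mixed = [mix_word(w, mask_gate_compose(cs, w, a, b))
             for w, cs in zip(words, chars_per_word)]
    dim = len(words[0])
    return [sum(m[k] for m in mixed) / len(mixed) for k in range(dim)]


if __name__ == "__main__":
    zeros = [0.0, 0.0]  # with a = b = 0, every gate value is sigmoid(0) = 0.5
    words = [[1.0, -1.0], [0.0, 2.0]]
    chars = [[[2.0, 0.0], [4.0, -2.0]], [[0.0, 0.0]]]
    print(sentence_vector(words, chars, zeros, zeros))  # [0.75, 0.75]
```

With all gate parameters at zero, the first word's compositional vector is half the mean of its characters, [1.5, -0.5], and max pooling keeps it over the atomic vector [1.0, -1.0] in both dimensions; the second word keeps its atomic vector. The elementwise max lets each dimension pick whichever source is larger, which is one simple reading of "adaptively" mixing the two representations.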
Keywords: Sentence Representation; Composition Model; Inner-word Character; Mixed Character-word Representation; Mask Gate; Max Pooling
WOS headings: Science & Technology; Technology
DOI: 10.1145/3156778
Keywords [WOS]: Sentence representation; composition model; inner-word character; mixed character-word representation; mask gate; max pooling
Indexed in: SCI
Language: English
Funding: Natural Science Foundation of China (61673380; 61403379)
WOS research area: Computer Science
WOS category: Computer Science, Artificial Intelligence
WOS accession number: WOS:000433090800001
Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/20678
Collection: National Laboratory of Pattern Recognition / Natural Language Processing
Author affiliations:
1. Univ Chinese Acad Sci, Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Intelligence Bldg, 498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence Techn, Natl Lab Pattern Recognit, Inst Automat, Intelligence Bldg, 498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714
Wang, Shaonan, Zhang, Jiajun, Zong, Chengqing. Empirical Exploring Word-Character Relationship for Chinese Sentence Representation[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2018, 17(3): 14.
APA: Wang, Shaonan, Zhang, Jiajun, & Zong, Chengqing. (2018). Empirical Exploring Word-Character Relationship for Chinese Sentence Representation. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 17(3), 14.
MLA: Wang, Shaonan, et al. "Empirical Exploring Word-Character Relationship for Chinese Sentence Representation." ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 17.3 (2018): 14.
Files in this item
File name/size | Document type | Version | Access | License
TALIIP-Empirical Exp (2098KB) | Journal article | Author accepted manuscript | Open access | CC BY-NC-SA
File name: TALIIP-Empirical Exploring Word-Character Relationship for Chinese Sentence Representation.pdf
Format: Adobe PDF
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.