Empirical Exploring Word-Character Relationship for Chinese Sentence Representation
Wang, Shaonan1; Zhang, Jiajun1; Zong, Chengqing2
Source Publication: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
2018-05-01
Volume: 17, Issue: 3, Pages: 14
Subtype: Article
Abstract: This article addresses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituent words. In contrast to English, a Chinese word is composed of characters, which contain rich semantic information. However, this information has not been fully exploited by existing methods. In this work, we introduce a novel, mixed character-word architecture to improve the Chinese sentence representations by utilizing rich semantic information of inner-word characters. We propose two novel strategies to reach this purpose. The first one is to use a mask gate on characters, learning the relation among characters in a word. The second one is to use a max-pooling operation on words to adaptively find the optimal mixture of the atomic and compositional word representations. Finally, the proposed architecture is applied to various sentence composition models, which achieves substantial performance gains over baseline models on sentence similarity task. To further verify the generalization ability of our model, we employ the learned sentence representations as features in sentence classification task, question classification task, and sentence entailment task. Results have shown that the proposed mixed character-word sentence representation models outperform both the character-based and word-based models.
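Other Abstract (translated from Chinese): This article discusses the problem of learning compositional Chinese sentence representations, which represent the meaning of a sentence by composing the meanings of its constituents. Compared with English, Chinese words are composed of characters, which contain rich semantic information. However, existing methods have not fully exploited this information. In this article, we propose a new sentence representation model based on a character-word fusion mechanism, which uses the rich semantic information inside words to improve Chinese sentence representations. We propose two new strategies to achieve this goal. The first is to use a mask gate on characters to learn the relations among the characters in a word. The second is to use a max-pooling operation on words to adaptively find the best mixture of atomic and compositional word representations. Finally, the proposed mixed character-word architecture is applied to various sentence composition models, yielding significant performance gains over baseline models on the sentence similarity task. To further verify the generalization ability of our model, we apply the learned sentence representations to sentence classification, question classification, and sentence entailment tasks. The results show that the proposed sentence representation model based on the character-word fusion mechanism outperforms both character-based and word-based models.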
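Illustrative sketch (not taken from the paper): the abstract names two strategies, a mask gate on characters and a max-pooling step that mixes atomic and compositional word representations. The record does not give the paper's equations, so the gate form below (a sigmoid over the concatenated character and word vectors), the summation of gated characters, and all function and variable names are assumptions chosen only to make the idea concrete.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mask_gate(char_vec, word_vec, weights, bias):
    """Per-dimension gate in (0, 1) for one character (assumed form, not the paper's)."""
    features = char_vec + word_vec  # list concatenation: [char ; word]
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


def compositional_word(char_vecs, word_vec, weights, bias):
    """Build a word vector from its characters by summing the gated character vectors."""
    total = [0.0] * len(word_vec)
    for char_vec in char_vecs:
        gate = mask_gate(char_vec, word_vec, weights, bias)
        for k, (g, c) in enumerate(zip(gate, char_vec)):
            total[k] += g * c
    return total


def mixed_word(atomic_vec, comp_vec):
    """Element-wise max over the atomic and compositional representations."""
    return [max(a, c) for a, c in zip(atomic_vec, comp_vec)]


# Toy data: one two-character word with 4-dimensional random embeddings.
rng = random.Random(0)
dim = 4
word = [rng.uniform(-1, 1) for _ in range(dim)]
chars = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

comp = compositional_word(chars, word, weights, bias)
mixed = mixed_word(word, comp)
print(mixed)
```

In this sketch each dimension of the mixed vector comes from whichever representation has the larger value, which is one simple way to read "adaptively find the optimal mixture"; in a trained model the embeddings and gate parameters would be learned rather than random.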
Keyword: Sentence Representation; Composition Model; Inner-word Character; Mixed Character-word Representation; Mask Gate; Max Pooling
WOS Headings: Science & Technology; Technology
DOI: 10.1145/3156778
WOS Keyword: Sentence representation; composition model; inner-word character; mixed character-word representation; mask gate; max pooling
Indexed By: SCI
Language: English
Funding Organization: Natural Science Foundation of China (61673380; 61403379)
WOS Research Area: Computer Science
WOS Subject: Computer Science, Artificial Intelligence
WOS ID: WOS:000433090800001
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/20678
Collection: National Laboratory of Pattern Recognition (模式识别国家重点实验室), Natural Language Processing
Affiliation: 1. Univ Chinese Acad Sci, Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Intelligence Bldg,498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
2. Univ Chinese Acad Sci, Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence Techn, Natl Lab Pattern Recognit, Inst Automat, Intelligence Bldg,498 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714: Wang, Shaonan, Zhang, Jiajun, Zong, Chengqing. Empirical Exploring Word-Character Relationship for Chinese Sentence Representation[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2018, 17(3): 14.
APA: Wang, Shaonan, Zhang, Jiajun, & Zong, Chengqing. (2018). Empirical Exploring Word-Character Relationship for Chinese Sentence Representation. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 17(3), 14.
MLA: Wang, Shaonan, et al. "Empirical Exploring Word-Character Relationship for Chinese Sentence Representation". ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 17.3 (2018): 14.
Files in This Item:
File Name/Size: TALIIP-Empirical Exp (2098KB); DocType: Journal article; Version: Author accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: TALIIP-Empirical Exploring Word-Character Relationship for Chinese Sentence Representation.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.