Semi-supervised Chinese Word Segmentation based on Bilingual Information
Chen W (陈炜); Xu B (徐波); Chen, Wei
2015-09
Conference Name: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Source Publication: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Conference Date: 2015-09
Conference Place: Lisbon, Portugal
Abstract

This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverages the natural segmenting information of English sentences. The proposed method involves learning three levels of features, namely character-level, phrase-level and sentence-level, provided by multiple sub-models. We use a conditional random fields (CRF) sub-model to learn monolingual grammars, a sub-model based on character-based alignment to obtain explicit segmenting knowledge, and another sub-model based on transliteration similarity to detect out-of-vocabulary (OOV) words. Moreover, we propose a neural-network sub-model to ensure the proper treatment of the semantic gap, and a phrase-based translation sub-model to score the translation probability of the Chinese segmentation and its corresponding English sentence. A cascaded log-linear model is employed to combine these features to segment bilingual unlabeled data, the results of which are used to justify the original supervised CWS model. The evaluation shows that our method outperforms the state-of-the-art monolingual and bilingual semi-supervised models reported in the literature.
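The abstract describes combining several sub-model scores with a log-linear model to choose a segmentation for each bilingual sentence pair. The sketch below illustrates only the generic idea of a single-level (not cascaded) log-linear combination that ranks candidate segmentations of a Chinese sentence given its English translation. Everything in it is an assumption for illustration: the three feature functions, the toy lexicon and bilingual dictionary, and the weights are hypothetical stand-ins, not the paper's CRF, alignment, transliteration, neural-network or phrase-based translation sub-models. In practice such weights would typically be tuned on held-out data.

```python
import math
from typing import Callable, Dict, List

# A candidate segmentation is a list of Chinese words, e.g. ["中国", "科学院"].
Segmentation = List[str]
# A feature function scores a candidate against its English counterpart sentence.
FeatureFn = Callable[[Segmentation, str], float]

# Hypothetical toy resources standing in for the real sub-models.
TOY_LEXICON = {"中国", "科学院"}
TOY_BILINGUAL_DICT = {"中国": "chinese", "科学院": "academy"}


def lexicon_feature(seg: Segmentation, english: str) -> float:
    """Fraction of words found in a toy monolingual lexicon (hypothetical)."""
    return sum(w in TOY_LEXICON for w in seg) / len(seg)


def alignment_feature(seg: Segmentation, english: str) -> float:
    """Fraction of words whose toy translation occurs in the English sentence."""
    tokens = set(english.lower().split())
    return sum(TOY_BILINGUAL_DICT.get(w) in tokens for w in seg) / len(seg)


def length_feature(seg: Segmentation, english: str) -> float:
    """Negative word count, discouraging over-segmentation."""
    return -float(len(seg))


def log_linear_score(seg: Segmentation, english: str,
                     features: Dict[str, FeatureFn],
                     weights: Dict[str, float]) -> float:
    """Unnormalized log-linear score: sum over k of w_k * h_k(seg, english)."""
    return sum(weights[name] * fn(seg, english) for name, fn in features.items())


def posterior(candidates: List[Segmentation], english: str,
              features: Dict[str, FeatureFn],
              weights: Dict[str, float]) -> List[float]:
    """P(seg | english) over the candidate list: softmax of the log-linear scores."""
    scores = [log_linear_score(c, english, features, weights) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_segmentation(candidates: List[Segmentation], english: str,
                      features: Dict[str, FeatureFn],
                      weights: Dict[str, float]) -> Segmentation:
    """Return the highest-scoring candidate (normalization does not change the argmax)."""
    return max(candidates, key=lambda c: log_linear_score(c, english, features, weights))


FEATURES: Dict[str, FeatureFn] = {
    "lexicon": lexicon_feature,
    "alignment": alignment_feature,
    "length": length_feature,
}
# Hypothetical weights chosen by hand for this toy example.
WEIGHTS: Dict[str, float] = {"lexicon": 1.0, "alignment": 2.0, "length": 0.5}

candidates = [["中国", "科学院"], ["中国科学院"], ["中", "国", "科", "学", "院"]]
english = "Chinese Academy of Sciences"
print(best_segmentation(candidates, english, FEATURES, WEIGHTS))  # ['中国', '科学院']
```

Because the softmax normalizer is shared by all candidates, `best_segmentation` can skip it; `posterior` is only needed when normalized probabilities over the candidate list are wanted.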


Keywords: Chinese Word Segmentation; Semi-supervised; Bilingual
Indexed By: EI
Document Type: Conference paper
Identifier: http://ir.ia.ac.cn/handle/173211/11801
Collection: Research Center for Digital Content Technology and Services / Auditory Model and Cognitive Computing
Corresponding Author: Chen, Wei
Affiliation: Institute of Automation, Chinese Academy of Sciences
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Chen W, Xu B, Chen, Wei. Semi-supervised Chinese Word Segmentation based on Bilingual Information[C], 2015.
Files in This Item:
File Name/Size | DocType | Version | Access | License
1.pdf (271KB) | Conference paper | | Open access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.