A Unified Model for Solving the OOV Problem of Chinese Word Segmentation
Li, Xiaoqing1; Zong, Chengqing1; Su, Keh-Yih2
Source Publication: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
Publication Date: 2015-06-01
Volume: 14; Issue: 3; Pages: 29
Subtype: Article
Abstract: This article proposes a unified, character-based, generative model that incorporates additional resources to solve the out-of-vocabulary (OOV) problem of Chinese word segmentation. Within this model, different types of additional information can be utilized independently in corresponding submodels. The article mainly addresses three types of OOV words: unseen dictionary words, named entities, and suffix-derived words, none of which are handled well by current approaches. The results show that our approach effectively improves performance on the first two types, with positive interaction in F-score. We also analyze why suffix information is not helpful. After integrating the proposed generative model with the corresponding discriminative approach, our evaluation on various corpora (SIGHAN-2005, CIPS-SIGHAN-2010, and the Chinese Treebank (CTB)) shows that our integrated approach achieves the best performance reported in the literature on all test sets when additional information and resources are allowed.
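The abstract describes a character-based model and reports its results in F-score. As a rough illustration only, the Python sketch below shows two generic ingredients of that setting: recasting a segmented sentence as per-character position tags (the B/M/E/S tag scheme is an assumption; the article's own tag set is not stated on this page), and word-level precision, recall, F-score and OOV recall of the kind used in SIGHAN-style evaluations. The function names and the `vocab` parameter, which stands in for the training-set dictionary, are hypothetical. The sketch does not implement the article's generative model or its submodels.

```python
from typing import List, Set, Tuple

# Hypothetical helpers (not from the article): a character-based view of
# Chinese word segmentation using an assumed B/M/E/S position-tag scheme.


def words_to_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Convert a segmented sentence into (character, position-tag) pairs."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            pairs.append((word, "S"))  # single-character word
        else:
            pairs.append((word[0], "B"))  # word-initial character
            pairs.extend((ch, "M") for ch in word[1:-1])  # word-internal characters
            pairs.append((word[-1], "E"))  # word-final character
    return pairs


def tags_to_words(pairs: List[Tuple[str, str]]) -> List[str]:
    """Rebuild words from (character, tag) pairs; a word ends after E or S."""
    words: List[str] = []
    current = ""
    for ch, tag in pairs:
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate an unfinished word at the end of the sequence
        words.append(current)
    return words


def spans(words: List[str]) -> List[Tuple[int, int]]:
    """Map each word to its (start, end) character offsets, in order."""
    result: List[Tuple[int, int]] = []
    start = 0
    for word in words:
        result.append((start, start + len(word)))
        start += len(word)
    return result


def score(gold: List[str], pred: List[str],
          vocab: Set[str]) -> Tuple[float, float, float, float]:
    """Word-level precision, recall, F-score, and recall on OOV gold words."""
    gold_spans = spans(gold)
    pred_set = set(spans(pred))
    correct = sum(1 for s in gold_spans if s in pred_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_spans) if gold_spans else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    # A gold word counts as OOV if it is absent from the (hypothetical) vocabulary.
    oov = [s for s, w in zip(gold_spans, gold) if w not in vocab]
    r_oov = sum(1 for s in oov if s in pred_set) / len(oov) if oov else 0.0
    return p, r, f, r_oov
```

For example, with gold = ["我", "爱", "北京", "天安门"], pred = ["我", "爱", "北京", "天安", "门"] and a vocabulary that lacks 天安门, three of the five predicted words match the gold spans, giving precision 0.6, recall 0.75, an F-score of about 0.667, and an OOV recall of 0.0 because the unseen word was split.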
Keywords: Algorithms; Languages; Experimentation; Performance; Chinese Word Segmentation; Out-of-vocabulary Words; Model Integration; Domain Adaptation
WOS Headings: Science & Technology; Technology
DOI: 10.1145/2699940
Indexed By: SCI
Language: English
Funding Organization: Natural Science Foundation of China (61333018); International Science & Technology Cooperation Program of China (2014DFA11350); Hi-Tech Research and Development Program ("863" Program) of China (2012AA011102)
WOS Research Area: Computer Science
WOS Subject: Computer Science, Artificial Intelligence
WOS ID: WOS:000370686400003
Document Type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/11346
Collection: National Laboratory of Pattern Recognition, Natural Language Processing
Corresponding Author: Li, Xiaoqing
Affiliations:
1. Chinese Academy of Sciences, Institute of Automation, 95 Zhongguancun East Road, Beijing 100190, People's Republic of China
2. Academia Sinica, Taipei, Taiwan
First Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Corresponding Author Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation
GB/T 7714: Li, Xiaoqing, Zong, Chengqing, Su, Keh-Yih. A Unified Model for Solving the OOV Problem of Chinese Word Segmentation[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2015, 14(3): 29.
APA: Li, Xiaoqing, Zong, Chengqing, & Su, Keh-Yih. (2015). A Unified Model for Solving the OOV Problem of Chinese Word Segmentation. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 14(3), 29.
MLA: Li, Xiaoqing, et al. "A Unified Model for Solving the OOV Problem of Chinese Word Segmentation." ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 14.3 (2015): 29.
Files in This Item:
TALIP-publication-xq (1172KB); DocType: Journal article; Version: Author's accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: TALIP-publication-xqli.pdf
Format: Adobe PDF
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.