Institutional Repository of Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
A Unified Model for Solving the OOV Problem of Chinese Word Segmentation | |
Li, Xiaoqing; Zong, Chengqing; Su, Keh-Yih | |
Journal | ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING |
2015-06-01 | |
Volume | 14, Issue: 3, Pages: 29 |
Article type | Article |
Abstract | This article proposes a unified, character-based generative model that incorporates additional resources to solve the out-of-vocabulary (OOV) problem of Chinese word segmentation. Within the model, different types of additional information can be utilized independently in corresponding submodels. The article mainly addresses three types of OOV words: unseen dictionary words, named entities, and suffix-derived words, none of which are handled well by current approaches. The results show that our approach effectively improves performance on the first two types, with positive interaction in F-score. We also analyze the reason why suffix information is not helpful. After integrating the proposed generative model with the corresponding discriminative approach, our evaluation on various corpora, including SIGHAN-2005, CIPS-SIGHAN-2010, and the Chinese Treebank (CTB), shows that our integrated approach achieves the best performance reported in the literature on all testing sets when additional information and resources are allowed. |
Keywords | Algorithms; Languages; Experimentation; Performance; Chinese Word Segmentation; Out-of-vocabulary Words; Model Integration; Domain Adaptation |
WOS headings | Science & Technology ; Technology |
DOI | 10.1145/2699940 |
Indexed by | SCI |
Language | English |
WOS research area | Computer Science |
WOS category | Computer Science, Artificial Intelligence |
WOS accession number | WOS:000370686400003 |
Document type | Journal article |
Item identifier | http://ir.ia.ac.cn/handle/173211/40852 |
Collection | National Laboratory of Pattern Recognition_Natural Language Processing |
Corresponding author | Li, Xiaoqing |
Recommended citation (GB/T 7714) | Li, Xiaoqing,Zong, Chengqing,Su, Keh-Yih. A Unified Model for Solving the OOV Problem of Chinese Word Segmentation[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING,2015,14(3):29. |
APA | Li, Xiaoqing,Zong, Chengqing,&Su, Keh-Yih.(2015).A Unified Model for Solving the OOV Problem of Chinese Word Segmentation.ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING,14(3),29. |
MLA | Li, Xiaoqing,et al."A Unified Model for Solving the OOV Problem of Chinese Word Segmentation".ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 14.3(2015):29. |