A Unified Model for Solving the OOV Problem of Chinese Word Segmentation
Li, Xiaoqing1; Zong, Chengqing1; Su, Keh-Yih2
2015-06-01
Journal: ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING
Volume: 14
Issue: 3
Pages: 29
Article type: Article
Abstract: This article proposes a unified, character-based, generative model that incorporates additional resources to solve the out-of-vocabulary (OOV) problem in Chinese word segmentation; within this model, different types of additional information can be used independently in corresponding submodels. The article mainly addresses three types of OOV words, none of which current approaches handle well: unseen dictionary words, named entities, and suffix-derived words. The results show that our approach can effectively improve performance on the first two types, and that the two show a positive interaction in F-score. We also analyze why suffix information is not helpful. After integrating the proposed generative model with the corresponding discriminative approach, our evaluation on various corpora, including SIGHAN-2005, CIPS-SIGHAN-2010, and the Chinese Treebank (CTB), shows that the integrated approach achieves the best performance reported in the literature on all test sets when additional information and resources are allowed.
Keywords: Algorithms; Languages; Experimentation; Performance; Chinese Word Segmentation; Out-of-vocabulary Words; Model Integration; Domain Adaptation
WOS subject headings: Science & Technology; Technology
DOI: 10.1145/2699940
Indexed in: SCI
Language: English
Funding: Natural Science Foundation of China (61333018); International Science & Technology Cooperation Program of China (2014DFA11350); Hi-Tech Research and Development Program ("863" Program) of China (2012AA011102)
WOS research area: Computer Science
WOS category: Computer Science, Artificial Intelligence
WOS accession number: WOS:000370686400003
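Document type: Journal article
Item identifier: http://ir.ia.ac.cn/handle/173211/11346
Collection: National Laboratory of Pattern Recognition, Natural Language Processing
Corresponding author: Li, Xiaoqing
Affiliations:
1. Chinese Acad Sci, Inst Automat, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
2. Acad Sinica, Taipei, Taiwan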
Recommended citation
GB/T 7714
Li, Xiaoqing, Zong, Chengqing, Su, Keh-Yih. A Unified Model for Solving the OOV Problem of Chinese Word Segmentation[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2015, 14(3): 29.
APA: Li, Xiaoqing, Zong, Chengqing, & Su, Keh-Yih. (2015). A Unified Model for Solving the OOV Problem of Chinese Word Segmentation. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 14(3), 29.
MLA: Li, Xiaoqing, et al. "A Unified Model for Solving the OOV Problem of Chinese Word Segmentation." ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING 14.3 (2015): 29.
Files in this item
File name/size: TALIP-publication-xq (1172 KB)
Document type: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: TALIP-publication-xqli.pdf
Format: Adobe PDF
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.