Knowledge Commons of Institute of Automation, CAS
Integrating Multi-source Bilingual Information for Chinese Word Segmentation in Statistical Machine Translation | |
Chen W(陈炜); Wei W(韦玮); Chen ZB(陈振标); Xu B(徐波); Chen,Wei | |
2013-10 | |
Conference name | Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL) |
Proceedings title | Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (CCL) |
Conference date | 2013-10 |
Conference location | Suzhou, China |
Abstract | Chinese text is written without spaces between words, which is problematic for Chinese-English statistical machine translation (SMT). The most widely used approach in existing SMT systems is to apply a fixed segmentation produced by an off-the-shelf Chinese word segmentation (CWS) system to train the standard translation model. Such an approach is sub-optimal and not well suited to SMT systems. We propose a joint model that integrates multi-source bilingual information to optimize the segmentation for SMT. We also propose an unsupervised algorithm that iteratively improves the quality of the joint model. Experiments show that our method improves both segmentation and translation performance under different data conditions. |
Keywords | Chinese Word Segmentation; Bilingual Information; Statistical Machine Translation |
Indexed by | EI |
Document type | Conference paper |
Item identifier | http://ir.ia.ac.cn/handle/173211/41230 |
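Collection | Laboratory of Cognition and Decision Intelligence for Complex Systems_Auditory Models and Cognitive Computing; Research Center for Digital Content Technology and Services |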
Corresponding author | Chen, Wei |
Recommended citation (GB/T 7714) | Chen W, Wei W, Chen ZB, et al. Integrating Multi-source Bilingual Information for Chinese Word Segmentation in Statistical Machine Translation[C], 2013. |
Files in this item | No files are associated with this item. |