|陈萧; 柯登峰; 徐波|
Punctuation generation is important for automatic speech recognition: it improves the readability of transcripts and the user experience, and it facilitates downstream natural language processing tasks. In this paper, we develop a method for generating punctuation in Chinese spoken sentences that relies purely on textual information. The idea is to first model the relations between global lexical information and punctuation at different segment levels of the sentence, then combine these models using a multi-layer perceptron, and finally generate the punctuation mark (period, question mark, or exclamation mark). Compared with the baseline, the proposed method yields an 8.9% improvement in un-weighted accuracy and a 4.7% improvement in weighted accuracy, reaching an un-weighted accuracy of 85.9% and a weighted accuracy of 92.2%. We also study how the amount of training data affects performance; the results indicate that larger training sets consistently improve performance.
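The abstract reports both un-weighted and weighted accuracy without defining them. The sketch below illustrates one common convention, assumed here and not confirmed by this record: weighted accuracy is the overall fraction of correctly labeled sentences, and un-weighted accuracy is the mean of the per-class recalls, so that rare marks such as the question mark count as much as the frequent period. The function name `accuracies` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracies(y_true, y_pred):
    """Return (unweighted_accuracy, weighted_accuracy).

    Assumed definitions (not confirmed by the paper):
      weighted   = correct predictions / all predictions
      unweighted = mean of per-class recall over classes present in y_true
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = Counter(y_true)  # number of sentences per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    weighted = sum(correct.values()) / len(y_true)
    unweighted = sum(correct[c] / total[c] for c in total) / len(total)
    return unweighted, weighted


# Toy data: 6 periods, 2 question marks, 2 exclamation marks.
y_true = ["period"] * 6 + ["question"] * 2 + ["exclamation"] * 2
# All periods are right; one question and one exclamation are mislabeled as period.
y_pred = ["period"] * 6 + ["question", "period"] + ["exclamation", "period"]

ua, wa = accuracies(y_true, y_pred)
print(f"un-weighted: {ua:.3f}, weighted: {wa:.3f}")
```

On this toy data the weighted accuracy is higher than the un-weighted one because the majority class (period) is predicted perfectly while the two minority classes are each only half right. The same ordering appears in the reported figures (92.2% weighted versus 85.9% un-weighted), though the toy numbers say nothing about the paper's actual class distribution.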
|Keyword||punctuation generation; global information; lexical information; model fusion|
|Files in This Item:|
|2013NCMMSC.pdf (188KB)||Conference paper||Open access||CC BY-NC-SA|