基于全局词汇信息的中文口语句子标点生成 (Punctuation Generation for Chinese Spoken Sentences Based on Global Lexical Information)
Authors: 陈萧; 柯登峰; 徐波
Conference Name: 全国人机语音通讯学术会议 (National Conference on Man-Machine Speech Communication)
Source Publication: Proceedings of the 12th National Conference on Man-Machine Speech Communication (NCMMSC'2013)
Conference Date: 5-7
Conference Place: Guiyang, Guizhou
Abstract
Punctuation generation is very important for automatic speech recognition: it greatly improves the readability of transcripts and the user experience, and it facilitates downstream natural language processing tasks. In this paper, we develop a method for punctuation generation in Chinese spoken sentences that relies on text information alone. The idea is: first, model the relations between global lexical information and punctuation at different segment levels of the sentence; then combine these segment-level models using a multi-layer perceptron; finally, generate the punctuation mark (period, question mark, or exclamation mark). Results indicate that, compared with the baseline, the proposed method yields an 8.9% improvement in unweighted accuracy and a 4.7% improvement in weighted accuracy, reaching an unweighted accuracy of 85.9% and a weighted accuracy of 92.2%. We also study how the amount of training data affects performance, and find that larger training sets consistently improve it.
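The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each segment-level model emits a 3-class score vector, concatenates those scores into a feature vector, and passes it through a one-hidden-layer perceptron with softmax output over the three punctuation marks. The weights here are random placeholders; in the paper they would be trained on labelled transcripts.

```python
import numpy as np

PUNCTS = ["。", "？", "！"]  # period, question mark, exclamation mark

def mlp_fuse(segment_scores, W1, b1, W2, b2):
    """Fuse per-segment model scores with a one-hidden-layer MLP.

    segment_scores: concatenated punctuation scores from the
    segment-level models (hypothetical feature layout)."""
    h = np.tanh(segment_scores @ W1 + b1)   # hidden layer
    logits = h @ W2 + b2                    # output layer, one logit per mark
    e = np.exp(logits - logits.max())
    probs = e / e.sum()                     # softmax over the 3 classes
    return PUNCTS[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
# Toy fixed weights; real weights would be learned from training data.
W1, b1 = rng.normal(size=(6, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

# Example: two segment-level models, each emitting 3 class scores.
features = np.array([0.7, 0.2, 0.1, 0.6, 0.3, 0.1])
label, probs = mlp_fuse(features, W1, b1, W2, b2)
```

The design choice in the paper is that each segment granularity contributes its own view of the global lexical evidence, and the perceptron learns how to weight those views against each other rather than trusting any single segmentation.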

Keywords: punctuation generation (标点生成); global information (全局信息); lexical information (词汇信息); model fusion (模型融合)
Document Type: Conference Paper
Corresponding Author: 徐波
Recommended Citation (GB/T 7714):
陈萧, 柯登峰, 徐波. 基于全局词汇信息的中文口语句子标点生成 (Punctuation generation for Chinese spoken sentences based on global lexical information) [C], 2013.
Files in This Item:
File Name/Size: 2013NCMMSC.pdf (188 KB)
DocType: Conference Paper
Access: Open Access
License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.