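Research on Embedded Chinese-English Bilingual Mixed Speech Recognition Technology (嵌入式汉英双语混合语音识别技术的研究)
Alternative title: Research of Embedded Speech Recognition Technology for Mixed Languages of Chinese and English
Pu Jiantao (浦剑涛)
Degree type: Doctor of Engineering
Advisor: Xu Bo (徐波)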
2008-05-31
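Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree discipline: Pattern Recognition and Intelligent Systems
Keywords: Embedded Speech Recognition; Language Independent Speech Recognition; Sharing of Acoustic Model Parameters; Robust Speech Recognition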
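Abstract: This thesis systematically studies three key problems for Chinese speech recognition systems in voice-interaction applications on embedded devices: how to reduce the system's computation and storage consumption, how to improve its robustness, and how to handle the modeling and search problems raised by English spoken by Chinese speakers and by mixed Chinese-English bilingual speech recognition.
On reducing the computation and storage consumption of the speech recognition system:
1. Acoustic model parameter sharing is studied, and two models are proposed: TM-SDCHMM, based on continuous probability distribution functions, and SDC-DHMM, based on discrete probability distribution functions. They reduce model complexity with no loss, or only a slight loss, of model accuracy.
2. Simplified acoustic score computation, together with high-precision path pruning based on online path confidence, shrinks the search space and improves decoding efficiency.
3. For fixed-point processors, fixed-point schemes for data representation, model parameter precomputation, and acoustic score computation are proposed, which raise the system's running speed on fixed-point processors.
On improving the robustness of the speech recognition system:
4. In the signal space, an integrated, signal-processing-oriented speech preprocessing method is proposed for complex embedded speech application environments. It comprises abnormal-signal detection and filtering, TMWF-based speech enhancement, and speech endpoint detection based on subspace energy and an edge-detection filter.
5. In the feature space and the model space, feature normalization, feature smoothing, and multi-condition acoustic model training are studied.
6. At the system application level, the thesis studies a multi-candidate mechanism, confidence measures based on posterior probability and on phoneme confusability, background noise suppression based on adaptive gain control, and OOV rejection based on a guide-word grammar, which improve the system's robustness in real application environments.
On handling English spoken by Chinese speakers and mixed Chinese-English bilingual speech recognition:
7. Based on an analysis of a speech corpus of Chinese-accented English, an extended set of English acoustic modeling units is proposed, so that Chinese-accented English can also reach a relatively high recognition rate.
8. Based on an analysis of the model-precision mismatch in bilingual mixed recognition, two remedies are proposed: balancing model precision by manual adjustment, and balancing it automatically through mixed modeling. Search frameworks are also given for the triphone acoustic models under independent Chinese and English modeling and under mixed modeling, respectively.
The results of this work have been successfully applied in voice-dialing software and on a variety of embedded devices, embedded operating systems, and embedded microprocessors.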
Other abstract: This thesis studies and addresses three key issues encountered when applying a Chinese speech recognition system on intelligent terminal equipment: how to reduce computation and storage consumption, how to improve the robustness of the speech recognition system, and how to deal with the modeling and search problems faced by Chinese-accented English speech recognition and mixed Chinese-English bilingual speech recognition.
To reduce the computation and storage consumption of the speech recognition system:
1. On acoustic model parameter sharing, we proposed the TM-SDCHMM model, based on continuous probability distribution functions, and the SDC-DHMM model, based on discrete probability distribution functions. These models decrease model complexity with no loss, or only a slight loss, of model accuracy.
2. By simplifying acoustic score computation and applying high-precision path pruning based on an online confidence measure, we reduced the size of the search space and improved the efficiency of the decoder.
3. For fixed-point processors, we built a speech recognition system based on fixed-point data representation and computation and on precomputed model parameters.
To improve the robustness of the speech recognition system:
4. In the signal space, we proposed an integrated, signal-processing-oriented speech preprocessing method applicable to the complex environments of embedded speech applications, including abnormal-signal detection and filtering, TMWF-based speech enhancement, and a voice activity detection algorithm based on subspace energy and edge-detection filters.
5. In the feature space and the acoustic model space, we studied feature normalization, feature smoothing, and multi-condition training of the acoustic model.
6. At the application level of the system, we studied a multi-candidate mechanism, confidence measures based on posterior probability and on phone confusion, background noise suppression based on adaptive gain control, and OOV rejection based on guidance phrases, which improved the robustness of the speech recognition system in practical application environments.
To deal with English spoken by Chinese speakers and mixed Chinese-English bilingual speech recognition:
7. Based on an analysis of a Chinese-accented English corpus, we proposed an extended set of English acoustic modeling units, so that Chinese-accented English can also reach a higher recognition rate.
8. Based on an analysis of the acoustic model mismatch problem in recognizing bilingual mixed speech, we proposed manual adjustment and a mixed modeling method to balance the model precision of the different languages.
The research results have been successfully applied in voice-dialing software and on different embedded devices, embedded operating systems, and embedded microprocessors.
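Call number: XWLW1213
Other identifier: 200318014603020
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6104
Collection: Graduates_Doctoral Dissertations
Recommended citation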
GB/T 7714
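Pu Jiantao (浦剑涛). Research on Embedded Chinese-English Bilingual Mixed Speech Recognition Technology (嵌入式汉英双语混合语音识别技术的研究) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2008.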
Files in this item
File name/size: CASIA_20031801460302 (1119KB)
Access type: Not currently open
License: CC BY-NC-SA