Research on Language Identification Technology Based on Phone Recognition

Alternative title	Research on PPRLM-based Language Identification Technology
	Wang Shijin (王士进)
	2008-05-20
Degree type	Doctor of Engineering
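Abstract (translated from the Chinese)	Language identification (LID) is the process by which a computer automatically determines which language a speech segment is spoken in. It plays an increasingly important role in multilingual speech processing, automatic speech translation, security monitoring, and related fields. Focusing on LID based on phone recognition, this thesis studies phone modeling, language modeling, and system fusion. The main work and contributions are as follows: (1) A PPRLM-based baseline LID system is implemented, and the effects of model smoothing, channel mismatch, and speaking style on recognition performance are studied, raising the baseline system's performance to 77.81%. (2) Phone recognition based on hybrid NN-HMM models is introduced into LID, improving system performance by more than 10%. On this basis, several automatic clustering algorithms are studied and a method for building a multilingual acoustic model is proposed, so that the Multilingual PRLM system attains recognition accuracy comparable to that of the PPRLM system; fusing it with the PPRLM system improves performance by about a further 2%. (3) A decision-tree language model and a random decision-tree language model are proposed, improving the performance of the LID system by about 6%; in addition, for lattices, which carry more information, discriminative lattice-based language modeling is proposed, improving recognition performance by about 8%. (4) Several LID systems based on acoustic features are implemented, together with system fusion methods based on LDA and Gaussian models. Through fusion, the acoustic-feature systems substantially complement the PPRLM system: on the NIST 2003 30-second language recognition test set, the fused system reaches an accuracy of 98.75%, approaching or exceeding the mainstream evaluation systems of recent years.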
English abstract	Multilingual language identification (LID) is the task of identifying the language spoken in a given speech segment; it plays an increasingly important role in speech information services, multilingual speech translation, and security surveillance. This thesis presents research on multilingual LID technology covering acoustic modeling, language modeling, and system combination. First, we build a baseline PPRLM system and study the effects of language model smoothing, speech channel, and speaking style, obtaining a baseline recognition accuracy of 77.81%. Second, we introduce NN-HMM based acoustic modeling into LID, which yields about a 10% improvement; we further study several clustering algorithms and propose an algorithm for building a multilingual acoustic model, which achieves accuracy comparable to that of the PPRLM system. Combining it with the PPRLM system improves performance by about another 2%. Third, we propose binary decision tree language modeling and random forest based binary decision tree language modeling for PPRLM, which achieve about a 6% improvement; we then propose lattice-based SVM language modeling, which achieves about an 8% improvement. Finally, we integrate acoustic LID systems using an LDA-Gaussian based system combination algorithm. After combination, our system achieves a recognition accuracy of 98.75% on the NIST 2003 30s LRE data, which is comparable to recent LID systems worldwide.
Keywords	Multilingual Language Identification; NN-HMM Acoustic Model; Multilingual Acoustic Model; Binary-decision Tree Language Model; Random Forest Based Binary-decision Tree Language Model; Word Lattice; System Combination; PRLM; PPRLM
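Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6057
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Wang Shijin. Research on Language Identification Technology Based on Phone Recognition [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2008.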
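
The abstract builds on PRLM, in which a phone recognizer turns speech into a phone sequence and per-language phone language models then score that sequence. The following is a minimal sketch of that scoring back-end only, not the thesis's implementation. It assumes the phone sequences have already been decoded, and it uses an add-one smoothed bigram model purely for illustration. The class `BigramLM`, the function `identify`, the language labels `lang_x` and `lang_y`, and the toy phone symbols are all hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram model over phone symbols (illustrative only)."""

    def __init__(self, sequences, vocab):
        # Possible next symbols: every phone plus the end marker.
        self.vocab_size = len(vocab) + 1
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def avg_log_prob(self, seq):
        """Average log-probability per transition, so lengths are comparable."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total / (len(padded) - 1)


def identify(phone_seq, models):
    """Return the best-scoring language and all per-language scores."""
    scores = {lang: lm.avg_log_prob(phone_seq) for lang, lm in models.items()}
    return max(scores, key=scores.get), scores


# Toy decoded phone sequences; real input would come from a phone recognizer.
vocab = {"a", "b", "c"}
train = {
    "lang_x": [["a", "b", "a", "b"], ["a", "b", "a"]],
    "lang_y": [["c", "a", "c", "a"], ["c", "c", "a"]],
}
models = {lang: BigramLM(seqs, vocab) for lang, seqs in train.items()}

best, scores = identify(["a", "b", "a", "b"], models)
print(best, scores)
```

Under the same assumptions, a parallel (PPRLM) variant would run several phone recognizers, keep one such set of language models per recognizer, and combine the per-language scores, for example by averaging them, before picking the best language.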

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20051801462808 (996KB)			Not yet open	CC BY-NC-SA