Large-vocabulary, speaker-independent continuous speech recognition is one of the most difficult tasks in Automatic Speech Recognition (ASR). A breakthrough in this field would open up many speech-centric applications in information technology systems. This research aims to develop several key component technologies, particularly acoustic models and search algorithms, as well as the final integrated system. The thesis first analyzes syllable perplexity under various knowledge sources and concludes that the lexicon constraint is important at both the acoustic and language levels of Chinese speech recognition. In the acoustic modeling research, two kinds of modeling units are compared, and four categories of context-dependent acoustic models (inter-syllable, intra-syllable, tone, and endpoint) are explored individually. Two acoustic modeling schemes, based on the ideas of data sharing and parameter smoothing, are then proposed as a complete and uniform solution under the current HMM engine. One of them, which we call the knowledge- and data-driven hybrid decision tree method, is adopted as our final solution for Chinese acoustic modeling, since it can integrate all of our findings. For the recognition algorithm, a two-pass search algorithm based on the monosyllabic structure of Chinese speech is proposed. The algorithm not only reduces the search error to zero but also greatly accelerates the search. The thesis also points out another use of the algorithm: rejecting out-of-vocabulary input in many practical applications. The characteristics of two tree-copy methods for continuous speech recognition, namely the word-conditioned and time-conditioned methods, are discussed. Since Chinese has many homophones, these two tree-copy methods are combined into a new algorithm under word-bigram constraints. Taking the word lattice as the module interface, an A* algorithm is used for the final sentence search.
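The idea of an A* sentence search over a word lattice can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the toy lattice, node numbering, edge costs, and function name are all assumptions made for the example. Edge costs stand for negative log probabilities (lower is better), and an exact backward dynamic program over the lattice supplies an admissible heuristic.

```python
import heapq

# A toy word lattice: edges (start_node, end_node, word, cost), where cost is
# a negative log probability. Nodes 0..3; node 3 is the final node.
EDGES = [
    (0, 1, "zhong", 1.0), (0, 1, "chong", 2.5),
    (1, 2, "guo", 0.8), (1, 2, "gou", 2.0),
    (2, 3, "hua", 1.2), (0, 2, "zhonggou", 3.5),
]
FINAL = 3

def astar_lattice(edges, final):
    # Successor lists for forward expansion.
    succ = {}
    for s, e, w, c in edges:
        succ.setdefault(s, []).append((e, w, c))
    # Backward DP gives the exact best cost-to-goal h(node), an admissible
    # heuristic. Relax repeatedly; the lattice is a small DAG, so this converges.
    h = {final: 0.0}
    for _ in range(len(edges)):
        for s, e, w, c in edges:
            if e in h:
                h[s] = min(h.get(s, float("inf")), c + h[e])
    # A* search: priority f = g (cost so far) + h (estimated cost to go).
    heap = [(h.get(0, float("inf")), 0.0, 0, [])]
    while heap:
        f, g, node, words = heapq.heappop(heap)
        if node == final:
            return words, g
        for e, w, c in succ.get(node, []):
            heapq.heappush(heap, (g + c + h.get(e, float("inf")), g + c, e, words + [w]))
    return None, float("inf")
```

With an exact heuristic, the first final-node pop is guaranteed optimal; in a real decoder the heuristic would come from the earlier search pass rather than a separate backward sweep.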
Preliminary results show the effectiveness of the algorithm: character accuracy for speaker-independent continuous speech recognition reaches 70% on average. Finally, the component modules of an isolated-word dictation system are described, including front-end processing based on MFCC and CMN, CCBC-based adaptation, the building of the statistical language model and its post-processing, and the real-time issues of the system. Together, these technologies make it possible to reach 90% character accuracy on an unknown speech input channel.
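The CMN step of the front end can be sketched as below. This is a generic illustration of cepstral mean normalization, assuming the MFCC features have already been computed; the function name and array shapes are not from the thesis.

```python
import numpy as np

def cepstral_mean_normalization(mfcc):
    """Subtract the per-utterance mean of each cepstral coefficient.

    mfcc: array of shape (num_frames, num_coeffs), e.g. 13 MFCCs per frame.
    A stationary convolutional channel adds a roughly constant offset in the
    cepstral domain, so removing the long-term mean cancels much of the
    channel effect; this is why CMN helps on unknown input channels.
    """
    return mfcc - mfcc.mean(axis=0, keepdims=True)
```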