Transcription of Chinese Broadcast News over Radio and TV (汉语广播电视新闻语音识别)

Title	汉语广播电视新闻语音识别
Alternative title	Transcription of Chinese Broadcast News over Radio and TV
Author	贾磊 (Jia Lei)
Date	2003-07-01
Degree type	Doctor of Engineering
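Abstract (translated from Chinese)	Research on speech recognition for Chinese broadcast radio and TV news is of great importance for making speech recognition practical. This thesis addresses the main technical difficulties of Chinese broadcast news speech recognition in three areas. First, for segmenting continuous broadcast news audio, it proposes an acoustic change point detection method based on the trend of entropy change, which locates the points in a continuous audio signal where the acoustic features change. Within a data window, every candidate change point splits the signal into two segments, and the method decides on a change point from the trend of the entropy of those two segments. Compared with the widely used detection method based on the BIC criterion, the proposed method is more sensitive and has a more robust detection threshold, so it adapts well to change point detection under a variety of conditions. Second, for adaptation in broadcast news recognition, the thesis analyzes in detail the strengths and weaknesses of the adaptation algorithms commonly used in broadcast news recognition systems. In particular, MLLR based on a regression class tree relies on prior knowledge to define the adaptation transform classes; to address this weakness, the thesis proposes a target-driven multi-layer adaptation algorithm. The algorithm determines the type and number of transform classes dynamically according to the increase in the likelihood of the adaptation data, which makes fuller use of limited adaptation data and improves recognition accuracy. Finally, for continuous speech recognition of broadcast news, the thesis addresses the weaknesses of existing systems in modeling correlated features. Combining a linear rotation transform at the feature level with covariance modeling at the model level, it proposes a correlated-feature modeling method in which the state-space rotation matrices are shared through covariance modeling. The method exploits the decorrelating effect of state-based rotation transforms to build high-precision Gaussian mixture models with diagonal covariance in the transformed, uncorrelated feature space. It also uses covariance modeling to share and optimize the parameters of the state-space transform matrices, which overcomes the excessive number of model parameters and the high decoding cost caused by state-based feature rotation. In continuous speech tests on Mandarin Chinese and on broadcast speech, the proposed method achieves a 20% relative reduction in error rate over conventional Gaussian mixture models with diagonal covariance, at the cost of a small increase in memory use and decoding computation.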
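For orientation, the BIC-based baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the well-known BIC change point test, not the entropy-trend method proposed in the thesis. It assumes one-dimensional features, a single Gaussian per segment, and a penalty weight of 1.0; the function names, the `margin` parameter, and the synthetic signal are hypothetical.

```python
import math
import random


def variance(xs):
    """Maximum-likelihood variance, floored to keep log() finite."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-10)


def delta_bic(window, i, penalty_weight=1.0):
    """BIC gain from modeling window[:i] and window[i:] as two Gaussians
    instead of one. Positive values favor a change point at index i."""
    n = len(window)
    gain = 0.5 * (
        n * math.log(variance(window))
        - i * math.log(variance(window[:i]))
        - (n - i) * math.log(variance(window[i:]))
    )
    # For 1-D features the split adds two parameters (one mean, one variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty


def find_change_point(window, margin=10, penalty_weight=1.0):
    """Return the index with the largest positive delta-BIC, or None."""
    best_index, best_score = None, 0.0
    for i in range(margin, len(window) - margin):
        score = delta_bic(window, i, penalty_weight)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Synthetic signal (assumption): the mean jumps from 0 to 5 at sample 200.
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(200)]
signal += [rng.gauss(5.0, 1.0) for _ in range(200)]
change = find_change_point(signal)
print("estimated change point:", change)
```

The fixed penalty term is what the abstract refers to as the detection threshold: it is tuned through `penalty_weight`, and short segments yield small likelihood gains that struggle to exceed it.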
Abstract (English)	Over the past ten years, state-of-the-art laboratory speech recognition systems have made great progress. Recently, the focus of speech research has shifted from read speech to speech found in the real world, such as broadcast news over radio and TV. During the three years of my Ph.D. study, I investigated the key technologies for building a broadcast news recognition system. The research focused on three aspects. First, I proposed a novel method for acoustic change point detection, which is important for improving the performance of a broadcast segmentation system. The method detects acoustic change points by checking the trend of the dividing entropy at every signal point in a sliding window. Compared with the traditional detection method based on the Bayesian Information Criterion (BIC), it detects acoustic change points more accurately, especially those between two short signals. Second, MLLR adaptation is widely used in speech recognition systems. The traditional MLLR method defines regression classes on the assumption that all output distributions that are close in the original acoustic feature space should be tied and transformed together, which may not hold in some cases. To overcome this drawback, I proposed a target-driven MLLR adaptation algorithm with a multi-layer structure, in which the regression classes are defined so as to maximize the increase in the likelihood of the adaptation data. Compared with the traditional MLLR method, the new algorithm gives about 10% relative error reduction and requires less computation. Third, continuous speech recognition is the most important technology in a broadcast news recognition system. A method based on feature-space transforms is proposed to model correlations between feature coefficients. In this method, the state-specific rotation (SSR) transform first generates refined multiple-mixture diagonal Gaussian models by rotating the feature vectors in each state into a new, uncorrelated feature space. Because the acoustic model generated by the SSR method imposes a heavy computation load during decoding, a tying method that uses the optimization strategy of the semi-tied covariance (STC) transform is proposed to tie the feature-space transform matrices across states. Experiments on an LVCSR test showed that the method achieves nearly 20% relative error reduction compared with the traditional diagonal Gaussian modeling method and incurs less computation cost during decoding.
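The decorrelating effect of a per-state rotation can be illustrated with a small sketch. It assumes two-dimensional features and takes the rotation from the principal axes of the sample covariance, which is one common way to obtain a decorrelating rotation and is not necessarily the estimation procedure used in the thesis. The STC-style tying of transforms across states is not shown. All names and the synthetic data are hypothetical.

```python
import math
import random


def covariance_2d(points):
    """Return (var_x, var_y, cov_xy) of a list of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / n
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / n
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return var_x, var_y, cov_xy


def decorrelating_angle(var_x, var_y, cov_xy):
    """Angle of the principal axis of a 2x2 covariance matrix."""
    return 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)


def rotate(points, theta):
    """Project each point onto the principal axes given by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]


# Synthetic frames for one state (assumption): strongly correlated dimensions.
rng = random.Random(1)
points = []
for _ in range(500):
    x = rng.gauss(0.0, 1.0)
    points.append((x, 0.8 * x + 0.3 * rng.gauss(0.0, 1.0)))

before = covariance_2d(points)
theta = decorrelating_angle(*before)
after = covariance_2d(rotate(points, theta))
print("covariance before rotation:", before[2])
print("covariance after rotation: ", after[2])
```

After the rotation the off-diagonal covariance vanishes, so a diagonal-covariance Gaussian fits the rotated features without discarding the correlation. Storing and applying one such matrix per state is what drives up the parameter count and decoding cost, which motivates sharing the transforms across states.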
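Keywords	broadcast news speech recognition; continuous audio segmentation; speaker adaptation; feature-space rotation transform; covariance modeling; Broadcasting Speech Recognition; Broadcasting Speech Segmentation; Adaptation; State-specific Rotation; Semi-tied Covariance Modeling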
Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/5778
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	贾磊. 汉语广播电视新闻语音识别[D]. 中国科学院自动化研究所. 中国科学院研究生院,2003.