Tone Research for Chinese Speech Recognition (面向语音识别的汉语声调研究)


	面向语音识别的汉语声调研究
Alternative title	Tone Research for Chinese Speech Recognition
	曹阳
	2001-08-01
Degree type	Doctor of Engineering
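Abstract (translated from Chinese)	Existing Chinese speech recognition systems do not make full use of tone information, and as Chinese speech recognition technology develops further, tone research has become an important direction for breakthroughs in recognition. In speech synthesis research, an important cause of the currently limited naturalness of synthesized speech is the incompleteness of prosodic rules, and the core problem of Chinese prosody is the regularity of tone. Therefore, to use tone information fully and effectively in speech recognition and synthesis, the characteristics of Chinese tones in continuous speech must be studied in depth. To this end, this thesis studies Chinese tones in continuous speech from four aspects: extraction of tone features, tone modeling, acquisition of tone variation rules in continuous speech, and tone recognition in continuous speech. The fundamental frequency (F0) contour is the most essential feature of Chinese tone, so F0 extraction is the foundation of tone research. Common F0 extraction algorithms each exploit only one aspect of the speech signal; some therefore have low detection accuracy and others are sensitive to noise, so a single algorithm can hardly solve the F0 extraction problem. This thesis proposes a new method that integrates multiple F0 extraction algorithms within a dynamic programming framework, making full use of several aspects of the speech signal and thereby overcoming the shortcomings of any single algorithm. Comparative F0 extraction experiments show that the new algorithm outperforms single F0 extraction algorithms in both accuracy and robustness. For tone modeling, we first study the traditional hidden Markov tone model and analyze its shortcomings. By observing and analyzing a large number of F0 contours of tones in continuous speech, and by reviewing existing linguistic research on tone variation patterns, we conclude that F0 contours can be represented by polynomial curves. Taking into account the randomness of the speech signal and the influence of duration on tone perception, we propose the stochastic polynomial curve (SP) tone model, which consists of an F0 contour model and a duration model. We develop training and recognition algorithms for the model, as well as a parameter optimization algorithm based on the minimum classification error criterion. Finally, to further strengthen the descriptive power of the model, we propose a mixture stochastic polynomial curve tone model and develop a parameter estimation algorithm based on the EM algorithm. The variation of tone patterns in continuous speech is an important aspect of tone research. To date, existing results come mainly from the qualitative observation and analysis of traditional phonetics and do not fully reflect the actual behavior of Chinese tone variation. To overcome these shortcomings, this thesis proposes a data-driven decision tree method, combined with expert knowledge, to learn from a large-scale corpus the distribution of tone patterns in continuous speech and the factors that influence these distributions. The study yields 28 tone patterns that reflect the distribution of Chinese tones in continuous speech. Analysis of the results also shows that, besides contextual tone, factors neglected in previous research, such as the manner and place of articulation of a syllable's initial and final and the position of the syllable, also play an important role in tone pattern variation. Finally, this thesis carries out a fairly large-scale tone recognition experiment…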
English abstract	Tone information has not been used effectively in current Chinese large-vocabulary continuous speech recognition systems. With the further development of Chinese speech recognition, tone research has become an important research direction. In current Chinese speech synthesis systems, imperfect prosody rules are one of the main factors behind the unsatisfactory naturalness of synthesized speech. To use tone information effectively and fully in Chinese speech recognition and synthesis, tone must be studied in depth. This thesis therefore presents a comprehensive study of four topics: 1) tone feature extraction; 2) tone modelling; 3) tone pattern variation rules in continuous speech; 4) tone recognition. Common pitch extraction algorithms use only one aspect of the speech signal, so they cannot solve pitch extraction well: some suffer from low precision, others from low robustness to noise. To overcome this shortcoming, this thesis proposes a new method that combines several types of pitch extraction algorithms within a dynamic programming framework. Compared with single pitch extraction algorithms, our method performs well in both precision and robustness. We first study the traditional HMM tone model and analyze its limitations in tone modelling. By studying pitch contours in continuous Chinese speech, we find that a pitch contour can be characterized by a polynomial curve. Considering the stochastic nature of the speech signal and the variation among speakers, we propose the stochastic polynomial tone model. The model consists of two parts: a pitch contour model and a duration model. In this model, the pitch contour is described as a stochastic curve trajectory: the mean trajectory is parameterized by a polynomial function of normalized time, while the variance varies with time. Efficient training and recognition algorithms are developed. A parameter optimization algorithm based on MCE training is also proposed. Finally, to enhance the descriptive power of the model, we propose the mixture polynomial tone model and develop parameter estimation algorithms based on the EM algorithm. The analysis of tone pattern variations in continuous Chinese speech is very important for tone research. Because most current analyses of tone pattern variation derive from qualitative observation, their results are not appropriate for tone recognition or pitch curve generation. To overcome this deficiency, we investigate tone pattern variation in continuous Chinese speech using a stochastic clustering method. Since a decision tree is a data-driven method that easily incorporates expert knowledge, we choose it as the tool for our clustering method. While constructing the decision tree, besides the neighboring tones, many other possible factors are considered, such as the syllable position in the word and the consonant/vowel type of the syllable, which are not utilized i…
Keywords	Speech recognition
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/5722
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	曹阳. 面向语音识别的汉语声调研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2001.
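
Illustrative note: the abstract describes a stochastic polynomial tone model in which the mean pitch trajectory is a polynomial function of normalized time and the variance varies with time. The following is only a minimal sketch of that idea, not the thesis's implementation. It assumes a per-frame Gaussian around the polynomial mean and pooled least-squares fitting, and it omits the duration model, MCE training, and the mixture extension. All function names, the frame count, and the variance floor are hypothetical choices made for illustration.

```python
import math

def resample(contour, n_frames):
    """Linearly resample an F0 contour onto n_frames points of normalized time [0, 1]."""
    m = len(contour)
    out = []
    for i in range(n_frames):
        pos = i * (m - 1) / (n_frames - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def poly_eval(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... at normalized time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))

def fit_tone_model(contours, order=2, n_frames=20, var_floor=1e-3):
    """Fit a polynomial mean trajectory and a time-varying (per-frame) variance."""
    ts = [i / (n_frames - 1) for i in range(n_frames)]
    data = [resample(c, n_frames) for c in contours]
    # Normal equations for least squares over all (t, f0) points pooled together.
    a = [[len(data) * sum(t ** (j + k) for t in ts) for k in range(order + 1)]
         for j in range(order + 1)]
    b = [sum(t ** j * row[i] for row in data for i, t in enumerate(ts))
         for j in range(order + 1)]
    coeffs = solve(a, b)
    mean = [poly_eval(coeffs, t) for t in ts]
    var = [max(sum((row[i] - mean[i]) ** 2 for row in data) / len(data), var_floor)
           for i in range(n_frames)]
    return {"coeffs": coeffs, "mean": mean, "var": var}

def log_likelihood(model, contour):
    """Gaussian log-likelihood of a contour under the assumed per-frame model."""
    x = resample(contour, len(model["mean"]))
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, model["mean"], model["var"]))

def classify(models, contour):
    """Return the tone label whose model gives the highest log-likelihood."""
    return max(models, key=lambda tone: log_likelihood(models[tone], contour))
```

In use, one model would be fitted per tone pattern with `fit_tone_model`, and an unseen contour of any length would be labelled with `classify`. Resampling to a fixed number of frames is what makes the time axis "normalized" in this sketch.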