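Title: 面向语音识别的汉语声调研究
Alternative title: Tone Research for Chinese Speech Recognition
Author: 曹阳 (Cao Yang)
Degree type: Doctor of Engineering
Advisor: 黄泰翼 (Huang Taiyi)
Date: 2001-08-01
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: speech recognition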
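Abstract: In existing Chinese speech recognition systems, tone information is not fully exploited. As Chinese speech recognition technology develops further, tone research has become an important direction for breakthroughs in recognition. In speech synthesis research, one major reason for the currently low naturalness of synthesized speech is the incompleteness of prosodic rules, and the core problem of Chinese prosody is the regularity of tone. To use tone information fully and effectively in speech recognition and synthesis, the characteristics of Chinese tones in continuous speech must therefore be studied in depth. To this end, this thesis studies Chinese tones in continuous speech from four aspects: tone feature extraction, tone modeling, acquisition of the rules governing tone variation in continuous speech, and tone recognition in continuous speech.
The fundamental frequency (F0) contour is the most essential feature of Chinese tone, so F0 extraction is the basis of tone research. Common F0 extraction algorithms each exploit only one aspect of the speech signal; some have low detection accuracy and some are sensitive to noise, so a single algorithm can hardly solve the F0 extraction problem. This thesis proposes a new method that integrates multiple F0 extraction algorithms within a dynamic programming framework, making full use of several aspects of the speech signal and thereby overcoming the shortcomings of any single algorithm. Comparative F0 extraction experiments show that the new algorithm outperforms single F0 extraction algorithms in both accuracy and robustness.
For tone modeling, we first study the traditional hidden Markov tone model and analyze its shortcomings. By observing and analyzing a large number of F0 contours of tones in continuous speech, and by reviewing existing linguistic research on tone variation patterns, we conclude that F0 contours can be represented by polynomial curves. Considering the randomness of the speech signal and the influence of duration on tone perception, we propose the stochastic polynomial curve (SP) tone model. The model consists of an F0 contour model and a duration model. We develop training and recognition algorithms for the model, as well as a parameter optimization algorithm based on the minimum classification error criterion. Finally, to further strengthen the descriptive power of the model, we propose the mixture stochastic polynomial curve tone model and develop a parameter estimation algorithm based on the EM algorithm.
The variation of tone patterns in continuous speech is an important aspect of tone research. To date, existing findings come mainly from the qualitative observations and analyses of traditional phonetics, and do not fully reflect how Chinese tones actually vary. To overcome these shortcomings, this thesis proposes a data-driven method based on decision trees, combined with expert knowledge, to learn from a large-scale corpus the distribution of tone patterns in continuous speech and the factors that affect these distributions. The study yields 28 tone patterns that reflect the distribution of Chinese tones in continuous speech. The analysis also shows that, besides the tones of the context, factors neglected in earlier research, such as the manner and place of articulation of the syllable's initial and final and the position of the syllable, also play an important role in tone pattern variation.
Finally, this thesis carries out relatively large-scale tone recognition experiments ...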
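The idea of pooling several pitch detectors inside a dynamic programming search can be illustrated with a minimal sketch. The abstract does not give the thesis's cost functions, so everything here is an assumption for illustration: the function name `track_pitch`, the per-candidate `local_cost`, the log-frequency transition penalty, and the `jump_weight` parameter are all hypothetical, and unvoiced frames are not handled.

```python
import math

def track_pitch(candidates, jump_weight=1.0):
    """Pick one F0 candidate per frame with Viterbi-style dynamic programming.

    candidates: list of frames; each frame is a non-empty list of
        (f0_hz, local_cost) pairs pooled from several pitch detectors.
    jump_weight: hypothetical penalty applied to pitch jumps between frames.
    """
    # best[t][j] = lowest total cost of any path ending at candidate j of frame t
    best = [[cost for _, cost in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for f0, cost in candidates[t]:
            # Transition cost is |log2(f0 / prev_f0)|, so an octave jump costs jump_weight.
            options = [
                best[t - 1][i] + jump_weight * abs(math.log2(f0 / prev_f0))
                for i, (prev_f0, _) in enumerate(candidates[t - 1])
            ]
            i_best = min(range(len(options)), key=options.__getitem__)
            row.append(options[i_best] + cost)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j][0])
        if back[t][j] is not None:
            j = back[t][j]
    path.reverse()
    return path

# Illustrative data only: in the middle frame the locally cheapest candidate
# (100 Hz) is an octave error; the smoothness penalty steers the path to 205 Hz.
frames = [
    [(200, 0.10), (100, 0.30)],
    [(100, 0.20), (205, 0.25)],
    [(210, 0.10), (105, 0.40)],
]
print(track_pitch(frames))
```

Choosing the cheapest candidate frame by frame would select 100 Hz in the middle frame; the transition penalty is what lets evidence from neighboring frames override a single detector's local error.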
Other abstract: Tone information has not been used effectively in current Chinese large-vocabulary continuous speech recognition systems. With the further development of Chinese speech recognition, tone research has become an important research direction. In current Chinese speech synthesis systems, imperfect prosody rules are one of the main causes of the unsatisfactory naturalness of synthesized speech. To use tone information efficiently and fully in Chinese speech recognition and synthesis, tone must be studied in depth. This thesis therefore presents a comprehensive study of four topics: 1) tone feature extraction; 2) tone modelling; 3) tone pattern variation rules in continuous speech; 4) tone recognition. Common pitch extraction algorithms each use only one aspect of the speech signal, so they cannot solve pitch extraction completely: some suffer from low precision, others from low robustness to noise. To overcome this shortcoming, this thesis proposes a new method that combines several types of pitch extraction algorithms within a dynamic programming framework. Compared with single pitch extraction algorithms, our method performs well in both precision and robustness. We first study the traditional HMM tone model and analyze its limitations in tone modelling. By studying pitch contours in continuous Chinese speech, we find that a pitch contour can be characterized by a polynomial curve. Considering the stochastic nature of the speech signal and the variation among speakers, we propose the stochastic polynomial tone model. The model consists of two parts: a pitch contour model and a duration model. In this model, the pitch contour is described as a stochastic curve trajectory. The mean trajectory is parameterized by a polynomial function of normalized time, while the variance varies with time. Efficient training and recognition algorithms are developed.
A parameter optimization algorithm based on MCE training is also proposed. Finally, to enhance the model's descriptive power, we propose the mixture polynomial tone model and develop parameter estimation algorithms based on the EM algorithm. The analysis of tone pattern variation in continuous Chinese speech is very important for tone research. Because most current analyses of tone pattern variation derive from qualitative observation, their results are not suitable for tone recognition or pitch curve generation. To overcome this deficiency, we investigate tone pattern variation in continuous Chinese speech with a stochastic clustering method. Since the decision tree is a data-driven method that readily incorporates expert knowledge, we choose it as the tool for our clustering method. While constructing the decision tree, besides neighboring tones, many other possible factors are considered, such as syllable position in the word and the consonant/vowel type of the syllable, which are not utilized in ...
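The pitch contour part of the stochastic polynomial tone model, a polynomial mean trajectory over normalized time with a time-varying variance, can be sketched as a scoring function. This is an illustrative reading of the abstract, not the thesis's formulation: it assumes frames are independent Gaussians around the mean trajectory, omits the duration model, and uses hypothetical names (`contour_log_likelihood`, `classify_tone`) and made-up model parameters.

```python
import math

def contour_log_likelihood(f0, coeffs, variance_at):
    """Log-likelihood of a pitch contour under one polynomial trajectory model.

    f0: observed pitch values (for example normalized log-F0), at least two frames.
    coeffs: coefficients [c0, c1, c2, ...] of the mean trajectory as a
        polynomial in normalized time t in [0, 1].
    variance_at: function t -> variance, so the spread can change over time.
    Assumes (for illustration) independent Gaussian frames around the mean.
    """
    n = len(f0)
    total = 0.0
    for k, y in enumerate(f0):
        t = k / (n - 1)  # normalized time, independent of syllable duration
        mean = sum(c * t ** p for p, c in enumerate(coeffs))
        var = variance_at(t)
        total += -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)
    return total

def classify_tone(f0, models):
    """Return the label of the model that gives the contour the highest likelihood."""
    return max(models, key=lambda tone: contour_log_likelihood(f0, *models[tone]))

# Hypothetical two-class example: a rising and a falling linear trajectory.
models = {
    "rising": ([0.0, 1.0], lambda t: 0.05),
    "falling": ([1.0, -1.0], lambda t: 0.05 + 0.05 * t),
}
contour = [0.1, 0.3, 0.5, 0.8, 0.9]
print(classify_tone(contour, models))
```

Indexing the trajectory by normalized time lets contours of different lengths be scored by the same model, which is one reason a separate duration model is needed to capture length information.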
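Call number: XWLW658
Other identifier: 658
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/5722
Collection: Graduates_Doctoral Dissertations
Recommended citation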
GB/T 7714
曹阳. 面向语音识别的汉语声调研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2001.