Research on Prosody Prediction Techniques for Mandarin Speech Synthesis (汉语语音合成韵律预测技术研究)

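Title	汉语语音合成韵律预测技术研究 (Research on Prosody Prediction Techniques for Mandarin Speech Synthesis)
Alternative title	Research on Mandarin Text-to-Speech Synthesis Prosody Prediction Method
Author	车浩 (Che Hao)
Date	2015-05-26
Degree type	Doctor of Engineering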
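Chinese abstract (translated)	Mandarin prosody prediction is an indispensable stage of a speech synthesis system. It is the prerequisite for generating prosodic parameters such as silence, fundamental frequency, and duration, and its accuracy largely determines the naturalness, and even the intelligibility, of the synthesized speech. This dissertation focuses on improving the prediction accuracy of prosodic rhythm and stress. For prosodic rhythm prediction, it first collects statistics on and analyzes the relationship between syntactic features and the distribution of rhythm units at each level. It then runs a series of experiments that integrate syntactic features into the rhythm prediction model, showing that incorporating deeper syntactic features improves model performance. Building on these experiments, the feature set is optimized, and further experiments show that the new feature set raises prediction accuracy again. For stress prediction, the dissertation concentrates on improving accuracy in a discourse setting. Stress in discourse is closely tied to word informativeness, which is hard to quantify statistically, so a statistics-based method for computing word informativeness is proposed. Experiments show that adding global features, including word informativeness, improves the stress prediction model. Specifically, the main work of the dissertation covers the following:
1) It explores how to use deeper syntactic features to improve Mandarin prosodic rhythm prediction. Statistics and analysis of a large rhythm-annotated corpus show that shallow information from syntactic phrase structure corresponds fairly clearly to low-level rhythm units, whereas deep information from dependency relations is more closely tied to high-level rhythm units. Subsequent experiments show that syntactic features help the rhythm prediction model, but a model that relies only on syntactic features performs worse than one that relies only on traditional basic text features. Basic text features and syntactic features therefore have to be combined appropriately to improve performance. Across the rhythm units, the intonational phrase model improves markedly once syntactic features are added, while the prosodic phrase and prosodic word models improve only slightly. In addition, the gains for intonational phrase prediction depend mainly on dependency structure features; adding syntactic phrase structure features actually lowers accuracy.
2) It verifies that predicting rhythm units at a given level depends mainly on syntactic features at the corresponding level. The dissertation proposes dividing syntactic features by level into global and local syntactic features. Experiments show that the intonational phrase model with global syntactic features, and the prosodic word and prosodic phrase models with local syntactic features, all improve further.
3) It first runs stress prediction experiments on a discourse corpus using a sentence-level feature set. The results show that a model built on sentence-level features predicts high-level stress poorly. Then, because high-level stress in discourse is closely tied to word informativeness but hard to quantify statistically, a statistics-based method for computing word informativeness is proposed. Experiments show that adding global features, including word informativeness, improves the overall performance of the stress prediction model on the discourse corpus.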
English abstract	Prosodic prediction plays an important role in a text-to-speech system. It is a prerequisite for generating prosodic parameters such as silence, fundamental frequency, and duration, and its accuracy largely determines both the naturalness and the intelligibility of the synthesized voice. For prosodic phrase prediction, this dissertation first analyzes the relationship between syntactic features and each level of prosodic phrase unit, and then evaluates the effect of syntactic features experimentally. The experimental results show that the performance of the prosodic phrase prediction model improves when syntactic features are added to the feature set. We also try to improve the prosodic phrase prediction model with new feature sets based on what was learned from the earlier experiments. For stress prediction, this dissertation focuses on predicting stress in discourse rather than only in isolated sentences. A statistical method for calculating word informativeness is proposed. The experimental results show that the performance of the stress prediction model is improved by discourse features that include word informativeness. In detail, the main work of this dissertation includes the following:
1) Investigate how to use syntactic features to improve the performance of the prosodic phrase prediction model. We first analyze the corpus and find that low-level prosodic phrases are closely related to low-level syntactic phrase structure. The same holds between high-level prosodic phrases and high-level dependency structure. The experimental results show that the prediction models for prosodic words and prosodic phrases achieve their best performance with syntactic phrase and dependency features, while the models with dependency features outperform the other models when predicting intonational phrases.
2) Evaluate the relationship between prosodic phrase units at different levels and syntactic features at the corresponding level. We classify the features into global and local feature sets. The experimental results show that prosodic word and prosodic phrase prediction with local features added to the baseline features outperforms the other feature combinations. Meanwhile, the best result for intonational phrase prediction is achieved when global syntactic features and baseline features are selected. The results show that predicting higher-level prosodic phrase boundaries depends on higher-level syntactic features.
3) First we evaluate the effect of sentence level feat...
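Illustrative sketch (not from the thesis): the abstract does not say how word informativeness is computed, so the snippet below substitutes a smoothed inverse document frequency over discourses as an assumed stand-in. The function names `word_informativeness` and `discourse_features`, the `first_mention` flag, and the toy corpus are all hypothetical, chosen only to show how a discourse-level (global) feature could be attached to each word before stress prediction.

```python
import math
from collections import Counter


def word_informativeness(documents):
    """Map each word to a smoothed IDF score computed over discourses.

    documents: list of discourses, each a list of already-segmented words.
    ASSUMPTION: IDF is a stand-in here, not the measure used in the thesis.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per discourse
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


def discourse_features(doc, scores):
    """Attach hypothetical global features to every word of one discourse."""
    seen = set()
    features = []
    for word in doc:
        features.append({
            "word": word,
            # Words unseen in the scoring corpus fall back to 0.0.
            "informativeness": scores.get(word, 0.0),
            # True only the first time the word occurs in this discourse.
            "first_mention": word not in seen,
        })
        seen.add(word)
    return features


# Toy corpus of three pre-segmented "discourses" (made up for illustration).
corpus = [
    ["今天", "天气", "很", "好"],
    ["今天", "股市", "大涨"],
    ["天气", "预报", "说", "今天", "有", "雨"],
]
scores = word_informativeness(corpus)
feats = discourse_features(corpus[2], scores)
```

In this sketch a word that occurs in every discourse gets the lowest score and a word confined to one discourse gets the highest. The resulting per-word dictionaries could be merged with sentence-level features before training a stress classifier; that combination step is likewise an assumption about how such features might be used.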
Keywords	Speech Prosody; Prosodic Phrase Prediction; Stress Prediction; Syntactic Features (韵律节奏; 节奏预测; 重音预测; 语法特征)
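Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6695
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	车浩. 汉语语音合成韵律预测技术研究[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2015.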

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462802 (1910KB)			Not currently open	CC BY-NC-SA