Research on Chinese Prosodic Structure Prediction Approaches (汉语韵律节奏预测方法的研究)


Alternative Title	Research on Chinese Prosodic Structure Prediction Approaches
Author	刘方舟 (Liu Fangzhou)
Date	2009-05-29
Degree Type	Doctor of Engineering
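Chinese Abstract (translated)	Prosodic structure prediction is an indispensable step in a speech synthesis system. It is the prerequisite for generating prosodic parameters such as silence, fundamental frequency and duration, and its accuracy largely determines the naturalness, and even the intelligibility, of the synthesized speech. This dissertation defines Chinese rhythm with a three-tier structure: prosodic words, prosodic phrases and intonation phrases. Based on statistics and analysis of the syntactic properties and length distributions of the rhythm units at each level, it compares several statistical machine learning models for Chinese prosodic structure prediction and selects a prediction framework based on the maximum entropy model. Complementing the algorithm is the use of information: how to effectively exploit and optimize reliably obtainable information to improve prediction accuracy is the central focus of this work. Specifically, the main contributions are as follows:
(1) A large-scale rhythm-annotated corpus was built. Statistical analysis shows that shallow syntactic information corresponds fairly clearly to low-level rhythm units, whereas deep syntactic information, whether the depth of the syntactic structure or the phrase type, cannot provide definitive information for delimiting high-level rhythm units.
(2) Each of the three rhythm levels was modeled statistically. Because prosodic words arise from merging and splitting lexical words, prosodic word prediction is divided into a merging model and a splitting model. The prosodic phrase and intonation phrase prediction models take into account both syntactic constraints and the length distribution of the phrases themselves. Several feature selection methods were compared within the maximum entropy framework; the experiments show that, as long as the features are statistically stable, the different feature selection methods perform similarly. Deep syntactic information was tried for intonation phrase prediction, but the gain was not significant. The dissertation also proposes several length constraint models and analyzes in detail the contribution of length information to intonation phrase prediction, reaching some interesting conclusions: when speaking, people tend to alternate long and short stretches between pauses; rhythm planning is short-range, local planning; and modeling phrase length independently effectively suppresses error propagation, so it outperforms using length as just one feature of the statistical classifier.
(3) A strategy for automatically adjusting the part-of-speech (POS) tag set was proposed. First, an iterative POS reduction algorithm was designed based on hierarchical clustering, and a vector space model and a conditional probability model were proposed for measuring POS similarity; these optimize the POS clustering algorithm and greatly shorten its convergence time. Then, the keywords most discriminative for phrase boundaries were selected using the log-likelihood ratio, and a greedy algorithm for POS augmentation was designed. Experiments show that automatically adjusting the POS tag set significantly improves the performance of the prosodic structure prediction models.
(4) Two objective methods for automatically generating feature templates were proposed: rule conversion based on decision trees, and a greedy algorithm based on hierarchical clustering. The former converts the rules corresponding to decision tree leaf nodes into templates for the TBL (transformation-based learning) algorithm; experiments show that decision tree templates can both substitute well for hand-written templates and usefully complement them. The latter, at each iteration, merges into a template the feature pair that most improves the prediction results. This feature merging algorithm reduces manual effort while significantly improving intonation phrase prediction accuracy and greatly reducing the number of templates.
(5) An automatic prosodic structure annotation method combining textual and acoustic information was proposed. It uses a hierarchical architecture based on the maximum entropy model, modeling each type of rhythm boundary separately with its own feature set. Comparative experiments show that the hierarchical model outperforms the single-layer model; acoustic features perform well in identifying boundaries of pause-and-lengthening segments (停延段) but do not help in identifying prosodic word boundaries; F0 jumps contribute substantially to detecting prosodic phrase boundaries; and energy also provides important information for identifying intonation phrase boundaries. The feature optimization work in this dissertation, including feature selection, POS tag set adjustment and template generation, not only improves Chinese prosodic structure prediction but can also be extended to other areas of natural language processing, with a certain ...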
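Point (3) selects boundary-discriminative keywords by log-likelihood ratio, but the abstract does not state the exact statistic. A minimal Python sketch, assuming a Dunning-style G² statistic over a 2×2 contingency table (this word vs. all other words, phrase boundary after the word vs. no boundary); the function names `g2` and `rank_keywords` and the toy counts are hypothetical, not taken from the dissertation.

```python
import math
from collections import Counter


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio G^2 = 2 * sum(O * ln(O / E)) for a 2x2 table.

    Assumption: Dunning-style G^2; the dissertation's exact statistic is not given.
    k11: word w followed by a boundary   k12: word w, no boundary
    k21: other words, boundary           k22: other words, no boundary
    """
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    total = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # 0 * ln(0) is taken as 0
                expected = rows[i] * cols[j] / n
                total += observed * math.log(observed / expected)
    return 2.0 * total


def rank_keywords(samples, top_k=10, min_count=1):
    """samples: iterable of (word, boundary_follows) pairs (hypothetical format).

    Returns the top_k (word, G^2) pairs, highest association first.
    """
    word_total = Counter()
    word_boundary = Counter()
    n = n_boundary = 0
    for word, boundary in samples:
        n += 1
        word_total[word] += 1
        if boundary:
            word_boundary[word] += 1
            n_boundary += 1
    scores = {}
    for word, count in word_total.items():
        if count < min_count:  # skip statistically unstable rare words
            continue
        k11 = word_boundary[word]
        k12 = count - k11
        k21 = n_boundary - k11
        k22 = (n - n_boundary) - k12
        scores[word] = g2(k11, k12, k21, k22)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy juncture data: does a phrase boundary follow each word token?
    samples = ([("了", True)] * 5 + [("了", False)]
               + [("的", False)] * 6
               + [("我们", True)] + [("我们", False)] * 3)
    for word, score in rank_keywords(samples, top_k=3):
        print(word, round(score, 3))  # order: 了, 的, 我们
```

G² rewards strong association in either direction, so a word that almost never precedes a boundary (的 above) ranks high too, which suits a "discriminative" criterion; whether the dissertation used this two-sided form, and how the greedy step then adds top-ranked words to the POS set, is not reproduced here.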
English Abstract	Prosodic structure prediction plays an important role in text-to-speech systems. It is a prerequisite for generating prosodic parameters such as silence, fundamental frequency and duration, and its accuracy largely determines both the naturalness and the intelligibility of the synthesized voice. This dissertation defines Chinese rhythm as a three-tier hierarchy consisting of prosodic words, prosodic phrases and intonation phrases. On the basis of detailed statistics and analysis of Chinese rhythm, it compares a variety of statistical machine learning models for predicting Chinese prosodic structure and selects a maximum entropy based framework. How to effectively use and optimize reliably obtainable information in order to improve prosodic structure prediction is the focus of this dissertation. In detail, the main work includes the following:
(1) A large-scale rhythm-tagged corpus is constructed. Statistical analysis shows a tight correlation between shallow syntactic information and low-level rhythm units, but deep syntactic information, whether the level of grammatical structure or the phrase type, cannot provide precise information for high-level rhythm units.
(2) The three levels of rhythm units are modeled statistically. Since prosodic words are generated by merging and splitting lexical words, the prosodic word prediction model is divided into a merging model and a splitting model. The prosodic phrase and intonation phrase prediction models consider not only grammatical constraints but also the phrase length distribution. Based on the maximum entropy model, a variety of feature selection methods are compared. Experimental results show that, as long as the features are statistically stable, different feature selection methods perform similarly. For intonation phrase prediction, this dissertation tries to use deep syntactic information, but the effect is not obvious. It also proposes a variety of length constraint models, analyzes the contribution of length information to intonation phrase prediction in detail, and draws some interesting conclusions: people tend to alternate long and short intervals between pauses when they speak; rhythm planning is short-term, local planning; and independent modeling of phrase length can effectively suppress error propagation, so its performance is better than directly addi...
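The abstract names a maximum entropy framework but not its features or training procedure. As a sketch only, a conditional maximum entropy classifier over binary word-juncture features fits in a few dozen lines of Python; the feature strings (`L_pos=...`, `R_pos=...`), the label names and the hyperparameters are hypothetical, and plain batch gradient ascent with an L2 (Gaussian prior) penalty stands in for whatever trainer (for example GIS, IIS or L-BFGS) the dissertation actually used.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Conditional maximum entropy classifier (hypothetical sketch):
    p(y | x) = exp(sum of w[f, y] over active binary features f) / Z(x).
    Trained by batch gradient ascent on the average log-likelihood with an
    L2 penalty (an assumption; GIS/IIS or L-BFGS are common alternatives)."""

    def __init__(self, labels, l2=0.01, lr=0.5, epochs=200):
        self.labels = list(labels)   # e.g. boundary types at a word juncture
        self.l2 = l2                 # strength of the Gaussian prior on weights
        self.lr = lr                 # gradient ascent step size
        self.epochs = epochs
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def prob(self, feats):
        """Return p(y | feats) for every label."""
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())   # subtract the max score for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, data):
        """data: list of (active_feature_list, gold_label) pairs."""
        n = len(data)
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.prob(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0       # empirical feature count
                    for y in self.labels:
                        grad[(f, y)] -= p[y]     # expected count under the model
            for key in set(grad) | set(self.w):
                self.w[key] += self.lr * (grad[key] / n - self.l2 * self.w[key])
        return self

    def predict(self, feats):
        p = self.prob(feats)
        return max(p, key=p.get)


if __name__ == "__main__":
    # Toy junctures described by the POS tags of the left/right words (hypothetical).
    data = ([(["L_pos=n", "R_pos=v"], "PPh")] * 2
            + [(["L_pos=a", "R_pos=u"], "none")] * 2
            + [(["L_pos=v", "R_pos=n"], "PW")] * 2)
    model = MaxEnt(["none", "PW", "PPh", "IPh"]).fit(data)
    print(model.predict(["L_pos=n", "R_pos=v"]))  # PPh
```

The gradient is the gap between empirical and model-expected feature counts, which is exactly the maximum entropy constraint that each feature's expectation should match the training data (softened here by the L2 term); feature selection and the length constraint models discussed in the abstract would sit on top of such a classifier.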
Keywords	Prosodic Structure; Prosodic Word; Prosodic Phrase; Intonation Phrase
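Language	Chinese
Document Type	Dissertation
Item Identifier	http://ir.ia.ac.cn/handle/173211/6178
Collection	Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714)	Liu Fangzhou. Research on Chinese Prosodic Structure Prediction Approaches[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2009.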

Files in This Item
File Name/Size	Document Type	Version	Access	License
CASIA_20061801462806 (759KB)			Not yet open	CC BY-NC-SA