Research on High Naturalness Statistical Parametric Speech Synthesis


	高自然度的统计参数语音合成方法研究
Alternative title	Research on High Naturalness Statistical Parametric Speech Synthesis
	Pan Shifeng (潘诗锋)
	2011-05-29
Degree type	Doctor of Engineering
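Abstract (translated from Chinese)	Statistical parametric speech synthesis produces smooth, coherent, and robust output speech, allows systems to be built quickly and automatically, and permits flexible control of speech parameters and speaking style. These advantages have attracted great interest and attention in the speech synthesis field in recent years, with speech synthesis based on the hidden Markov model (HMM) as the most representative approach. At present, the main shortcoming of HMM-based speech synthesis is that the synthesized speech is not natural enough, chiefly because its voice quality is not high enough and its prosody is too flat. This dissertation targets statistical parametric speech synthesis methods with high naturalness and takes HMM-based speech synthesis as the object of implementation. The specific work and results are as follows. Building on a comprehensive review of the core methods and techniques of HMM-based speech synthesis, the causes of reduced naturalness in synthesized speech are analyzed and summarized in depth from three aspects: the accuracy of the HMM models, speech parameter generation, and vocoder synthesis. This analysis provides the starting point for the research in this dissertation. The following HMM modeling and training settings are studied in depth: the number of states in the HMM topology, the modeling unit, the amount of training data, and the clustering factor based on minimum description length (MDL). A set of conclusions with practical guiding value is obtained. The evaluation approach used in this study, which combines HMM likelihood, generation error, and subjective evaluation, is well suited to assessing model accuracy. The parameter generation method that incorporates global variance is extended in two ways. First, a global variance model based on differences between adjacent-order LSP coefficients is proposed, together with a speech parameter generation algorithm that incorporates this global variance. This method better suppresses the over-smoothing of generated LSP parameters and improves the quality of the synthesized speech. Second, the global variance approach is further extended to state duration generation, yielding a state duration generation method that incorporates global variance. This method better suppresses the over-averaging of generated state durations and improves the expressiveness and naturalness of the synthesized speech in terms of duration distribution. An HMM-based unit selection method is proposed. It uses a boundary F0 prediction model based on a classification and regression tree (CART) to model the dependency between the boundary F0 values on either side of adjacent unit boundaries, and it uses this model to guide the computation of the F0 concatenation cost during unit selection. This makes the concatenation cost measure of HMM-based unit selection synthesis more accurate, which improves both the matching of F0 at unit boundaries and the overall naturalness.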
English abstract	Statistical parametric speech synthesis has recently grown in popularity and attracted increasing interest because of advantages such as the stability and smoothness of the synthetic voice, rapid and automatic system building, and flexible control of voice characteristics and speaking styles. A representative instance of these techniques is hidden Markov model (HMM)-based speech synthesis. Currently, the main drawback of HMM-based speech synthesis is that the synthetic voice is not natural enough: its speech quality is unsatisfactory and its prosody is flat. This dissertation investigates high-naturalness statistical parametric speech synthesis, with HMM-based speech synthesis as the working instance. The detailed research work and results are as follows. The HMM-based speech synthesis method is fully reviewed. Several key reasons for the naturalness degradation of synthetic speech are analyzed and discussed in depth from three aspects, namely the accuracy of HMM modeling, speech parameter generation, and synthesis with a vocoder, which provides the starting points for the later research. Several basic factors that influence the naturalness of synthetic speech are studied in depth, including the number of states in the HMM topology, the basic unit for HMM modeling, the size of the training data, and the MDL (Minimum Description Length) factor. A set of useful conclusions is then drawn. The evaluation method applied, which combines HMM likelihood, generation error, and subjective evaluation, proved useful for evaluating HMM accuracy. The speech parameter generation method considering global variance (GV) is extended in two ways. First, a global variance model on frequency-domain delta LSP is proposed for HMM-based speech synthesis, and a speech parameter generation algorithm considering this new global variance model is given in detail. With this method, the over-smoothing problem of generated spectral parameters is better alleviated and the naturalness of synthetic speech is improved. Second, a state duration generation method considering global variance is proposed to alleviate the over-averaging problem of generated state durations. The synthetic speech is more natural and expressive with this method. An HMM-based unit selection (HUS) method is proposed. In this method a CART (Classification and Regression Tree) based boundary F0 dependency model is built to model the relationship between the boundary F0s of adjacent units...
Keywords	Speech Synthesis; High Naturalness; Statistical Parametric Speech Synthesis; Hidden Markov Model; Global Variance; Unit Selection
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6369
Collection	Graduates_Doctoral Dissertations
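Recommended citation (GB/T 7714)	Pan Shifeng (潘诗锋). 高自然度的统计参数语音合成方法研究 [Research on High Naturalness Statistical Parametric Speech Synthesis][D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2011.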

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20081801462805 (1498KB)			Not currently open	CC BY-NC-SA