Research on a Chinese Speech Synthesis Method Based on Duration-Dependent State-Transition HMMs

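	Research on a Chinese Speech Synthesis Method Based on Duration-Dependent State-Transition HMMs
Alternative title	Research of Duration-Dependent State Transition HMM Based Chinese Speech Synthesis System
	陶静 (Tao Jing)
	2009-05-30
Degree type	Master of Engineering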
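Chinese abstract	As speech synthesis technology has developed, users have come to expect more from synthesized speech, particularly greater diversity in the voices and styles that can be synthesized. Although current waveform-concatenation systems based on large corpora produce good results, the long time needed to build a speech database and the poor extensibility of such systems limit their use for diversified speech synthesis. Speech synthesis systems based on the hidden Markov model (HMM), proposed in recent years, can be built automatically in a short time with essentially no manual intervention, and voice characteristics, speaking style, and emotion can be changed flexibly by suitably adjusting the HMM parameters. They are therefore of considerable theoretical and practical value. Accordingly, this thesis presents an in-depth, systematic study of the technical framework of NIT's HTS (HMM-based Speech Synthesis System) and of improvements to its key techniques. The main work of the thesis is as follows: 1. Building on existing model training methods and parameter generation techniques, we set up a complete HMM-based speech synthesis framework, comprising an automated training pipeline and a corresponding synthesis back end. Given a certain amount of speech data, it trains automatically according to the user's needs and quickly produces a corresponding synthesis system. On top of this framework, we built a Chinese HTS system: the user can enter arbitrary Chinese text and the system outputs synthesized speech in real time. 2. Traditional HMM-based speech synthesis systems suffer from an inconsistency between the model used in training and the model used in synthesis. The researchers at NIT therefore introduced the hidden semi-Markov model (HSMM), an HMM with explicit state duration probability distributions, into both the training and synthesis stages, and proposed HSMM-based speech synthesis. This thesis builds a Chinese HSMM-based speech synthesis system and verifies the effectiveness of the method. 3. In HSMM-based speech synthesis, each state of the HSMM has an explicit duration probability distribution, yet the state transition probabilities are independent of duration, which is an inconsistency. Moreover, the large amount of statistical processing in model training discards too much detail, especially information about variation in the time domain. We improve the HSMM by introducing duration-dependent state transition probabilities, propose an improved forward-backward algorithm, re-derive the parameter re-estimation formulas, and build a speech synthesis system based on the DDHSMM (Duration-Dependent HSMM). As a result, the synthesized speech has somewhat better quality and a stronger sense of rhythm.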
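As a rough illustration of the idea in item 3 of the abstract, the sketch below computes the likelihood of an observation sequence with a forward pass over an explicit-duration HSMM in which the transition matrix is indexed by how long the departing state lasted. It is not the thesis's improved forward-backward algorithm, whose details are not given in this record. The function name `ddhsmm_forward`, the parameter layout, the discrete duration distributions, and the requirement that the last segment ends exactly at the final frame are all assumptions made for illustration.

```python
def ddhsmm_forward(pi, dur, trans, obs_lik):
    """Forward-pass likelihood for a toy duration-dependent HSMM (illustrative only).

    pi[j]            : probability that the first segment is in state j
    dur[j][d-1]      : probability that a segment in state j lasts d frames
    trans[d-1][i][j] : probability of moving from i to j, given that i lasted d frames
    obs_lik[t][j]    : likelihood of frame t under state j
    """
    T = len(obs_lik)   # number of frames
    N = len(pi)        # number of states
    D = len(dur[0])    # maximum segment duration

    # alpha[t][j][d-1]: joint probability of the first t frames and of a
    # segment in state j with duration d that ends at the t-th frame.
    alpha = [[[0.0] * D for _ in range(N)] for _ in range(T + 1)]

    for t in range(1, T + 1):
        for j in range(N):
            seg_lik = 1.0
            for d in range(1, min(D, t) + 1):
                # Extend the candidate segment one frame further back.
                seg_lik *= obs_lik[t - d][j]
                start = t - d
                if start == 0:
                    # The segment opens the sequence.
                    enter = pi[j]
                else:
                    # Duration-dependent step: the transition matrix used
                    # depends on the duration k+1 of the previous segment.
                    enter = sum(
                        alpha[start][i][k] * trans[k][i][j]
                        for i in range(N)
                        for k in range(D)
                    )
                alpha[t][j][d - 1] = enter * dur[j][d - 1] * seg_lik

    # Sum over every state and duration of the final segment.
    return sum(alpha[T][j][d] for j in range(N) for d in range(D))
```

If every `trans[d-1]` is the same matrix, the recursion reduces to the forward pass of an ordinary explicit-duration HSMM, which is one way to see what the duration dependence adds.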
English abstract	With the development of speech synthesis technology, people expect more from text-to-speech (TTS) systems, especially greater diversity in synthetic speech. Although current concatenative speech synthesis based on large speech corpora performs well, the long cycle needed to build a speech corpus and the poor extensibility of such systems limit their use for diversified synthesis. In recent years, the HMM-based speech synthesis system (HTS) has been proposed, which can be constructed automatically in a short time without human intervention, and whose voice characteristics, speaking style, and emotions can be controlled flexibly by transforming the HMM parameters appropriately. It therefore has high research significance and application value. Accordingly, this thesis studies HMM-based speech synthesis in depth and systematically, including the construction of the framework and improvements to the key technologies. The main research work can be summarized as follows: 1. Based on the available HMM training methods and parameter generation algorithms, the whole technical framework of an HMM-based speech synthesis system is constructed, which includes an automatic training procedure and a synthesis back end. According to the user's requirements, a corresponding synthesis system can be constructed quickly under this framework by training on the input speech data. Moreover, based on this framework, we construct a Chinese HMM-based speech synthesis system: the user inputs arbitrary text and the system outputs the synthesized speech in real time. 2. In the traditional HTS, there is an inconsistency: although the speech is synthesized from HMMs with explicit state duration probability distributions, the HMMs are trained without them. The NIT researchers therefore introduced a hidden semi-Markov model (HSMM) and constructed an HSMM-based speech synthesis system. To verify the effect of this method, we re-derive the parameter re-estimation formulae and construct a Chinese HSMM-based speech synthesis system. 3. In the HSMM-based speech synthesis system, there is still an inconsistency: although the HSMM has explicit state duration probability distributions, the state transition probabilities are duration-invariant. Moreover, in the model training stage, the heavy statistical processing loses too much detailed information, especially the timescale distortion at particular instants of an utterance. To resolve the problem, we introduce duration-dependent state transit...
Keywords	Speech Synthesis; Hidden Markov Model (HMM); Duration-Dependent State Transition Probability; DDHSMM; Improved Forward-Backward Algorithm
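Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/7479
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	Tao Jing. Research on a Chinese Speech Synthesis Method Based on Duration-Dependent State-Transition HMMs [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2009.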

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20062801462804 (748KB)			Not yet open	CC BY-NC-SA