Research on Corpus-Based Concatenative Speech Synthesis for Chinese


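	Research on Corpus-Based Concatenative Speech Synthesis for Chinese (基于语料库汉语语音拼接合成方法研究)
	祝韶晖 (Zhu Shaohui)
	2002-05-01
Degree type	Master of Engineering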
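Abstract (translated from Chinese)	Speech synthesis plays a very important role in many aspects of human-computer interaction. For the technology to serve society better, improving the quality of synthesized speech has become an urgent task. Many speech synthesis techniques exist today; broadly, they fall into two classes: articulatory synthesis and concatenative synthesis. Of the two, concatenative synthesis is currently the more popular. Corpus-based synthesis, one form of concatenative synthesis, is now regarded as a highly promising approach, largely thanks to the growth in computer memory and storage capacity. It closely resembles traditional concatenative synthesis, but it uses a large speech database that contains multiple samples of each synthesis unit, and instead of modifying the prosody of a unit it meets the prosodic requirements by selecting a suitable sample. This thesis focuses on the corpus-based concatenation method. The work carried out is as follows: 1. A basic corpus-based synthesis algorithm was implemented. To make the synthesized speech more natural, the author ran a series of experiments with different synthesis units and different segmentation tools. These experiments showed that larger synthesis units yield more natural speech and also reduce the number of concatenation points. 2. The same experiments showed that the basic algorithm is not fully satisfactory at selecting synthesis units, so the thesis proposes a match-based unit selection algorithm. Experiments show that this method gives very good results for restricted-domain synthesis. 3. Because a speech database cannot contain samples of every unit in every context, the method gives only mediocre results in unrestricted domains. The thesis therefore looks for a good algorithm for adjusting speech duration and pitch. After comparing the LPC and PSOLA methods, the thesis chose the Sinusoidal Model, which it finds more efficient and more convenient. The algorithm was implemented and used in a series of speech processing experiments, including speech coding, duration modification, and pitch modification. 4. To reduce distortion and abrupt changes at concatenation points, the thesis experimented with using the Sinusoidal Model to smooth the speech at those points. The results show that the method reduces the number of click-like artifacts in the speech. 5. Finally, the thesis proposes a tentative solution for building the corpus.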
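The observation in item 1, that larger units mean fewer concatenation points, can be illustrated with a toy selection routine. This is a minimal sketch and not the thesis's match-based algorithm, whose details the abstract does not give: it assumes a greedy longest-match strategy, uses Chinese characters as stand-ins for syllable units, ignores prosody entirely, and the function name `select_units` and the three corpus sentences are hypothetical.

```python
def select_units(target, corpus):
    """Greedily cover `target` with the longest substrings found in `corpus`.

    Returns a list of (unit_text, sentence_index, start_offset) tuples.
    Raises ValueError if some character never occurs in the corpus.
    """
    units = []
    pos = 0
    while pos < len(target):
        best = None
        # Try the longest candidate first, shrinking until something matches.
        for end in range(len(target), pos, -1):
            candidate = target[pos:end]
            for idx, sentence in enumerate(corpus):
                offset = sentence.find(candidate)
                if offset != -1:
                    best = (candidate, idx, offset)
                    break
            if best is not None:
                break
        if best is None:
            raise ValueError(f"no corpus sample covers {target[pos]!r}")
        units.append(best)
        pos += len(best[0])
    return units


# Hypothetical three-sentence corpus; a real corpus would hold labeled audio.
corpus = ["今天天气很好", "明天北京有雨", "北京今天晴"]
units = select_units("明天北京天气很好", corpus)
joins = len(units) - 1  # number of concatenation points
```

Here the eight-character target is covered by two four-character units, so only one concatenation point remains, whereas character-by-character selection would need seven.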
English abstract	Speech synthesis technology plays an important role in many aspects of man-machine interaction. To benefit society, synthesized speech should sound as human-like as possible. Synthesized speech can be produced by several methods, which can be divided into two groups: articulatory synthesis and concatenative synthesis. Concatenative synthesis has become the more popular approach in recent years. Within concatenative synthesis, corpus-based speech synthesis has recently emerged as a very promising way to achieve highly natural synthesized speech, mainly because the memory and storage capacities of general-purpose computers have increased. Corpus-based speech synthesis is similar to conventional concatenative synthesis, except that the inventory consists of a large corpus of labeled speech and that, instead of modifying the stored speech to match the target prosody, the corpus is searched for phoneme sequences whose prosodic patterns match the target prosody. Most of this Master's thesis concerns this algorithm. The work is summarized as follows: 1. A corpus-based speech synthesizer was implemented. To make the synthesized speech more natural, experiments were carried out with different concatenation units and different segmentation tools. The results demonstrated that longer units achieve higher naturalness with fewer concatenation points. 2. The same results showed that corpus-based synthesis has difficulty finding a suitable concatenation unit with the conventional method, so a novel selection algorithm based on unit matching was proposed. Experiments showed that this solution achieves excellent synthesized speech in restricted domains. 3. Because a corpus cannot include concatenation units in every prosodic context, the synthesized speech is not good enough in open domains, so an algorithm was introduced to control speech duration and pitch. Compared with the LPC and PSOLA algorithms, the Sinusoidal Model is more efficient and more convenient. The Sinusoidal Model algorithm was implemented in this thesis, and its application to speech processing tasks such as speech coding, time-scale modification, and pitch-scale modification was investigated. 4. To reduce distortion at concatenation points, a method was presented for smoothing the signal near unit boundaries using the Sinusoidal Model. Experiments showed that this method reduces some of the distortion at the concatenation points. 5. Finally, some problems related to corpus construction were investigated and a tentative solution was given.
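To make the idea of time-scale and pitch-scale modification with a sinusoidal representation concrete, the sketch below resynthesizes a signal from per-frame (amplitude, frequency) tracks. It is not the thesis's implementation: it assumes a plain oscillator bank with linear parameter interpolation and running phase accumulation, a simplification that skips the analysis stage and the phase matching a full sinusoidal model would use. The function name `resynthesize`, its parameters, and the test frames are all hypothetical.

```python
import math


def resynthesize(frames, hop, sample_rate, time_scale=1.0, pitch_scale=1.0):
    """Oscillator-bank resynthesis from sinusoidal tracks.

    frames: list of frames; each frame is a list of (amplitude, freq_hz)
            pairs, one per partial, with the same partial count per frame.
    hop: analysis hop size in samples between consecutive frames.
    time_scale: > 1 stretches duration; frequencies are left unchanged.
    pitch_scale: multiplies every frequency; duration is left unchanged.
    """
    out_hop = max(1, round(hop * time_scale))  # stretched synthesis hop
    n_partials = len(frames[0])
    phases = [0.0] * n_partials
    samples = []
    for cur, nxt in zip(frames, frames[1:]):
        for n in range(out_hop):
            t = n / out_hop  # interpolation position between the two frames
            value = 0.0
            for p in range(n_partials):
                amp = (1 - t) * cur[p][0] + t * nxt[p][0]
                freq = ((1 - t) * cur[p][1] + t * nxt[p][1]) * pitch_scale
                value += amp * math.cos(phases[p])
                # Accumulating phase keeps each partial continuous over frames.
                phases[p] += 2 * math.pi * freq / sample_rate
            samples.append(value)
    return samples


# Five identical frames holding one 100 Hz partial (assumed test input).
frames = [[(1.0, 100.0)] for _ in range(5)]
original = resynthesize(frames, hop=80, sample_rate=8000)
slower = resynthesize(frames, hop=80, sample_rate=8000, time_scale=2.0)
higher = resynthesize(frames, hop=80, sample_rate=8000, pitch_scale=2.0)
```

Because duration is controlled by the synthesis hop and pitch by the frequency tracks, the two can be changed independently: `slower` has twice as many samples as `original` at the same frequency, while `higher` has the same length at twice the frequency.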
Keywords	Corpus-based Speech Synthesis; Selection Algorithm Based on Units Match; LPC; PSOLA; Sinusoidal Model; Time-scale Modification; Pitch-scale Modification
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6782
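Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	Zhu Shaohui (祝韶晖). Research on Corpus-Based Concatenative Speech Synthesis for Chinese (基于语料库汉语语音拼接合成方法研究) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2002.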