Speech synthesis technology plays an important role in many aspects of man-machine interaction. To benefit society, synthesized speech should sound as human-like as possible. Synthesized speech can be produced by several methods, which can broadly be divided into two groups: articulatory synthesis and concatenative synthesis. Concatenative synthesis has become the more popular approach in recent years. Within concatenative systems, corpus-based speech synthesis has only fairly recently become a promising way to achieve highly natural synthesized speech, mainly because the memory and storage capacities of general-purpose computers have increased. Corpus-based speech synthesis is similar to conventional concatenative synthesis, except that the inventory consists of a large corpus of labeled speech and that, instead of modifying the stored speech to match the target prosody, the corpus is searched for phoneme sequences whose prosodic patterns match the target prosody. Most of my Master's thesis concerns this approach. My work is summarized as follows:
1. A corpus-based speech synthesis system was implemented. To make the synthesized speech more natural, experiments were carried out with different concatenation units and different segmentation tools. The results demonstrated that using a longer unit achieves higher naturalness with fewer concatenation points.
2. The foregoing results also showed that corpus-based synthesis has difficulty finding a suitable concatenation unit with the conventional method. A novel selection algorithm based on unit matching was therefore proposed. Experiments showed that this solution produces excellent synthesized speech in restricted domains.
3. Because a corpus cannot include concatenation units in every prosodic context, the synthesized speech is not good enough in open domains. An algorithm based on the sinusoidal model was therefore introduced to control speech duration and pitch. Compared with the LPC and PSOLA algorithms, the sinusoidal model is more efficient and more convenient. The sinusoidal algorithm has been implemented in this thesis. In addition, the application of the sinusoidal model to speech processing tasks such as speech coding, time-scale modification, and pitch-scale modification has been investigated.
4. To reduce the distortion at concatenation points, a method was presented for smoothing the signal near unit boundaries using the sinusoidal model. Experiments showed that this method reduces some of the distortion at the concatenation points.
5. Finally, some problems related to the corpus were investigated, and a tentative solution is given in the thesis.
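The corpus search described in this abstract (choosing stored units whose prosody matches the target instead of modifying the stored speech) can be illustrated with a small dynamic-programming sketch. This is not the selection algorithm developed in the thesis. The `Unit` and `Target` fields, the two cost functions, and their weights are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Unit:
    """A candidate unit stored in the corpus (hypothetical fields)."""
    label: str       # phoneme (or longer unit) label
    pitch: float     # mean F0 in Hz, assumed positive
    duration: float  # length in seconds
    utt_id: int      # index of the source utterance
    position: int    # position of the unit inside that utterance


@dataclass(frozen=True)
class Target:
    """One slot of the utterance to synthesize, with its desired prosody."""
    label: str
    pitch: float
    duration: float


def target_cost(t: Target, u: Unit) -> float:
    # Assumed cost: relative pitch mismatch plus relative duration mismatch.
    return abs(u.pitch - t.pitch) / t.pitch + abs(u.duration - t.duration) / t.duration


def join_cost(a: Unit, b: Unit) -> float:
    # Units that were adjacent in the recorded corpus join without penalty.
    if a.utt_id == b.utt_id and b.position == a.position + 1:
        return 0.0
    # Otherwise: a fixed penalty (assumed) plus the pitch jump at the boundary.
    return 1.0 + abs(a.pitch - b.pitch) / max(a.pitch, b.pitch)


def select_units(targets: List[Target], corpus: Dict[str, List[Unit]]) -> List[Unit]:
    """Viterbi search for the unit sequence with the lowest total cost."""
    if not targets:
        return []
    candidates = []
    for t in targets:
        cands = corpus.get(t.label, [])
        if not cands:
            raise ValueError(f"no corpus unit for label {t.label!r}")
        candidates.append(cands)

    # best[i][j] = (lowest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], u), back))
        best.append(row)

    # Trace the cheapest path back from the last slot to the first.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path
```

In this sketch, units that were contiguous in the recordings join at zero cost, so the search tends to pick longer stretches of original speech. That mirrors the observation in item 1 that longer units give fewer concatenation points.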