Most current text-to-speech (TTS) systems can synthesize speech in only a single style, which greatly limits their range of application. To improve the expressiveness of TTS output and broaden the application of TTS systems, this paper studies prosody and spectrum modeling for personalized speech synthesis. Focusing on the prosodic and spectral style of personalized speech, it investigates a prosody model based on mutual constraints, a prosody adaptation model for speech synthesis systems, a dialog prosody model, and a parametric speech synthesis system based on combined HMMs. The contributions of this paper are as follows. (1) A prosody model based on mutual constraints. This paper proposes and verifies that strong mutual prosodic constraints exist between adjacent syllables in read Mandarin speech. Based on these constraints, it presents a new definition of concatenation cost that precisely describes the naturalness of the transition between adjacent syllables. By minimizing the concatenation cost over the whole sentence, the pitch model can generate much more natural pitch contours. (2) Prosody adaptation in a concatenative speech synthesis system. This paper presents a prosody adaptation method that can adapt the prosody model to a new style using only a small training corpus. Built on one or several source corpora, the adapted prosody model has not only the target speaker's prosodic characteristics but also the complete coverage of contextual information provided by the source speaker. (3) A dialog prosody model. This paper presents a dialog prosody model. The key to building such a model is to identify the major difference between dialog pitch contours and read pitch contours. Based on extensive analysis and observation, this paper concludes that a major difference between the two is the existence of the incomplete phenomenon. 
By simulating that phenomenon, the prosody model can output pitch contours in a dialog style. (4) A parametric speech synthesis system based on combined HMMs. The HMM-based TTS system (HTS) is a recently proposed parametric system. Despite its high flexibility and low memory requirements, its speech quality is not very good. To address this problem, this paper presents a combined HTS system that makes use of both discrete HMMs and continuous HMMs. This system can resolve the over-smoothing problem, in both the frequency domain and the time domain, that conventional HTS systems encounter.
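
As an illustration of item (1), minimizing a concatenation cost over a whole sentence can be sketched as a dynamic-programming (Viterbi-style) search over per-syllable pitch candidates. This is only a minimal sketch under stated assumptions: the abstract does not give the actual cost definition, so the function name `select_pitch_candidates`, the representation of each candidate as a single pitch value in Hz, and the two cost callbacks are all hypothetical placeholders rather than the method of this paper.

```python
from typing import Callable, List, Sequence


def select_pitch_candidates(
    candidates: Sequence[Sequence[float]],
    target_cost: Callable[[int, float], float],
    concat_cost: Callable[[float, float], float],
) -> List[float]:
    """Pick one candidate per syllable so that the summed target and
    concatenation costs over the whole sentence are minimal.

    Assumes every syllable has at least one candidate. Both cost
    functions are placeholders supplied by the caller.
    """
    if not candidates:
        return []
    # best[j]: minimal cost of any path ending in candidate j of the current syllable
    best = [target_cost(0, c) for c in candidates[0]]
    # back[i - 1][j]: index of the best predecessor of candidate j of syllable i
    back: List[List[int]] = []
    for i in range(1, len(candidates)):
        new_best, pointers = [], []
        for c in candidates[i]:
            costs = [
                best[k] + concat_cost(prev, c)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, c))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final candidate to recover the full path.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]


# Hypothetical usage: absolute pitch difference stands in for the concatenation cost.
contour = select_pitch_candidates(
    [[100.0, 200.0], [110.0, 190.0], [250.0, 120.0]],
    target_cost=lambda i, f0: 0.0,
    concat_cost=lambda a, b: abs(a - b),
)
```

Because the cost of each syllable depends only on its immediate neighbor, the search is linear in sentence length and quadratic in the number of candidates per syllable, so the globally cheapest sequence is found without enumerating every combination.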