With the development of human-computer interaction (HCI), speech synthesis technology has come into wide use, and the demands placed on text-to-speech (TTS) systems have grown. On the one hand, users expect a TTS system to output fluent, clear, natural, high-quality waveforms; on the other hand, the system should not incur high computational or storage costs. This paper therefore adopts an HMM-based unit-selection, parameter-concatenation speech synthesis framework. Within this framework, we study the source excitation and the spectral envelope of the source-filter speech model, as well as schemes for modifying parameters at the concatenation points. The detailed work is as follows. Firstly, we analyze the excitation part of the linear prediction model and the spectral envelope fitting of the STRAIGHT model. Many excitation methods exist for the linear prediction model; we tried the noise-pulse excitation model, the LF excitation model, the codebook excitation model, and the multi-band excitation model, and we give evaluation results for each. In addition, we use an all-pole model and a Gaussian mixture model to fit the 513-dimensional spectral parameters, so that they are reduced to 24-dimensional coefficients, and we then evaluate the fitting results. Secondly, the acoustic parameters of adjacent speech units are usually discontinuous at the joins, and the prosody of the selected units sometimes does not match the target prosody, so prosody adjustment and acoustic parameter modification schemes are proposed to solve these problems. Searching for the best concatenation position makes the boundaries of the speech units more stable; adjusting pitch and duration brings the prosody of the speech units into accordance with the target prosody; and modifying the boundary acoustic parameters of the speech units, followed by sliding-window smoothing, makes neighboring speech units join more smoothly, continuously, and tightly. The spectral formant structure is also modified according to a model of human auditory perception, which makes the synthesized speech clearer.
Finally, a detailed acoustic framework design for such a TTS system is given, and a 3.18 MB speech database containing 29,289 syllables is completed.
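To illustrate the sliding-window smoothing mentioned above, the following is a minimal sketch, assuming a single acoustic parameter track per speech unit and a plain moving average restricted to frames near the join. The function name `smooth_join` and the parameters `span` and `window` are hypothetical and chosen for illustration only; the smoothing actually used in this work may differ.

```python
def smooth_join(left, right, span=2, window=3):
    """Concatenate two parameter tracks and smooth the frames near the join.

    left, right: lists of floats, one parameter value per frame (assumed layout).
    span: number of frames on each side of the join to smooth (hypothetical).
    window: moving-average width in frames, expected to be odd (hypothetical).
    """
    track = list(left) + list(right)
    join = len(left)                      # index of the first frame of `right`
    half = window // 2
    lo = max(0, join - span)
    hi = min(len(track), join + span)
    smoothed = list(track)                # frames outside [lo, hi) stay untouched
    for i in range(lo, hi):
        a = max(0, i - half)
        b = min(len(track), i + half + 1)
        # Average over the ORIGINAL track so the result does not depend on order.
        smoothed[i] = sum(track[a:b]) / (b - a)
    return smoothed


# Two units whose parameter value jumps from 1.0 to 4.0 at the boundary.
print(smooth_join([1.0] * 5, [4.0] * 5))
# [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]
```

In this toy case the step of 3.0 at the boundary is spread over three frames with steps of 1.0 each, while frames far from the join keep their original values.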