Statistical parametric speech synthesis has attracted a great deal of research attention in recent years, especially the Hidden Markov Model (HMM)-based speech synthesis system (HTS). This system offers several advantages, such as smooth and fluent synthesized speech, flexible modification of speech parameters, quick system construction, and a small footprint. However, it also suffers from some problems. One is the over-simplified vocoding technique, which lowers the quality and intelligibility of the synthesized speech. Another is the over-smoothed trajectory of the generated speech parameters, which lowers the intelligibility and naturalness of the synthesized speech. This dissertation investigates the parametric representation of speech for the HMM-based speech synthesis system. For the preparation stage, an excitation model and a parametric representation of speech are proposed. For the synthesis stage, a technique is introduced into the speech parameter generation algorithm. In detail, the dissertation includes the following.

A harmonic plus noise mixed excitation model is described. The residual signal is obtained from the speech signal by inverse filtering, and its spectrum can be split at the Maximum Voicing Frequency (MVF) into a low-frequency harmonic region and a high-frequency noise region. A new MVF calculation method based on the K-means algorithm is proposed: the spectrum of the residual signal is split into sub-bands, which are clustered into two classes, and a Viterbi algorithm is used to search for a smooth MVF contour. This model is introduced into the HMM-based speech synthesis system, and the MVF is treated as an independent parameter stream in the training stage. In the synthesis stage, the excitation signal is generated as the sum of a number of harmonically related sinusoids and high-pass-filtered white Gaussian noise.
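The synthesis-stage excitation described above can be illustrated with a minimal sketch for a single voiced frame. It is not the dissertation's implementation: the function name, the unit harmonic amplitudes, the zero initial phases, the frame length, and the brick-wall FFT high-pass filter are all assumptions made for illustration.

```python
import numpy as np


def harmonic_plus_noise_excitation(f0, mvf, fs=16000, n_samples=400, seed=0):
    """Sketch of one voiced frame of harmonic plus noise excitation.

    f0  : fundamental frequency in Hz (assumed constant over the frame)
    mvf : Maximum Voicing Frequency in Hz (boundary between the two regions)
    Unit amplitudes and zero phases are illustrative assumptions.
    """
    t = np.arange(n_samples) / fs

    # Harmonic part: sinusoids at integer multiples of f0 that lie at or below the MVF.
    n_harmonics = int(mvf // f0)
    harmonic = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        harmonic += np.cos(2.0 * np.pi * k * f0 * t)

    # Noise part: white Gaussian noise, high-passed at the MVF.
    # Assumption: a brick-wall filter that zeroes the FFT bins below the MVF.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectrum[freqs < mvf] = 0.0
    noise_hp = np.fft.irfft(spectrum, n=n_samples)

    # The excitation is the sum of the two components.
    return harmonic + noise_hp, harmonic, noise_hp


excitation, harmonic, noise_hp = harmonic_plus_noise_excitation(f0=200.0, mvf=3900.0)
```

In this sketch the harmonic component occupies only the band below the MVF and the noise component only the band above it, mirroring the split of the residual spectrum. A practical vocoder would also need per-frame gains, phase continuity across frames, and a smoother filter, none of which are specified in the text above.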
The experimental results show that the proposed excitation model reduces the buzzy quality of the synthesized speech and sounds better than the pulse train excitation model. A parametric representation of speech based on spectral reconstruction of the residual signal is proposed. The spectrum of the residual signal not only shows a noise structure in the high-frequency region but also retains some detailed harmonic structures that are not captured by the linear prediction (LP) spectrum. The proposed technique is based on pitch-scaled analysis, which can easily extract the detailed harmonic structure of the residual signal and the pitc...