Tone information is not used effectively in current Chinese large-vocabulary continuous speech recognition systems. As Chinese speech recognition develops further, tone has become an important research direction. In current Chinese speech synthesis systems, imperfect prosody rules are one of the main causes of the unsatisfactory naturalness of synthesized speech. To make full and efficient use of tone information in Chinese speech recognition and synthesis, tone must be studied in depth. This paper therefore presents a comprehensive study of four topics: 1) tone feature extraction; 2) tone modelling; 3) tone pattern variation rules in continuous speech; 4) tone recognition.

Common pitch extraction algorithms each exploit only one aspect of the speech signal, so none of them solves pitch extraction completely: some have low precision, others low robustness to noise. To overcome this shortcoming, we propose a new method that combines several types of pitch extraction algorithms within a dynamic programming framework. Compared with a single pitch extraction algorithm, our method performs well in both precision and robustness.

We first study the traditional HMM tone model and analyze its limitations for tone modelling. By studying pitch contours in continuous Chinese speech, we find that a pitch contour can be characterized by a polynomial curve. Considering the stochastic nature of the speech signal and the variation among speakers, we propose the stochastic polynomial tone model. The model consists of two parts: a pitch contour model and a duration model. The pitch contour is described as a stochastic trajectory whose mean is parameterized by a polynomial function of normalized time and whose variance varies with time. Efficient training and recognition algorithms are developed.
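The pitch contour part of such a model can be illustrated with a minimal sketch. It is not the training or recognition algorithm developed in this work, only one simple reading of the description under stated assumptions: the mean polynomial is fitted by ordinary least squares, the time-varying variance is approximated as piecewise constant over a few bins of normalized time, the duration model is omitted, and all function names, parameter defaults, and pitch values are hypothetical.

```python
import math

def normalized_times(n):
    """Map n frame indices onto normalized time in [0, 1]."""
    return [0.0] if n == 1 else [i / (n - 1) for i in range(n)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def poly(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... with Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * t + c
    return y

def train(contours, order=2, n_bins=4, var_floor=1e-3):
    """Fit a mean polynomial and a per-bin variance to the contours of one tone."""
    pts = [(t, f) for c in contours
           for t, f in zip(normalized_times(len(c)), c)]
    # Normal equations of the least-squares polynomial fit (assumption: unweighted).
    a = [[sum(t ** (i + j) for t, _ in pts) for j in range(order + 1)]
         for i in range(order + 1)]
    b = [sum(f * t ** i for t, f in pts) for i in range(order + 1)]
    coeffs = solve(a, b)
    # Assumption: variance is constant inside each bin of normalized time.
    sq, cnt = [0.0] * n_bins, [0] * n_bins
    for t, f in pts:
        k = min(int(t * n_bins), n_bins - 1)
        sq[k] += (f - poly(coeffs, t)) ** 2
        cnt[k] += 1
    variances = [max(s / c, var_floor) if c else var_floor
                 for s, c in zip(sq, cnt)]
    return coeffs, variances

def log_likelihood(contour, model):
    """Sum of per-frame Gaussian log-densities around the mean trajectory."""
    coeffs, variances = model
    n_bins = len(variances)
    total = 0.0
    for t, f in zip(normalized_times(len(contour)), contour):
        var = variances[min(int(t * n_bins), n_bins - 1)]
        total -= 0.5 * (math.log(2 * math.pi * var)
                        + (f - poly(coeffs, t)) ** 2 / var)
    return total

def recognize(contour, models):
    """Return the name of the tone model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(contour, models[name]))

# Toy usage with made-up pitch values (Hz); real contours come from pitch extraction.
models = {
    "rising": train([[100, 110, 120, 130, 140], [105, 112, 125, 133, 141, 150]]),
    "falling": train([[150, 140, 130, 118, 110], [148, 139, 127, 120, 108, 100]]),
}
print(recognize([102, 111, 119, 131, 139], models))
```

Normalizing time to [0, 1] lets contours of different lengths share one polynomial, which is why the sketch never resamples the input. Binning the variance is only the simplest way to let it change over time.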
A parameter optimization algorithm based on minimum classification error (MCE) training is also proposed. Finally, to strengthen the model's descriptive power, we propose the mixture polynomial tone model and develop parameter estimation algorithms based on the EM algorithm.

The analysis of tone pattern variations in continuous Chinese speech is very important for tone research. Because most current analyses of tone pattern variation are derived from qualitative observation, their results are not suitable for tone recognition or pitch curve generation. To overcome this deficiency, we investigate tone pattern variation in continuous Chinese speech with a stochastic clustering method. Since the decision tree is a data-driven method that readily incorporates expert knowledge, we choose it as the tool for our clustering method. When constructing the decision tree, in addition to the neighboring tones, we consider many other possible factors, such as the position of the syllable in the word and the consonant/vowel type of the syllable, which are not utilized i