With the rapid development of communication science and computer science, Linguistic Engineering has received unprecedented attention. Linguistic Engineering comprises two main parts: Spoken Language Processing and Natural Language Processing. Broadly speaking, its goal is to enable computers to listen to and understand human language, produce the expected responses, and express them in speech. Speech Recognition, Natural Language Understanding and Speech Synthesis are its core techniques. Our work focuses on prosody research in support of Speech Recognition and Speech Synthesis. The speech synthesizer, as the "mouth" in human-computer communication, plays an important role in basic research on Speech Generation and Speech Perception. Users demand more natural and human-like synthesized speech, but a large gap remains between these demands and what has been achieved. In general, two main problems must be solved: improving the prosodic model, and improving the synthesizer so that it can carry out prosodic modification with high quality. Prosodic features, also called suprasegmental features, include three main features: pitch, power and duration. Prosody research is important not only for improving synthesized speech but also for Speech Recognition, and it matters for the study of emotional speech as well. As Speech Recognition and Speech Synthesis become more and more practical, the problems of prosody confront us directly, and in-depth study of prosody is urgently needed. We therefore carried out the following work: (1) We developed HNSU, a domain-specific speech synthesis system with high naturalness. (2) Since prosody research benefits both Speech Synthesis and Speech Recognition, we carried out the following research in this area: We studied the prediction of prosodic breaks in Mandarin and developed a CART-based model.
We investigated the relationship between a syllable's duration and the durations of its consonant and final, and drew some conclusions; we also built a linear fitting function for this relationship. Using a multi-Gaussian mixture modeling technique, we built duration models for 1059 common Mandarin syllables. We also normalized the speech corpus by the speaking rate of each syllable and built duration models for 138 phonemes. We proposed a new type of duration model: the relative duration model within each syllable. This model differs clearly from syllable to syllable and has the potential to help speech recognition. Additionally, we did some research on accent in Mandarin: we developed a new system for automatic stress detection based on tonal pitch range and obtained good results. (3) We studied the analysis and automatic detection of emphasis in Mandarin and developed an emphasis auto-detection model based on tonal pitch range.
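As an illustration only, the linear relation between syllable duration and consonant duration, and the idea of a relative duration within a syllable, might be sketched as follows. The abstract gives neither the actual function nor any data, so the durations below are hypothetical values in milliseconds, and the names `relative_duration` and `fit_line` are assumptions introduced for this sketch. An ordinary least-squares fit is used as one possible reading of the linear function; it is not necessarily the method used in the original work.

```python
def relative_duration(initial_ms, final_ms):
    """Return the shares of a syllable's duration taken by its initial consonant and its final."""
    total = initial_ms + final_ms
    return initial_ms / total, final_ms / total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (initial, final) durations in milliseconds; not data from the thesis.
samples = [(60, 140), (70, 180), (80, 220)]
syllable_ms = [i + f for i, f in samples]
initial_ms = [i for i, _ in samples]

# Fit initial-consonant duration as a linear function of whole-syllable duration.
slope, intercept = fit_line(syllable_ms, initial_ms)

# Relative duration (initial share, final share) for each sample syllable.
shares = [relative_duration(i, f) for i, f in samples]
```

With real corpus measurements in place of `samples`, the fitted line would describe how consonant duration scales with syllable duration, and the per-syllable shares would give a speed-independent description that could be compared across syllables.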