Research on Mandarin Chinese Speech Synthesis and Prosody

	Research on Mandarin Chinese Speech Synthesis and Prosody
	慎熙鹏 (Shen Xipeng)
	2001-06-01
Degree type	Master of Engineering
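Chinese abstract	With information science and computer science developing rapidly, linguistic engineering has attracted attention worldwide. It comprises two main parts: spoken language processing and natural language processing. Broadly, its task is to enable machines to hear and understand human language, produce an appropriate response, and express that response in speech as well. Speech recognition, natural language understanding, and speech synthesis are its core technologies. The work in this thesis centers on prosody research aimed at speech synthesis and at assisting speech recognition. Speech synthesis acts as the "mouth" in human-machine communication, and it is also highly significant for basic research such as models of speech production and perception. People want machine-generated speech to sound more natural and closer to a human voice, yet current Chinese speech synthesis systems still fall well short of this. Improving the naturalness of synthesis is generally held to require solving two problems: improving the quality of the prosody module, and improving the quality of the synthesizer, that is, how faithfully the synthesizer realizes the prosody once the prosodic parameters are given. Prosodic features, also called suprasegmental features, cover the three features of speech other than timbre: pitch, intensity, and duration. They are important for improving the naturalness of synthesized speech, and they can also assist speech recognition, improving its accuracy and speed. Prosodic features are also closely tied to research on emotional speech. Now that speech recognition and speech synthesis are gradually entering practical use, prosody has become an unavoidable issue, and in-depth study of prosodic features is a real need in current spoken language processing. In response to this need, the author carried out the following research: (1) For speech synthesis, the author studied and compared the characteristics of various synthesis methods and, based on the specific characteristics of Chinese, used waveform concatenation to build a high-naturalness speech synthesis system for tourism information. (2) Because prosody control matters for both speech recognition and speech synthesis, the author did the following work on prosody. ◆ For speech synthesis, the author studied prosodic breaks. Based on the characteristics of Chinese, the author built three decision trees to predict Chinese prosodic breaks hierarchically, with fairly good prediction results. ◆ The author also studied duration in Chinese in depth, carrying out a detailed statistical analysis of how syllable duration varies with the durations of its initial and final, and reaching new conclusions that differ from some existing findings on Chinese; a fitting function for this relationship was also built by straight-line fitting. ◆ Using multi-dimensional Gaussian mixture fitting on top of speaker speaking-rate normalization, the author built duration models for 1059 common Chinese syllables. In addition, based on the fitted relationship between syllable duration and initial and final durations, each phoneme was normalized for speaking rate at the syllable level, and duration models were built for 138 phonemes.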
English abstract	With the rapid development of information science and computer science, linguistic engineering has attracted unprecedented attention. Linguistic engineering has two main parts: spoken language processing and natural language processing. Generally speaking, its goal is to give computers the ability to hear and understand human language, produce the expected responses, and express them in speech. Speech recognition, natural language understanding, and speech synthesis are its core techniques. Our work focuses on prosody research in support of speech recognition and speech synthesis. Speech synthesis serves as the "mouth" in human-computer communication and also plays an important role in basic research on speech production and speech perception. Users expect synthesized speech to be more natural and more human-like, but a large gap remains between this expectation and what current systems deliver. In general, two main problems must be solved: improving the prosodic model, and improving the synthesizer so that it realizes the prosodic modifications with high quality. Prosodic features, also called suprasegmental features, comprise three main features: pitch, intensity, and duration. Prosody research is important for improving synthesized speech and is also helpful for speech recognition. It is also important to the study of emotional speech. As speech recognition and speech synthesis become more and more practical, the problems of prosody can no longer be avoided, and in-depth study of prosody is urgently needed. We therefore did the following work: (1) We developed a domain-specific speech synthesis system with high naturalness, the HNSU system. (2) Since prosody research helps both speech synthesis and speech recognition, we did the following research in this area. We studied the prediction of prosodic breaks in Mandarin and developed a CART-based model. We investigated the relationship between a syllable's duration and the durations of its initial consonant and final, drew some conclusions, and built a linear fitting function for this relationship. Using multi-dimensional Gaussian mixture fitting, we built duration models for 1059 common syllables in Mandarin. We also normalized the speech corpus by the speaking rate of each syllable and built duration models for 138 phonemes. We proposed a new type of duration model, the relative duration model within each syllable. This model differs clearly from syllable to syllable and has the potential to help speech recognition. Additionally, we did some research on accent in Mandarin and developed a new system for automatic stress detection based on tonal pitch range, which gave good results. (3) We studied the analysis and automatic detection of emphasis in Mandarin and developed an emphasis auto-detection model for Mandarin based on tonal pitch range.
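The abstract mentions a straight-line fit relating syllable duration to the durations of its initial and final. The following is a minimal sketch of what such a fit could look like, using ordinary least squares. The function name `fit_line`, the variable names, and the toy durations are all assumptions made for illustration; they are not the thesis's data, and the record does not say which fitting procedure the author actually used.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means all x values are identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


# Invented toy durations in milliseconds (hypothetical, NOT from the thesis).
syllable_ms = [180.0, 220.0, 260.0, 300.0]
final_ms = [120.0, 150.0, 180.0, 210.0]

slope, intercept = fit_line(syllable_ms, final_ms)

# Estimate the final's duration for a 240 ms syllable from the fitted line.
predicted_final = slope * 240.0 + intercept
```

Under this reading, a fitted line of this kind is what would let a phoneme's duration be rescaled to a reference syllable duration before modeling, which is one plausible interpretation of the syllable-level speaking-rate normalization described in the abstract.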
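Keywords	speech synthesis; prosody
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6798
Collection	Graduates_Master's Theses
Recommended citation (GB/T 7714)	慎熙鹏 (Shen Xipeng). 汉语普通话语音合成及韵律研究 (Research on Mandarin Chinese Speech Synthesis and Prosody)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate School of the Chinese Academy of Sciences, 2001.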