Research on Mandarin Chinese Speech Synthesis and Prosody (汉语普通话语音合成及韵律研究)
慎熙鹏
Degree type: Master of Engineering
Supervisor: 徐波
2001-06-01
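Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of degree conferral: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: speech synthesis; prosody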
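Abstract: With information science and computer science developing rapidly, linguistic engineering technology has attracted worldwide attention. It consists mainly of two parts: spoken language processing and natural language processing. Broadly, its task is to enable machines to hear and understand human language, to respond appropriately, and to express that response in the same spoken form. Speech recognition, natural language understanding, and speech synthesis are its core technologies. The work in this thesis centers on prosody research aimed at speech synthesis and at assisting speech recognition.

Speech synthesis serves as the "mouth" in human-machine communication, and it is also of great importance to basic research such as models of speech production and perception. People want machine-generated speech to sound more natural and closer to the human voice, yet current Chinese speech synthesis systems are still far from this requirement. It is generally held that improving the naturalness of synthesis requires solving two main problems: improving the quality of the prosody module, and improving the quality of the synthesizer, that is, how well the synthesizer realizes the prosodic parameters once they are given. Prosodic features, also called suprasegmental features, comprise the three features of speech other than timbre: pitch, intensity, and duration. Prosodic features not only play an important role in improving the naturalness of synthesized speech, but can also assist speech recognition, improving its accuracy and speed. In addition, prosodic features are closely related to research on emotional speech. Now that speech recognition and speech synthesis are gradually entering practical use, prosodic features have become an issue that cannot be avoided, and in-depth study of them has become an objective need of spoken language processing.

Based on these needs, the author carried out research in the following areas:

(1) For speech synthesis, the author studied and compared the characteristics of various synthesis methods and, based on the specific characteristics of Chinese, used waveform concatenation to build a high-naturalness speech synthesis system for tourism information.

(2) Prosody control plays an important role in both speech recognition and speech synthesis. The author carried out the following work on prosody:

◆ For speech synthesis, the author studied prosodic breaks. Based on the characteristics of Chinese, the author built three decision trees to predict Chinese prosodic breaks hierarchically, and obtained good prediction results.
◆ The author also studied duration in Chinese in depth, carrying out a detailed statistical analysis of how syllable duration varies with the durations of its initial and final, and reaching new conclusions that differ from some existing conclusions about Chinese. A fitting function for this relationship was also built by straight-line fitting.
◆ Using multi-dimensional Gaussian mixture fitting on top of speaker speech-rate normalization, the author built duration models for 1059 common Chinese syllables. In addition, based on the fitted relationship between syllable duration and initial and final duration, each phoneme was normalized for speech rate at the syllable level, and duration models were built for 138 phonemes.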
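The straight-line fitting of syllable duration against initial and final duration mentioned in the abstract can be illustrated with an ordinary least-squares fit. This is only a minimal sketch: the record does not give the thesis's actual fitting procedure or data, so the function name `fit_line` and all duration values below are made-up assumptions for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical durations in milliseconds (not taken from the thesis).
syllable_ms = [180.0, 200.0, 240.0, 280.0, 320.0]
initial_ms = [60.0, 65.0, 75.0, 85.0, 95.0]
# The final occupies whatever part of the syllable the initial does not.
final_ms = [s - i for s, i in zip(syllable_ms, initial_ms)]

# One line per sub-syllable unit: how its duration grows with syllable duration.
initial_fit = fit_line(syllable_ms, initial_ms)
final_fit = fit_line(syllable_ms, final_ms)
```

Because the initial and final durations sum to the syllable duration in this toy data, the two fitted slopes sum to one. A slope well above one half for the final would indicate that lengthening a syllable mostly stretches the final.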
Alternative abstract: With the rapid development of information science and computer science, linguistic engineering has received unprecedented recognition and attention. Linguistic engineering has two main parts: spoken language processing and natural language processing. Generally speaking, its goal is to give computers the ability to listen to and understand human language, produce appropriate responses, and express them in speech. Speech recognition, natural language understanding, and speech synthesis are its core techniques. Our work is mainly prosody research in the service of speech recognition and speech synthesis.

Speech synthesis serves as the "mouth" in human-computer communication and is also important to basic research on speech production and speech perception. People want synthesized speech that is more natural and more human-like, but a large gap remains between this requirement and what has been achieved. In general, there are two main problems to solve: one is to improve the prosodic model; the other is to improve the synthesizer so that it realizes prosodic modification with high quality. Prosodic features, also called suprasegmental features, comprise three main features: pitch, intensity, and duration. Prosody research is not only important for improving synthesized speech, but also very helpful for speech recognition. It is also important to the study of emotional speech. As speech recognition and speech synthesis become more and more practical, the problems of prosody confront us directly, and in-depth study of prosody is urgently needed.

We therefore carried out the following work:

(1) Developed a domain-specific speech synthesis system with high naturalness, the HNSU system.

(2) Since prosody research benefits both speech synthesis and speech recognition, we carried out the following research in this area:

◆ Studied the prediction of prosodic breaks in Mandarin and developed a CART-based model.
◆ Investigated the relationship between a syllable's duration and the durations of its initial and final, and drew some conclusions. We also built a linear fitting function for this relationship.
◆ Using multi-dimensional Gaussian mixture fitting, built duration models for 1059 common syllables in Mandarin. We also normalized the speech corpus by the speech rate of each syllable and built duration models for 138 phonemes.
◆ Proposed a new type of duration model, the relative duration model within each syllable. This model clearly distinguishes different syllables and has potential to aid speech recognition.
◆ Carried out research on accent in Mandarin. We developed a new system for automatic stress detection based on tonal pitch range and obtained good results.

(3) Studied the analysis and automatic detection of emphasis in Mandarin, and developed an emphasis auto-detection model for Mandarin based on tonal pitch range.
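The syllable duration models described in the abstract combine speech-rate normalization with Gaussian mixture fitting. The sketch below shows the general idea under strong simplifying assumptions: it normalizes each duration by its speaker's mean duration and fits a one-dimensional, two-component mixture with plain EM, whereas the thesis speaks of multi-dimensional mixtures and does not specify its normalization here. The function names and all duration values are hypothetical.

```python
import math


def normalize_by_rate(durations_by_speaker):
    """Divide each duration by its speaker's mean duration (assumed scheme)."""
    pooled = []
    for durations in durations_by_speaker.values():
        speaker_mean = sum(durations) / len(durations)
        pooled.extend(d / speaker_mean for d in durations)
    return pooled


def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iterations=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    ordered = sorted(data)
    n = len(ordered)
    # Deterministic start: place the initial means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in ordered) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in ordered:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, ordered)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, ordered)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


# Hypothetical durations (ms) of one syllable from a fast and a slow speaker.
corpus = {
    "fast_speaker": [150.0, 160.0, 170.0, 250.0, 260.0, 270.0],
    "slow_speaker": [300.0, 320.0, 340.0, 500.0, 520.0, 540.0],
}
normalized = normalize_by_rate(corpus)
weights, means, variances = fit_gmm_1d(normalized, k=2)
```

In this toy data the slow speaker's durations are exactly twice the fast speaker's, so after normalization the two speakers coincide and the mixture only has to capture the short and long variants of the syllable.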
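Call number: XWLW606
Other identifier: 606
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/6798
Collection: Graduates_Master's Theses
Recommended citation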
GB/T 7714
慎熙鹏. 汉语普通话语音合成及韵律研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2001.