Research and Implementation of Embedded Text-to-Speech System (嵌入式语音合成系统的研究与实现)

Title	嵌入式语音合成系统的研究与实现
Alternative title	Research and Implementation of Embedded Text-to-Speech System
Author	张皖志 (Zhang Wanzhi)
Date	2006-06-04
Degree type	Master of Engineering
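Abstract (translated from the Chinese)	As computers and networks continue to develop and society becomes increasingly information-driven, people place ever higher demands on the means by which they obtain information. Speech is the most effective and convenient form of human communication, and natural spoken interaction is becoming increasingly prominent in human-machine communication. With the rapid growth of embedded technology, embedded devices are becoming ever more closely tied to daily life. These devices mostly take the form of information terminals that integrate computing, communication, and sensing, and they can be readily combined with other devices, including everyday objects. To meet users' demand for more convenient and natural use of embedded devices, applying speech synthesis to them has therefore become an inevitable trend. The work in this thesis is as follows. In collaboration with other members of the research group, a unit selection algorithm constrained by prosodic environment is proposed. In a concatenative synthesis system, unit selection uses the prosodic environment information of candidate units in addition to the traditional context information. Decision trees are used to model the prosodic environment of units, and the predicted prosodic environment attributes are integrated into the cost function for unit selection, which improves the coherence and naturalness of the synthesized speech. A method for building a speech database from hybrid units is proposed. The hybrid units consist mainly of initials, finals, and their combinations. The basic idea is that, when constructing the corpus for waveform concatenation, the strengths and weaknesses of different types of basic acoustic units can be combined so that the unit combinations most affected by coarticulation are retained in the database wherever possible. A complete acoustic-level strategy for quantizing and compressing the speech database is given. Based on a thorough analysis of the original database, data mining techniques are used to prune it to a reasonable degree so that the resulting target database preserves the prosodic characteristics of the original as far as possible. This greatly reduces the size of the database while largely preserving the naturalness and intelligibility of the synthesized speech. Finally, a Chinese embedded speech synthesis system of practical quality is implemented. Small speech databases suited to different embedded platforms are produced, allowing the system to deliver clear, natural synthesized speech with very low resource consumption. The implementation focuses on the system's tailorability, customizability, and portability, and the system can be used in a variety of embedded environments.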
English abstract	With the rapid development of embedded technology, embedded devices are becoming ever more closely tied to people's daily lives. These devices usually take the form of information terminals, which not only integrate multiple functions such as computing, communication, and sensing, but can also be conveniently connected to conventional devices and equipment. Therefore, to satisfy users' demand for easier and more natural access to embedded devices, it is necessary to integrate text-to-speech functionality into the human-machine interface. However, because speech science research started late in China, embedded Mandarin speech synthesis technology is still far from advanced, and there are few mature products. Another reason for the limited application of embedded TTS is that, compared with Western languages, the prosody model for Mandarin TTS is usually more complicated and needs more text and voice resources to achieve a highly natural result, which makes the system sensitive to its environment. The goal of this thesis is to implement a practical embedded Mandarin TTS system with commercial potential. Starting from the original syllable-based large-scale speech corpus, it introduces non-uniform units as the basic system unit and proposes a data-mining-based strategy for tailoring the voice library. It first analyzes the original voice library using statistics of acoustic features, and then uses data mining techniques to prune it appropriately, trying to keep as much prosodic and acoustic coverage as possible in the tailored target voice library. The method achieves a high compression ratio while the output speech based on the target library remains natural and intelligible. With this method, several small target speech databases are built for different embedded environments. The TTS system based on those databases consumes few resources but outputs clear and natural speech. In addition, it can be easily adapted to various embedded environments.
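The abstract names clustering and data mining as the basis for pruning the voice library but does not give the algorithm. The following is only a minimal illustrative sketch of one way such pruning could work, not the thesis's method: for each unit type, the candidate recordings are grouped with plain k-means on acoustic features, and only the recording nearest each cluster centre is kept. The function names, the unit identifiers, the two-dimensional feature vectors (assumed to be already normalized pitch and duration), and the per-type budget are all hypothetical.

```python
from collections import defaultdict
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means on tuples; initial centroids are evenly spaced points.

    Assumes `points` is non-empty and k >= 1.
    """
    k = min(k, len(points))
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                # Component-wise mean of the cluster members.
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def prune_library(units, per_type):
    """Keep at most `per_type` recordings of each unit type.

    `units` is a list of (unit_id, unit_type, features) tuples (hypothetical
    layout). For each type, the recording closest to each k-means centroid
    is kept, so the retained set still spans the acoustic variation.
    """
    by_type = defaultdict(list)
    for unit in units:
        by_type[unit[1]].append(unit)
    kept = []
    for instances in by_type.values():
        centroids = kmeans([u[2] for u in instances], per_type)
        chosen = {}
        for c in centroids:
            best = min(instances, key=lambda u: dist(u[2], c))
            chosen[best[0]] = best  # de-duplicate by unit_id
        kept.extend(chosen.values())
    return kept


# Hypothetical data: features are assumed normalized (pitch, duration).
example_units = [
    ("ba_001", "ba", (1.0, -0.9)),
    ("ba_002", "ba", (1.1, -0.8)),
    ("ba_003", "ba", (-1.0, 0.9)),
    ("ba_004", "ba", (-1.1, 1.0)),
    ("ma_001", "ma", (0.0, 0.0)),
]
kept = prune_library(example_units, per_type=2)
print(sorted(u[0] for u in kept))
```

In this sketch the budget `per_type` controls the trade-off the abstract describes: a smaller budget gives a smaller library, at the cost of less prosodic and acoustic coverage for each unit type.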
Keywords	Text-to-speech; Embedded TTS; Speech Database Pruning; Non-uniform Unit; Clustering; Decision Trees
Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/7387
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	张皖志. 嵌入式语音合成系统的研究与实现[D]. 中国科学院自动化研究所. 中国科学院研究生院,2006.

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20032801460415 (700KB)			Not currently open	CC BY-NC-SA