嵌入式语音合成系统的研究与实现
Alternative Title: Research and Implementation of Embedded Text-to-Speech System
张皖志
Subtype: Master of Engineering
Thesis Advisor: 陶建华
2006-06-04
Degree Grantor: 中国科学院研究生院
Place of Conferral: 中国科学院自动化研究所
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Text-to-speech; Embedded TTS; Speech Database Pruning; Non-uniform Unit; Clustering; Decision Trees
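Abstract: As computers and networks continue to develop and society becomes increasingly information-driven, people place ever higher demands on the means of obtaining information. As the most effective and convenient form of human communication, natural speech interaction is becoming increasingly prominent in human-machine communication. With the rapid development of embedded technology, embedded devices are becoming ever more closely tied to daily life. These devices mostly take the form of information terminals that combine computing, communication, and sensing functions, and they can be conveniently combined with all kinds of equipment, including everyday items. To meet users' demand for more convenient and natural use of embedded devices, applying speech synthesis technology to them has therefore become an inevitable trend.
The research work of this thesis is as follows.
In cooperation with other members of the group, a unit selection algorithm based on prosodic-environment constraints is proposed. In the concatenative synthesis system, unit selection uses the prosodic-environment information of the candidate units in addition to the traditional context information. Decision trees are used to model the prosodic environment of each unit, and the predicted prosodic-environment attributes are integrated into the cost function for unit selection, which improves the coherence and naturalness of the synthesized speech.
A speech database construction method based on non-uniform units is proposed. The non-uniform units consist mainly of initials, finals, and their combination pairs. The basic idea is that, when constructing the corpus needed for waveform concatenation, the strengths and weaknesses of different types of basic acoustic units can be combined, and the unit combinations most strongly affected by coarticulation are retained in the database as far as possible.
A complete acoustic-level strategy for quantizing and compressing the speech database is presented. Based on a thorough analysis of the original database, data mining techniques are used to prune it to a reasonable and moderate degree, so that the resulting target database preserves the prosodic features of the original as far as possible. This greatly reduces the size of the database while largely preserving the naturalness and intelligibility of the synthesized speech.
Finally, a Chinese embedded speech synthesis system of practical quality is implemented. Small speech databases suited to different embedded platforms are produced, so that the system achieves clear, natural synthesis with very low resource consumption. During implementation, particular attention is paid to the system's tailorability, customizability, and portability. The system is applicable to a variety of embedded environments.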
Other Abstract: With the rapid development of embedded technology, embedded devices are becoming ever more closely tied to daily life. These devices usually take the form of information terminals, which integrate computing, communication, and sensing functions and can be conveniently connected to conventional devices and equipment. Therefore, to satisfy users' demand for easier and more natural access to embedded devices, it is necessary to integrate text-to-speech functionality into the human-machine interface. However, because speech science research started late in China, embedded Mandarin speech synthesis technology is still far from mature, and few mature products exist. Another factor limiting embedded TTS applications is that, compared with Western languages, the prosody model for Mandarin TTS is usually more complicated and needs more text and voice resources to achieve a highly natural result, which makes the system sensitive to its environment. The goal of this thesis is to implement a practical embedded Mandarin TTS system with commercial potential. Starting from the original syllable-based large-scale speech corpus, it introduces non-uniform units as the basic system unit and proposes a data-mining-based strategy for tailoring the voice library. It first analyzes the original voice library using statistics of acoustic features, and then applies data mining techniques to prune it, trying to keep as much prosodic and acoustic coverage as possible in the tailored target voice library. The method achieves a high compression ratio while the output speech based on the target library remains natural and intelligible. With this method, several small target speech databases are built for different embedded environments. The TTS system based on those databases consumes few resources but outputs clear and natural speech. In addition, it can be easily adapted to various embedded environments.
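The pruning idea in the abstract (shrink the voice library while keeping as much prosodic and acoustic coverage as possible) can be illustrated with a small sketch. This is not the thesis's algorithm, which the record does not describe in detail. It swaps in a simple greedy farthest-point selection per unit label, and the function name `prune_units`, the feature tuples (assumed to be pre-normalized pitch and duration values), and the sample data are all hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def prune_units(units, keep_per_label):
    """Return sorted indices of the units to keep.

    units: list of (label, features) pairs, where features is a tuple of
    floats (hypothetical, assumed already normalized to comparable scales).
    At least one unit is kept for every label.
    """
    # Group unit indices by their label (e.g. syllable or initial/final).
    by_label = defaultdict(list)
    for idx, (label, _) in enumerate(units):
        by_label[label].append(idx)

    kept = []
    for indices in by_label.values():
        feats = [units[i][1] for i in indices]
        # Seed with the most typical unit: the one closest to the group mean.
        mean = tuple(sum(col) / len(feats) for col in zip(*feats))
        chosen = [min(range(len(feats)), key=lambda j: dist(feats[j], mean))]
        # Greedily add the unit farthest from everything chosen so far,
        # so the retained units spread over the group's feature range.
        while len(chosen) < min(keep_per_label, len(feats)):
            remaining = [j for j in range(len(feats)) if j not in chosen]
            far = max(
                remaining,
                key=lambda j: min(dist(feats[j], feats[c]) for c in chosen),
            )
            chosen.append(far)
        kept.extend(indices[j] for j in chosen)
    return sorted(kept)


# Hypothetical toy database: (label, (normalized pitch, normalized duration)).
units = [
    ("ma", (1.0, 0.0)),
    ("ma", (1.1, 0.1)),
    ("ma", (-2.0, 1.5)),
    ("ma", (0.9, -0.1)),
    ("shi", (0.3, 0.2)),
]
kept = prune_units(units, keep_per_label=2)
print(kept)
```

With two units kept per label, the three near-identical "ma" tokens collapse to one representative while the prosodically distinct token survives, which is the coverage-preserving behaviour the abstract aims for. A real system would use richer acoustic features and a tuned budget per unit type.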
Shelf Number: XWLW1014
Other Identifier: 200328014604159
Language: Chinese
Document Type: Degree thesis
Identifier: http://ir.ia.ac.cn/handle/173211/7387
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
张皖志. 嵌入式语音合成系统的研究与实现[D]. 中国科学院自动化研究所. 中国科学院研究生院,2006.
Files in This Item:
File Name/Size DocType Version Access License
CASIA_20032801460415 (700KB), Access: not yet open, License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.