Alternative Title: Research and Implementation on Embedded Bilingual TTS System
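Thesis Advisor: 陶建华
Degree Grantor: Graduate University of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keyword: Bilingual TTS System; Embedded System; Corpus Compression; Combination of Chinese and English Engine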
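Abstract: As human-computer interaction technology develops, more and more commercial products adopt forms of interaction that are friendlier than traditional ones. Speech synthesis, one of the key technologies of human-computer interaction, is already widely used in embedded devices. In practice, several languages often appear within a single sentence; most typically, Chinese and English are mixed in everyday speech. Handling multiple languages is therefore a problem that speech synthesis must solve before it can be truly practical. The work of this thesis is as follows:
A voice library pruning strategy based on decision-tree clustering is proposed. It takes the similarity of the samples' prosodic features as its objective criterion, also considers the similarity of the context in which each sample occurs, and clusters the samples of every syllable. Given a user-specified compression ratio, it selects from each cluster the samples closest to the cluster center, which completes the pruning of the voice library.
The author took part in implementing a prosody prediction model based on prosody templates. Decision trees are used to model the prosodic features of syllables, including duration, energy, silence, F0 mean, F0 maximum, F0 minimum, F0 start value, slope at the start, F0 end value, and slope at the end. During prediction, in addition to conventional context information, the prosodic features of each candidate unit are used to predict the prosodic environment of its preceding and following syllables. This serves as the basis for computing the concatenation cost and the target cost, and a Viterbi search then retrieves the optimal sequence of prosody templates from the prosody library, which completes the prediction of the prosodic parameters.
The distribution of the prosodic features of the basic acoustic units of English is investigated, and the feasibility of applying the mature prosody prediction method of the Chinese speech synthesis system to English is demonstrated. Taking into account the differences between the basic acoustic units of English and Chinese, some details of the model are modified while the overall framework is kept unchanged.
A prototype of an embedded Chinese-English bilingual speech synthesis system is implemented. The implementation focuses on integrating the Chinese and English synthesis systems. The problems caused by the hardware limitations of embedded platforms are discussed, and corresponding solutions are proposed.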
Other Abstract: With the development of human-computer interaction (HCI), more and more products adopt human-computer interfaces that are friendlier than traditional ones. Text-to-speech (TTS), as one of the most important HCI technologies, has been widely used in embedded systems such as PDAs and smartphones. In practice, multiple languages often appear in one sentence, for example Chinese and English. Handling different languages simultaneously is therefore an important issue in text-to-speech research. The goal of this thesis is to implement a prototype of an embedded bilingual TTS system that handles Chinese and English simultaneously. It proposes a decision-tree-based voice library tailoring strategy, which uses prosodic features together with context information to cluster samples. It develops a template-based prosody prediction model, which uses decision trees and Viterbi search to retrieve the best sequence of templates from a template corpus in order to predict prosody information. This model emphasizes the mutual dependence of adjacent syllables in Chinese. To combine Chinese TTS and English TTS in a unified way, the thesis investigates the distributions of the prosodic features of English phones and tests the prosody prediction model, originally used in the Chinese TTS system, on English prosody. The results indicate that it is possible to handle Chinese and English with the same model. Finally, an embedded bilingual TTS system is built on the Windows Mobile platform, with special attention paid to combining the two languages in a unified way. The thesis also discusses the limitations of embedded systems and their influence on the TTS system, and puts forward proposals to overcome these limitations.
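As an illustration of the template selection step summarized in the abstract, the following is a minimal sketch of a Viterbi search that picks one candidate per syllable so that the sum of target costs and concatenation costs is minimal. It is not the thesis's implementation: the function name `viterbi_select`, the cost callbacks `target_cost` and `concat_cost`, and the representation of candidates as plain values are all assumptions made for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def viterbi_select(
    candidates: Sequence[Sequence[T]],
    target_cost: Callable[[int, T], float],
    concat_cost: Callable[[T, T], float],
) -> List[T]:
    """Pick one candidate per position with minimal total cost.

    candidates[i] holds the (hypothetical) candidate templates for syllable i.
    target_cost(i, c) scores how well candidate c fits position i.
    concat_cost(a, b) scores how well b follows a.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[j]: minimal cost of any path ending at candidate j of the
    # position currently being processed.
    best = [target_cost(0, c) for c in candidates[0]]
    # back[i - 1][j]: index of the best predecessor of candidate j at position i.
    back: List[List[int]] = []

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for cand in candidates[i]:
            costs = [
                best[k] + concat_cost(prev, cand)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, cand))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)]
```

In this sketch the caller supplies both cost functions, so the same search could be run with costs derived from decision-tree predictions or from any other prosody model; the candidate values might be, for example, F0 means of stored templates.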
Other Identifier: 200528014628030
Document Type: Thesis
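Recommended Citation (GB/T 7714):
黄力行. Embedded Chinese-English Bilingual Speech Synthesis System (嵌入式中英文双语语音合成系统)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2008.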
Files in This Item:
File Name/Size: CASIA_20052801462803 (876KB)
Access: Not yet open
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.