With the development of human-computer interaction (HCI), more and more products are adopting elegant human-computer interfaces in place of traditional input methods. Text-to-Speech (TTS), one of the most important HCI technologies, is widely used in embedded systems such as PDAs and smartphones. In practice, multiple languages often appear within a single sentence, for example Chinese mixed with English, so handling different languages simultaneously is an important issue in TTS research. The goal of this paper is to implement a prototype embedded bilingual TTS system that handles Chinese and English simultaneously. It proposes a decision-tree-based voice library tailoring strategy that clusters samples using prosodic features together with context information. It also develops a template-based prosody prediction model, which uses a decision tree and Viterbi search to retrieve the best sequence of templates from a template corpus; this model emphasizes the mutual dependence of adjacent syllables in Chinese. To combine Chinese TTS and English TTS in a unified way, the paper investigates the distributions of prosodic features of English phones and tests the prosody prediction model, originally designed for Chinese, on English prosody. The results indicate that it is possible to handle Chinese and English with the same model. Finally, an embedded bilingual TTS system is built on the Windows Mobile platform, with special attention paid to combining the two languages in a unified framework. We also discuss the limitations of embedded systems and their influence on TTS, and put forward proposals for overcoming these limitations.
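The template retrieval step described above can be sketched as a standard Viterbi search over per-syllable template candidates. This is a minimal illustration, not the paper's implementation: the function name `viterbi_templates` and the `target_cost`/`concat_cost` callbacks are hypothetical, and the abstract does not specify the actual cost functions; the concatenation cost here simply stands in for the mutual dependence between adjacent syllables.

```python
def viterbi_templates(candidates, target_cost, concat_cost):
    """Pick one prosody template per syllable minimizing total cost.

    candidates  -- list (one entry per syllable) of candidate templates,
                   e.g. as retrieved from decision-tree leaves
    target_cost -- target_cost(i, t): cost of template t for syllable i
    concat_cost -- concat_cost(a, b): cost of joining adjacent templates
                   (models the dependence of adjacent syllables)
    """
    # Each state keeps (accumulated cost, template path so far).
    best = [(target_cost(0, t), [t]) for t in candidates[0]]
    for i, cands in enumerate(candidates[1:], start=1):
        best = [
            min(
                ((cost + concat_cost(path[-1], t) + target_cost(i, t),
                  path + [t])
                 for cost, path in best),
                key=lambda state: state[0],
            )
            for t in cands
        ]
    # Return the lowest-cost template sequence.
    return min(best, key=lambda state: state[0])[1]
```

For instance, with pitch values as toy templates, a target cost measuring distance to a desired pitch contour, and a concatenation cost penalizing pitch jumps between neighbours, the search returns the smoothest sequence that still tracks the targets.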