Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech
Li, Ya¹; Tao, Jianhua¹; Hirose, Keikichi²; Xu, Xiaoying¹,³; Lai, Wei¹,³
Abstract: Expressive speech synthesis has received increased attention in recent years. Stress (or pitch accent) is the perceptual prominence within words or utterances, and it contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of the pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced to model the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network-based prosodic variation model are proposed and then used in the stress adaptation module of HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech: the hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model, and the pitch contour is then generated directly by superposing the two levels of commands. Experimental results showed that the proposed hierarchical stress modeling and generation methods successfully capture the macro- and micro-characteristics of stress. The proposed methodology applies to areas such as conveying attitude and indicating focus in spoken dialog systems. © 2015 Elsevier B.V. All rights reserved.
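The superposition of phrase and tone commands mentioned in the abstract can be illustrated with the standard Fujisaki formulation, in which ln F0(t) equals the log base frequency plus the responses to phrase commands (impulses) and tone commands (stepwise pulses). The sketch below is not the authors' implementation and says nothing about how their hierarchical stress model sets the commands: the function names, the default values of alpha, beta and gamma, and the example command timings and magnitudes are all illustrative assumptions.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism.

    Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0.
    alpha=3.0 is an assumed, illustrative value.
    """
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def tone_response(t, beta=20.0, gamma=0.9):
    """Step response of the tone control mechanism, capped at gamma.

    Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0.
    beta=20.0 and gamma=0.9 are assumed, illustrative values.
    """
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, tone_cmds):
    """Log F0 at time t as the sum of base, phrase and tone components.

    phrase_cmds: list of (onset_s, magnitude)
    tone_cmds:   list of (onset_s, offset_s, amplitude); the amplitude may be
                 negative, as is commonly done for Mandarin tones.
    """
    value = math.log(base_hz)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in tone_cmds:
        value += amplitude * (tone_response(t - onset) - tone_response(t - offset))
    return value


def f0_contour(times, base_hz, phrase_cmds, tone_cmds):
    """Return F0 in Hz at each time in `times`."""
    return [math.exp(log_f0(t, base_hz, phrase_cmds, tone_cmds)) for t in times]


# Hypothetical example: one phrase command at 0 s, one tone command from 0.2 s to 0.4 s.
contour = f0_contour([-0.5, 0.3], 120.0, [(0.0, 0.5)], [(0.2, 0.4, 0.4)])
```

Under this formulation, a stress-dependent change in pitch range would be expressed by changing the magnitudes or amplitudes of the commands, since the two components simply add in the log F0 domain.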
Keywords: Prosody; Stress; Hierarchical Modeling; Fujisaki Model; Speech Synthesis
WOS Headings: Science & Technology; Technology
Indexed By: SCI
WOS Research Area: Acoustics; Computer Science
WOS Subject: Acoustics; Computer Science, Interdisciplinary Applications
WOS ID: WOS:000359169000005
Cited Times: 3 (WOS)
Document Type: Journal article
Affiliation:
1. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing, Peoples R China
2. Univ Tokyo, Dept Informat & Commun Engn, Tokyo 1138654, Japan
3. Beijing Normal Univ, Dept Chinese Language & Literature, Beijing 100875, Peoples R China
First Author Affiliation: Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation
GB/T 7714: Li, Ya, Tao, Jianhua, Hirose, Keikichi, et al. Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech[J]. SPEECH COMMUNICATION, 2015, 72: 59-73.
APA: Li, Ya, Tao, Jianhua, Hirose, Keikichi, Xu, Xiaoying, & Lai, Wei. (2015). Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech. SPEECH COMMUNICATION, 72, 59-73.
MLA: Li, Ya, et al. "Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech." SPEECH COMMUNICATION 72 (2015): 59-73.
Files in This Item:
File Name/Size | DocType | Version | Access | License
Hierarchical stress (1701KB) | Journal article | Author's accepted manuscript | Open access | CC BY-NC-SA
File name: Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.