Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech
Li, Ya (1); Tao, Jianhua (1); Hirose, Keikichi (2); Xu, Xiaoying (1,3); Lai, Wei (1,3)
2015-09-01
Journal: SPEECH COMMUNICATION
Volume: 72, Pages: 59-73
Article type: Article
Abstract: Expressive speech synthesis has received increased attention in recent times. Stress (or pitch accent) is the perceptual prominence within words or utterances, which contributes to the expressivity of speech. This paper summarizes our contribution to Mandarin expressive speech synthesis. A novel hierarchical stress modeling and generation method for Mandarin is proposed and further integrated into HMM-based speech synthesis (HTS) and Fujisaki model-based speech synthesis systems to accurately model the undulation of the pitch contour. In HMM-based expressive speech synthesis, stress-related contextual features obtained from the hierarchical model are introduced to model the prosodic variation caused by stress, in addition to the traditional prosodic features used in HTS. A rule-based and a Deep Belief Network-based prosodic variation model are proposed and then used in the stress adaptation module in HTS. The other approach uses the Fujisaki model to improve the expressiveness of synthetic speech. The hierarchical stress model is introduced into the phrase and tone command control mechanisms of the model. The pitch contour is then directly generated by the superposition of the two-level commands of the Fujisaki model. Experimental results using the proposed hierarchical stress modeling and generation methods showed that the macro- and micro-characteristics of stress could be successfully captured. The methodology proposed in this paper has application to a range of areas such as conveying attitude and indicating focus in spoken dialog systems. (C) 2015 Elsevier B.V. All rights reserved.
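The superposition of phrase and tone commands mentioned in the abstract can be illustrated with the standard formulation of the Fujisaki model, in which log F0 is a baseline plus the summed responses of the two command levels. The sketch below is a minimal illustration only, not the paper's implementation: the constants `alpha`, `beta`, `gamma` are assumed typical values, and `scale_tone_command` is a hypothetical helper showing one simple way stress could modify a tone command. The record itself does not describe the authors' actual stress rules.

```python
import math

def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def tone_response(t, beta=20.0, gamma=0.9):
    """Tone control step response, capped at gamma, for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, base_hz, phrase_cmds, tone_cmds):
    """ln F0(t) = ln Fb + sum of phrase components + sum of tone components.

    phrase_cmds: list of (onset_time, magnitude) impulses.
    tone_cmds:   list of (start_time, end_time, amplitude) pedestals;
                 amplitudes may be negative for Mandarin tones.
    """
    value = math.log(base_hz)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for start, end, amplitude in tone_cmds:
        value += amplitude * (tone_response(t - start) - tone_response(t - end))
    return value

def scale_tone_command(tone_cmds, index, factor):
    """Hypothetical stress rule (assumption): scale one tone command's amplitude."""
    scaled = list(tone_cmds)
    start, end, amplitude = scaled[index]
    scaled[index] = (start, end, amplitude * factor)
    return scaled

def contour_hz(times, base_hz, phrase_cmds, tone_cmds):
    """Sample the F0 contour in Hz at the given time points (seconds)."""
    return [math.exp(log_f0(t, base_hz, phrase_cmds, tone_cmds)) for t in times]
```

For example, calling `contour_hz` once with the original tone commands and once with `scale_tone_command(tone_cmds, 0, 1.5)` gives two contours that differ only around the scaled command, which is the kind of local pitch change a stress model would need to control.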
Keywords: Prosody; Stress; Hierarchical Modeling; Fujisaki Model; Speech Synthesis
WOS headings: Science & Technology; Technology
Keywords [WOS]: SPEAKER ADAPTATION; EMOTIONAL SPEECH; CONTEXT; ALGORITHM; CONTOURS
Indexed by: SCI
Language: English
WOS research areas: Acoustics; Computer Science
WOS categories: Acoustics; Computer Science, Interdisciplinary Applications
WOS accession number: WOS:000359169000005
Citation statistics
Times cited: 3 [WOS]
Document type: Journal article
Identifier: http://ir.ia.ac.cn/handle/173211/8898
Collection: National Laboratory of Pattern Recognition, Speech Interaction
Author affiliations:
1. Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing, Peoples R China
2. Univ Tokyo, Dept Informat & Commun Engn, Tokyo 1138654, Japan
3. Beijing Normal Univ, Dept Chinese Language & Literature, Beijing 100875, Peoples R China
Recommended citation
GB/T 7714: Li, Ya, Tao, Jianhua, Hirose, Keikichi, et al. Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech[J]. SPEECH COMMUNICATION, 2015, 72: 59-73.
APA: Li, Ya, Tao, Jianhua, Hirose, Keikichi, Xu, Xiaoying, & Lai, Wei. (2015). Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech. SPEECH COMMUNICATION, 72, 59-73.
MLA: Li, Ya, et al. "Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech". SPEECH COMMUNICATION 72 (2015): 59-73.
Files in this item
File: Hierarchical stress (1701KB); Document type: Journal article; Version: Author's accepted manuscript; Access: Open access; License: CC BY-NC-SA
File name: Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech.pdf
Format: Adobe PDF