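Voice Conversion Method Based on the STRAIGHT Model
Alternative title: Voice Conversion Algorithm based on STRAIGHT Model
Author: Ma Jianchun (马建春)
Degree type: Master of Engineering
Supervisor: Liu Wenju (刘文举)
Date: 2006-06-03
Degree-granting institution: Graduate University of the Chinese Academy of Sciences
Degree conferred at: Institute of Automation, Chinese Academy of Sciences
Major: Pattern Recognition and Intelligent Systems
Keywords: voice conversion; dynamic features; prosodic feature transformation; PSOLA algorithm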
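Abstract: With the rapid development of information technology and computer science, people increasingly seek technology that is personalized, entertaining, simple, and fast. Voice conversion, or voice personalization, is one such technology: it can give users an entirely new experience, and it is currently an active topic in speech research.

Voice conversion modifies the speech of a source speaker so that it sounds as if it were spoken by a target speaker. It has many applications: instant voice chat; editing and dubbing for film, radio, and television; corpus collection for speech synthesis; personalizing the output of speech synthesis back ends; speaker identification; and intelligence work. Voice conversion has two stages. In the training stage, speaker features are extracted from a small amount of training speech from the source and target speakers and used to learn a mapping rule. In the conversion stage, the mapping rule is applied to the features of the source speaker's test speech to predict the target speech features, and the converted speech is synthesized from these predicted features. Voice conversion has two key problems: (1) building an accurate mapping rule, that is, a good conversion algorithm; and (2) extracting speaker features that carry the speaker's identity. The main task is to change the speaker features while leaving the linguistic content and the speaking environment unchanged. Two kinds of speaker features are mainly transformed: segmental features (the short-time spectrum) and suprasegmental features (the fundamental frequency, F0). Both feature extraction and synthesis of the converted speech require a speech analysis-synthesis model; this thesis focuses on work done with the high-quality STRAIGHT analysis-synthesis model.

The main contributions of this thesis are:
(1) A review of the domestic and international literature on voice conversion, covering the current state of research and comparing the strengths and weaknesses of the main conversion methods.
(2) Motivated by the use of dynamic feature parameters in speech recognition, and by the fact that the human ear is more sensitive to dynamic features, dynamic features are used as additional acoustic features to improve conversion quality.
(3) Prosodic features reflect a speaker's speaking style. Mean F0 and speaking rate in particular contribute substantially to speaker recognition, and research shows that mean F0 alone accounts for 55% of the ability to discriminate between speakers. Transforming prosodic features more accurately therefore brings the converted speech closer to the target speaker. Building on existing prosodic transformation methods, this thesis adopts a prosodic feature transformation method based on joint vectors and a CG-GMM model, which effectively improves the conversion performance of the system.
(4) PSOLA is an algorithm that adjusts the duration and pitch of speech directly on the time-domain waveform. It runs in real time, the synthesized speech is free of noise, and it retains good quality even over a wide range of duration and F0 modification. The thesis summarizes this algorithm and implements it with reference to existing source code. Once the prosodic parameters of the target speech have been predicted with the method in (3), PSOLA is used to apply the prosody adjustment.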
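Item (3) of the abstract transforms F0 with joint vectors and a GMM, following the training/conversion split described above. The abstract does not define "CG-GMM", so the sketch below substitutes the widely used joint-density GMM conversion function. This is a swapped-in, standard technique and not necessarily the thesis's method. Time-aligned source and target log-F0 values form joint vectors [x, y], and a GMM is fitted to them during training. During conversion, the target value is predicted as the conditional expectation E[y | x]. To keep the example short and dependency-free, training is shown only for a single component (M = 1, where EM reduces to computing sample moments). The class and function names are hypothetical, and frame alignment (e.g. by DTW) is assumed to have been done beforehand.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One component of a 2-D Gaussian over the joint vector z = [x, y] (hypothetical)."""
    weight: float  # mixture weight w_m (assumed > 0)
    mu_x: float    # mean of source log-F0
    mu_y: float    # mean of target log-F0
    var_xx: float  # variance of source log-F0
    cov_yx: float  # covariance between target and source log-F0


def fit_single_component(pairs):
    """Training stage for M = 1: sample moments of aligned joint vectors (x_t, y_t).
    A multi-component model would be fitted with EM instead."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    vxx = sum((x - mx) ** 2 for x, _ in pairs) / n
    cyx = sum((x - mx) * (y - my) for x, y in pairs) / n
    return [JointComponent(1.0, mx, my, vxx, cyx)]


def convert_log_f0(x, gmm):
    """Conversion stage: E[y | x] under the joint GMM (minimum mean-square error)."""
    # Posterior p(m | x) from each component's source marginal, computed in the
    # log domain and normalized by the largest term to avoid underflow.
    log_post = [
        math.log(c.weight)
        - 0.5 * ((x - c.mu_x) ** 2 / c.var_xx + math.log(2.0 * math.pi * c.var_xx))
        for c in gmm
    ]
    peak = max(log_post)
    post = [math.exp(v - peak) for v in log_post]
    total = sum(post)
    # Posterior-weighted sum of the conditional means mu_y + (cov_yx / var_xx)(x - mu_x).
    return sum(
        p / total * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(post, gmm)
    )


def convert_f0_contour(f0_hz, gmm):
    """Convert voiced frames in the log domain; unvoiced frames (F0 == 0) pass through."""
    return [math.exp(convert_log_f0(math.log(f), gmm)) if f > 0 else 0.0 for f in f0_hz]


if __name__ == "__main__":
    # Hypothetical aligned (source, target) F0 pairs in Hz, turned into log-F0 joint vectors.
    pairs_hz = [(100.0, 190.0), (120.0, 220.0), (140.0, 250.0)]
    gmm = fit_single_component([(math.log(a), math.log(b)) for a, b in pairs_hz])
    print(convert_f0_contour([0.0, 110.0, 130.0], gmm))
```

With M = 1, the mapping is a linear regression in the log-F0 domain. It differs from the common mean/variance F0 transformation only in its slope, which is cov_yx / var_xx rather than sigma_y / sigma_x. With more components, the mapping becomes piecewise, and each piece is weighted by how well its component explains the source value.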
English abstract: Information technology and computer science are developing rapidly, and people pursue applications that are individual, entertaining, simple, and quick. Voice conversion, or voice personalization, is one such technology that can bring people a new experience, and it is now an active area of speech technology research. Voice conversion is the process of transforming the characteristics of speech uttered by a source speaker so that a listener would believe the speech was uttered by a target speaker. It has many applications, such as real-time voice chat; looping and dubbing for film, broadcasting, and television; collecting corpora for text-to-speech (TTS) systems; pre-processing for speech recognition; and voice disguise. There are two key problems: (1) setting up a precise mapping rule, and (2) extracting speaker features that can identify the speaker. The main task of voice conversion is to modify the speaker characteristics while retaining the content of the utterance and the information about the environment. Two kinds of speaker characteristics mainly need to be transformed: segmental characteristics (the short-time spectrum) and suprasegmental characteristics (pitch). This thesis is based on the high-quality STRAIGHT analysis-synthesis model. My work includes:
1. Reading the voice conversion literature to understand the current state of research worldwide, and comparing voice conversion methods to identify the advantages and disadvantages of each.
2. Inspired by the use of dynamic MFCC features in speech recognition, the author introduces dynamic features into voice conversion. Because the human ear is more sensitive to dynamic features, using them can improve conversion quality.
3. Prosodic features reflect the speaking style of a speaker; in particular, mean F0 and speaking rate contribute most to speaker recognition. Research indicates that mean F0 accounts for 55% of the ability to recognize speakers, so pitch information is very important in voice conversion. Building on previous work, the author uses joint vectors and a CG-GMM model for prosodic feature transformation. This method leads to satisfactory pitch transformation.
4. PSOLA is an algorithm that modifies the time scale and pitch scale in the time domain, and this time-domain approach provides very efficient solutions for real-time implementation. It can modify prosody over a large range while retaining a high level of naturalness and quality. The author summarizes this algorithm and implements it with reference to some open-source code.
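Item 4 of the abstract describes PSOLA, which modifies pitch directly on the waveform. The sketch below is a minimal textbook-style TD-PSOLA pitch modifier. It is not the thesis's implementation, which the abstract says was adapted from unspecified open-source code. The sketch makes the following assumptions:

- A voiced signal is given, with pitch marks already available (one per pitch period, e.g. at glottal closure instants).
- Each segment is extracted with a two-period Hann window centered on an analysis mark.
- Only pitch is changed. Synthesis marks are spaced by the local period divided by a factor `alpha`, and each one is filled with the segment from the nearest analysis mark, so the overall duration stays the same.

The function names and parameters are hypothetical.

```python
import math


def hann(length):
    """Periodic Hann window; copies shifted by length / 2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def td_psola_pitch(signal, marks, alpha):
    """Scale pitch by alpha (> 1 raises it) while keeping the signal length.

    signal: list of samples; marks: at least two increasing analysis pitch-mark
    indices, one per pitch period. Unvoiced segments are not handled here.
    """
    out = [0.0] * len(signal)
    t = float(marks[0])  # position of the current synthesis mark
    while t <= marks[-1]:
        # Reuse the analysis segment whose mark is nearest to the synthesis mark.
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - t))
        # Local pitch period taken from the neighbouring analysis mark.
        period = marks[i + 1] - marks[i] if i + 1 < len(marks) else marks[i] - marks[i - 1]
        window = hann(2 * period)
        src0 = marks[i] - period       # segment start in the input
        dst0 = int(round(t)) - period  # segment start in the output
        for n, w in enumerate(window):
            src, dst = src0 + n, dst0 + n
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]  # overlap-add
        # Closer synthesis marks raise F0; wider spacing lowers it.
        t += period / alpha
    return out
```

With `alpha = 1.0` and evenly spaced marks, the periodic Hann windows sum to one, so the input is reproduced between the first and last marks. With `alpha = 2.0`, a pulse train with a 50-sample period becomes a pulse train of the same length with a 25-sample period. Time-scale modification would additionally repeat or skip analysis segments; it is left out to keep the sketch short.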
Call number: XWLW990
Other identifier: 200328014604143
Language: Chinese
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/7381
Collection: Graduates / Master's theses
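Recommended citation (GB/T 7714):
Ma Jianchun. Voice Conversion Method Based on the STRAIGHT Model[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of the Chinese Academy of Sciences, 2006.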
Files in this item:
CASIA_20032801460414 (1276 KB); access: not yet open; license: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.