汉语语音转换方法的研究
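Other title: A Study on Chinese Voice Conversion
Kang Yongguo (康永国)
Degree type: Doctor of Engineering
Supervisors: Xu Bo (徐波); Tao Jianhua (陶建华)
2006-06-04
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Major: Pattern Recognition and Intelligent Systems
Keywords: voice conversion; pitch conversion; hybrid mapping method; emotional voice conversion; formant estimation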
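Abstract: Chinese voice conversion uses speech processing techniques to modify the speaker-specific information in Chinese speech so that the modified speech sounds as if it had been uttered by another speaker. This dissertation analyzes the speaker individuality carried in speech features. To counter the quality loss in converted speech, it proposes methods that suppress over-smoothing of the converted features in the time and frequency domains. To suit the characteristics of Chinese F0, it proposes a pitch conversion algorithm based on the pitch target model. Finally, it applies Chinese voice conversion to Chinese emotional speech synthesis. The main contributions are:
1. Analysis of the acoustic representation of speaker individuality. Differences between utterances of the same text are divided into physiological differences and attitudinal differences. For physiological differences, the dissertation studies how speakers differ in vocal-tract features, represented by formants, and in source features, represented by glottal-waveform parameters. For attitudinal differences, it mainly analyzes how the prosodic parameters of emotional speech differ from those of neutral speech.
2. Improved quality of converted speech. Gaussian mixture model (GMM) mapping produces over-smoothed converted features, which degrades the quality of the converted speech; the dissertation addresses this in both the time and the frequency domain. For time-domain over-smoothing, it proposes a hybrid mapping algorithm that combines GMM and codebook mapping; for frequency-domain over-smoothing, it proposes a post-filtering method that sharpens formant bandwidths.
3. A pitch conversion method suited to Chinese. Based on the characteristics of Chinese F0, a pitch conversion method built on the pitch target model is proposed. Experiments show that the pitch target model describes and converts Chinese F0 well: converting the model parameters both shifts the pitch register of the F0 contour and changes its trend, so that the converted contour more closely matches the shape of the target contour.
4. A Chinese emotional voice conversion system. The voice conversion system is built on the STRAIGHT algorithm, which can reconstruct high-quality speech, and is applied to emotional voice conversion. Because the pitch-target-based conversion algorithm both shifts the register of the neutral F0 contour and reshapes it to follow the trend of the corresponding emotional contour, emotional voice conversion is achieved.
5. A nonlinear formant estimation algorithm based on frequency-subband prediction. The dissertation proposes an algorithm that separates speech into multiple components using automatically predicted frequency subbands, avoiding the empirical parameter selection required by earlier methods. Building on these subbands, the nonlinear analysis is applied to formant estimation, giving accurate and robust estimates while avoiding complex formant-tracking algorithms.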
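Item 2 concerns the over-smoothing of GMM-based spectral mapping. For reference, here is a minimal sketch of the standard joint-density GMM conversion function F(x) = Σ_i p(i|x) [μ_i^y + Σ_i^yx (Σ_i^xx)^-1 (x − μ_i^x)]. It is not the dissertation's code: it assumes diagonal covariance blocks, plain Python lists as feature vectors, and a hypothetical `JointComponent` layout, and it leaves out model training.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JointComponent:
    """One component of a joint source/target GMM (hypothetical layout).

    Covariance blocks are assumed diagonal, so each is a per-dimension list.
    """
    weight: float          # mixture weight, must be > 0
    mu_x: List[float]      # source mean
    mu_y: List[float]      # target mean
    var_xx: List[float]    # diagonal of Sigma^{xx}
    cov_yx: List[float]    # diagonal of Sigma^{yx}


def log_gauss_diag(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mu, var)
    )


def posteriors(x: Sequence[float], comps: Sequence[JointComponent]) -> List[float]:
    """p(i | x) for every component, using log-sum-exp for numerical stability."""
    logs = [math.log(c.weight) + log_gauss_diag(x, c.mu_x, c.var_xx) for c in comps]
    m = max(logs)
    exps = [math.exp(l - m) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_frame(x: Sequence[float], comps: Sequence[JointComponent]) -> List[float]:
    """Posterior-weighted sum of per-component linear regressions from x to y."""
    post = posteriors(x, comps)
    y = [0.0] * len(x)
    for p, c in zip(post, comps):
        for d in range(len(x)):
            y[d] += p * (c.mu_y[d] + c.cov_yx[d] / c.var_xx[d] * (x[d] - c.mu_x[d]))
    return y
```

Because each output frame is a posterior-weighted average of per-component regressions, frames lying between components tend to be pulled toward averaged means. This is a commonly cited source of the over-smoothing that the hybrid GMM/codebook mapping and the formant-sharpening post-filter are meant to counter.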
Other abstract: Chinese voice conversion is a technology that changes the speaker-specific characteristics of Chinese speech so that the transformed speech sounds as if another speaker had spoken it. The dissertation analyzes the speaker individuality contained in speech features, proposes two methods to reduce over-smoothing in the time domain and in the frequency domain, and then proposes a pitch conversion method based on the pitch target model. The dissertation contains the following work:
1. Analyzing the individual information in acoustic features. The differences among utterances of the same transcription are divided into physiological differences and attitudinal differences. For physiological differences, we investigate how speakers differ in formant frequencies, which represent vocal tract features, and in glottal parameters, which represent speech source features. For attitudinal differences, we study the distribution of prosodic features in emotional speech compared with neutral speech.
2. Enhancing the voice quality of the transformed speech. Because the over-smoothing of the GMM mapping method degrades the voice quality of the transformed speech, we analyze and address this problem in both the time domain and the frequency domain. For time-domain over-smoothing, we propose a hybrid mapping method that combines GMM and codebook mapping; for frequency-domain over-smoothing, we employ a post-filtering method that sharpens the formant bandwidths.
3. Proposing a pitch conversion method specific to Chinese. Based on the characteristics of Chinese pitch, we propose a pitch conversion method based on the pitch target model. Experiments show that the pitch target model describes and converts Chinese pitch well. The method can not only modify the range of the pitch contour but also change its trend, so that the converted pitch contour conforms more closely in shape to the target pitch contour.
4. Building an emotional speech conversion system. The dissertation uses the STRAIGHT algorithm to construct a Chinese voice conversion system and implements an emotional speech generation system based on voice conversion. Thanks to the proposed pitch conversion method based on the pitch target model, the system can successfully generate emotional speech from input neutral speech.
5. Proposing a nonlinear formant estimation method based on frequency subband prediction. A novel method, which applies band-pass filtering within predicted subbands instead of frequency ranges chosen empirically, is proposed to decompose speech into mono-component signals. The method is then applied to formant estimation, and experiments indicate that it not only estimates formant frequencies correctly but also avoids a complicated formant tracking procedure.
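The pitch target model in item 3 is commonly written as an exponential approach to a linear target within each syllable: F0(t) = β·exp(−λt) + a·t + b, for 0 ≤ t ≤ D. This record does not give the dissertation's exact formulation, so the sketch below assumes this form, with t in seconds from syllable onset and F0 in Hz. It fits (a, b, β, λ) to one syllable's contour by a grid search over λ, solving linear least squares for the other three parameters at each grid point. The function names and the λ grid are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pitch_target_contour(a: float, b: float, beta: float, lam: float,
                         times: Sequence[float]) -> List[float]:
    """F0(t) = beta * exp(-lam * t) + a * t + b: decay toward the target line a*t + b."""
    return [beta * math.exp(-lam * t) + a * t + b for t in times]


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [vi] for row, vi in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (aug[r][3] - sum(aug[r][k] * x[k] for k in range(r + 1, 3))) / aug[r][r]
    return x


def fit_pitch_target(times: Sequence[float], f0: Sequence[float],
                     lambdas: Sequence[float]
                     ) -> Optional[Tuple[float, float, float, float, float]]:
    """Return (a, b, beta, lam, sse) with the smallest squared error, or None if no lambdas.

    Every lam must be > 0; lam == 0 would make exp(-lam*t) collinear with the constant term.
    """
    best = None
    for lam in lambdas:
        basis = [[math.exp(-lam * t), t, 1.0] for t in times]
        # Normal equations (A^T A) p = A^T f0 for p = (beta, a, b).
        ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
        atf = [sum(r[i] * y for r, y in zip(basis, f0)) for i in range(3)]
        beta, a, b = _solve3(ata, atf)
        pred = pitch_target_contour(a, b, beta, lam, times)
        sse = sum((p - y) ** 2 for p, y in zip(pred, f0))
        if best is None or sse < best[-1]:
            best = (a, b, beta, lam, sse)
    return best
```

Once each syllable is reduced to (a, b, β, λ), pitch conversion amounts to mapping these parameters from source to target and regenerating the contour with `pitch_target_contour`. Roughly, b sets the register, while a and β shape the trend. How the dissertation learns that parameter mapping is not described in this record, so the sketch omits it.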
Call number: XWLW999
Other identifier: 200218014603209
Language: Chinese
Document type: Dissertation
Item identifier: http://ir.ia.ac.cn/handle/173211/5935
Collection: Graduates_Doctoral Dissertations
Recommended citation
GB/T 7714
康永国. 汉语语音转换方法的研究[D]. 中国科学院自动化研究所. 中国科学院研究生院,2006.
Files in this item
File name/size | Document type | Version type | Access | License
CASIA_20021801460320 (2540 KB) | | | Not yet open | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.