CASIA OpenIR > Graduates > Master's Theses
Alternative Title: MPEG-4-Based Speech-Driven Face Animation Synthesis Research
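Thesis Advisor: 陶建华 (Tao Jianhua)
Degree Grantor: Graduate University of Chinese Academy of Sciences
Place of Conferral: Institute of Automation, Chinese Academy of Sciences
Degree Discipline: Pattern Recognition and Intelligent Systems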
Keywords: Visual Speech Synthesis; MPEG-4 Standard; Audio-Visual Synchronized Mapping; Unit Selection; Hidden Markov Model (HMM)
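Abstract: As expectations for human-computer interaction keep rising, visual speech synthesis has drawn growing attention from researchers as an important interaction method. It makes human-computer interaction more harmonious and also improves the accuracy of recognition and expression during interaction. For example, it can improve speech recognition in noisy environments and help hearing-impaired people understand spoken information. It can also be applied widely in virtual reality, virtual presenters, virtual meetings, film production, gaming and entertainment, and many other fields. The central problem and main difficulty of visual speech synthesis research is building a model that maps speech to synchronized facial motion. This is hard because people are so familiar with faces and their movements that they are highly sensitive to how well the motion is synchronized. In addition to building a complete system framework, this thesis concentrates on the synchronized mapping between speech and the face. The thesis first briefly introduces the research background and content of visual speech synthesis. It then describes the main work in the order of the four main components of the system. (1) Several MPEG-4-based multimodal databases suited to different applications were built. Using a real-time motion capture device, the CASIA Multimodal Database was created. It contains synchronized speech, 2D video, and 3D facial feature-point motion, and can serve several application scenarios such as multimodal emotion recognition and speech-driven face animation. (2) Acoustic speech features and MPEG-4-based facial motion features were analyzed and extracted from the multimodal databases. FAP (Facial Animation Parameter) extraction removed a large amount of redundant data, and a principal-component-based quantitative representation of the facial motion features was proposed and analyzed. (3) Two speech-to-face-animation mapping algorithms were implemented: one based on dynamic unit selection and one based on HMMs. The former emphasizes realistic, natural, and continuous synthesized animation. The latter emphasizes real-time, automatic, and efficient operation of the system. (4) The synthesized facial motion parameters are smoothed and then output to drive the facial motion of a mesh animation model.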
Other Abstract: With growing demands on human-computer interaction (HCI), visual speech synthesis is receiving more and more attention from researchers. Visual speech synthesis makes HCI more harmonious and also improves its accuracy. For example, it can make speech recognition systems more robust in noisy environments and help hearing-impaired people understand others. It is also widely applied in virtual reality, virtual announcers, virtual meetings, movie making, and game entertainment. The key difficulty in visual speech synthesis is audio-visual synchronized mapping, because people are very familiar with facial movement and are therefore sensitive to any loss of synchronization. Apart from establishing the system framework, this thesis focuses mainly on audio-visual synchronized mapping. The thesis first gives a brief introduction to the background and research content of visual speech synthesis. It then describes the work in the order of the four main steps needed to build such a system. First, a labeled, MPEG-4-based multimodal database named the CASIA Multimodal Database was established with a motion capture system. It meets different research requirements and includes speech, 2D video, and 3D facial movement information. Second, speech and visual features were analyzed and extracted separately from the multimodal data. FAP extraction discarded redundant information from the large amount of data, and principal component representations of the facial movement features were given and analyzed. Third, two audio-visual mapping algorithms were implemented: one based on dynamic unit selection and one based on HMMs. The former focuses on realism and naturalness of the synthesized animation. The latter focuses on real-time, automatic, and efficient operation of the system. Finally, the synthesized facial movement parameters are smoothed and output to drive a model-based face animation.
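The abstracts name dynamic unit selection as one of the two mapping methods but do not describe how it works. The following is a rough illustration only. It shows the generic form of unit selection: for each input speech segment, choose one candidate visual unit so that the total of a target cost and a concatenation cost is minimal, found by Viterbi-style dynamic programming. The function name `select_units`, the weight `w_concat`, the Euclidean costs, and the (acoustic vector, face-parameter vector) candidate layout are all hypothetical assumptions. They are not taken from the thesis.

```python
import numpy as np


def select_units(targets, candidates, w_concat=1.0):
    """Hypothetical unit-selection sketch (not the thesis's implementation).

    targets:    list of acoustic feature vectors, one per input speech segment
    candidates: per segment, a list of (acoustic_vec, face_vec) pairs; face_vec
                stands in for, e.g., a FAP vector of a stored visual unit
    Returns the index of the chosen candidate for each segment.
    """
    n = len(targets)
    # Accumulated cost of the best path ending at each candidate of segment 0:
    # only the target cost (acoustic distance to the input) applies here.
    cost = [np.array([np.linalg.norm(a - targets[0]) for a, _ in candidates[0]])]
    back = [np.zeros(len(candidates[0]), dtype=int)]  # unused placeholder for t = 0

    for t in range(1, n):
        prev = candidates[t - 1]
        cur_cost = np.empty(len(candidates[t]))
        cur_back = np.empty(len(candidates[t]), dtype=int)
        for j, (a, f) in enumerate(candidates[t]):
            target_cost = np.linalg.norm(a - targets[t])
            # Concatenation cost: jump in facial parameters from each previous candidate.
            join = np.array([np.linalg.norm(f - pf) for _, pf in prev])
            total = cost[t - 1] + w_concat * join
            cur_back[j] = int(np.argmin(total))
            cur_cost[j] = total[cur_back[j]] + target_cost
        cost.append(cur_cost)
        back.append(cur_back)

    # Backtrack from the cheapest final candidate.
    path = [int(np.argmin(cost[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    targets = [np.array([0.0]), np.array([1.0])]
    cands = [
        [(np.array([0.0]), np.array([0.0])), (np.array([0.1]), np.array([5.0]))],
        [(np.array([1.0]), np.array([10.0])), (np.array([1.2]), np.array([5.1]))],
    ]
    print(select_units(targets, cands))                # [1, 1]
    print(select_units(targets, cands, w_concat=0.0))  # [0, 0]
```

In the toy run, choosing the acoustically closest unit at each step (`w_concat=0.0`) produces a large jump in the face parameters. With the concatenation cost enabled, the sketch trades a slightly worse acoustic match for smoother motion. That trade-off is the continuity property the abstracts attribute to the unit-selection approach.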
Other Identifier: 200328014604155
Document Type: Thesis
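Recommended Citation (GB/T 7714):
Yin Panrong (尹潘嵘). Research on MPEG-4-Based Speech-Driven Face Animation Synthesis (基于MPEG-4的语音驱动人脸动画合成技术研究)[D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2006.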
Files in This Item:
File Name/Size: CASIA_20032801460415 (4404 KB); Access: not yet open to the public; License: CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.