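Thesis Advisor: 刘文举
Degree Grantor: University of Chinese Academy of Sciences (中国科学院大学)
Place of Conferral: Beijing
Keyword: speech emotion recognition; bag-of-audio-words model; deep neural network; statistical methods; utterance-level feature vector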
Other Abstract

Speech is an important means of communication between people. The emotional information carried in speech makes conversation more fluent, because it helps listeners understand the speaker's emotional state and emphasizes the speech content. Speech emotion recognition (SER) has become an important research topic in artificial intelligence. People can understand the emotion in speech because the brain is able to perceive emotional information, and SER simulates this perception process. In SER, acoustic features are first extracted from the audio signal; the key features related to emotion are then extracted from those acoustic features; finally, the relationship between the key features and emotions is modeled. How to extract the key features from audio signals is the main topic of this dissertation.

Building on previous research, we focus on feature analysis and study reasonable recognition strategies. The main contents are summarized as follows:

First, a recognition model based on an enhanced bag-of-audio-words (BoAW) model is proposed. The enhanced BoAW model uses vector quantization to extract an utterance-level feature vector from each input audio file, and its "multi-codeword" idea helps these vectors retain sufficient emotional information. The resulting utterance-level feature vectors are more discriminative for classification and yield good recognition results.
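
As a rough illustration of how vector quantization can turn frame-level features into one utterance-level vector, the sketch below builds a normalized BoAW histogram. It is not the thesis's implementation: the function names, the toy codebook, and the reading of "multi-codeword" as "assign each frame to its several nearest codewords" (the `num_assign` parameter) are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def boaw_histogram(frames, codebook, num_assign=1):
    """Quantize frame-level features against a codebook and return a
    normalized histogram as the utterance-level feature vector.

    num_assign > 1 is an ASSUMED interpretation of the "multi-codeword"
    idea: each frame votes for its num_assign nearest codewords.
    """
    counts = [0.0] * len(codebook)
    for frame in frames:
        # Rank codeword indices from nearest to farthest for this frame.
        ranked = sorted(
            range(len(codebook)),
            key=lambda i: squared_distance(frame, codebook[i]),
        )
        for idx in ranked[:num_assign]:
            counts[idx] += 1.0
    total = sum(counts)
    # Normalize so utterances of different lengths are comparable.
    return [c / total for c in counts] if total > 0 else counts


# Hypothetical toy data: three 2-D codewords and three frames.
codebook = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
frames = [[0.1, 0.0], [0.9, 0.0], [5.0, 5.2]]
single = boaw_histogram(frames, codebook, num_assign=1)
multi = boaw_histogram(frames, codebook, num_assign=2)
```

In practice the codebook would be learned from training data (for example by clustering frame-level acoustic features), and the histogram would be passed to a classifier; both steps are outside this sketch.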

Second, a recognition model based on a deep neural network (DNN) and statistical methods is explored. Transfer learning and an auto-encoder are each used to initialize the DNN. Data filtering is then applied to the probability distribution matrix, and finally statistical methods are used to extract the utterance-level feature vectors. This method also achieves good recognition results.
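
The filtering and statistics steps can be pictured with the following minimal sketch. It assumes that the probability distribution matrix holds one row of per-emotion probabilities per frame, that "data filtering" means discarding low-confidence frames, and that the statistics are the per-class mean, maximum, and minimum. None of these details is stated in the abstract, and the name `utterance_features` and the threshold `min_confidence` are hypothetical.

```python
def utterance_features(posteriors, min_confidence=0.5):
    """Pool a frame-by-class probability matrix into one utterance-level
    feature vector.

    ASSUMPTIONS (not from the thesis): frames whose highest class
    probability is below min_confidence are filtered out, and the
    statistics are the per-class mean, max, and min.
    """
    kept = [row for row in posteriors if max(row) >= min_confidence]
    if not kept:
        # Fall back to all frames rather than return an empty vector.
        kept = posteriors
    num_classes = len(kept[0])
    features = []
    for c in range(num_classes):
        column = [row[c] for row in kept]
        mean = sum(column) / len(column)
        features.extend([mean, max(column), min(column)])
    return features


# Hypothetical toy matrix: three frames, two emotion classes.
posteriors = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
feats = utterance_features(posteriors, min_confidence=0.6)
```

Here the third frame is dropped because its highest probability (0.5) is below the threshold, so the statistics are computed over the first two frames only.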

Third, an offline SER software application is developed on the MFC platform. It consists of four modules: a training module, a recognition module, an interface module, and a multi-threading module.

Document Type: Thesis
Recommended Citation
GB/T 7714
梁雅萌. 基于特征分析建模的语音情感识别[D]. 北京. 中国科学院大学,2016.
Files in This Item:
File Name/Size: 基于特征分析建模的语音情感识别.pdf (5305 KB)
DocType: Thesis
Access: Not yet open
License: CC BY-NC-SA