Recent years have seen great improvements in the performance of automatic speech recognition, and Dialog System is one of applications of these performances. But in conventional speech recognition system, only a plain text is presented as the final result, and all acoustic information of speech are cutoff. The aim of this publication is to add intonation information to traditional output of speech recognition engine, which is believed to reflect the emotion and intention of speaker. In this paper, we propose a robust approach to classify several kinds of intonations, e.g. declarative, interrogative, exclamatory, etc. Since it is still an open question on how to describe intonations, different kinds of features are investigated here to choose the most effective features for intonations classification. Support Vector Machine (SVM) is used as the classifier to perform the task of feature selection and combination. In our experiment, we address the speech recognition based methods, and use recognized results replace the transcribed text. Our goal is to simulate intonation classification in the real speech recognition. The speech materials used in this experiment were well designed includes three intonations, total about 4700 sentences. Experimental results show that our system can achieves the accuracy of (84.13%) for the task of three types of Chinese intonation classification.
修改评论