In this thesis, we focus on the application of deep learning techniques to speech processing. Deep learning techniques, which have seen a recent resurgence, have become the state-of-the-art methods for acoustic modelling in speech recognition. Compared with shallow models, deep models such as deep neural networks and deep convolutional neural networks extract high-level robust features through multiple hidden layers. They can therefore be seen as more powerful feature extractors, and they can fuse different kinds of features at the input layer.

Chinese is one of the most widely spoken languages in the world, and it has two characteristics that matter for speech recognition:
1. Chinese is a tonal language. In a tonal language, pitch information is used to discriminate between the meanings of words: the same syllable spoken with different tones can carry different meanings. Tone-related information can therefore play an important role in Chinese speech recognition systems.
2. Chinese has many dialects. China is a vast country with a long history, and as the language developed, people from different regions gradually came to speak different dialects. Mandarin, or Putonghua, is the official language of modern China, so most Chinese people can speak it. However, Mandarin spoken in different regions is affected by the local dialects, which results in different kinds of natively accented Mandarin. Accent is one of the key factors that degrade the performance of speech recognition systems, so accurate accent identification and accent adaptation methods can improve the performance of Mandarin speech recognition systems.

The main contributions and novelties of this thesis are as follows:
1. We propose a tone modeling method based on deep neural networks. Our focus is on the capacity of deep models to extract high-level robust features and to fuse different kinds of serially concatenated features.
Furthermore, maxout networks have been proposed to integrate dropout naturally and have achieved state-of-the-art results. We therefore investigate the advantage of deep maxout networks (DMNs) when the training data is limited and imbalanced. Our experiments on the ASCCD corpus show that, compared with shallow models such as a one-hidden-layer multilayer perceptron (MLP) and a support vector machine (SVM), deep models improve Mandarin tone recognition significantly. Among the deep models, DMNs achieve better performance compared with other deep neural net...
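
As a minimal illustration of the maxout idea mentioned above, the sketch below computes a single maxout hidden layer in plain Python: each hidden unit outputs the maximum over several affine pieces of its input. The function name `maxout_layer`, the weight layout, and the toy numbers are assumptions made for illustration only; they are not taken from the thesis and do not reflect its actual network configuration.

```python
def maxout_layer(x, weights, biases):
    """Compute one maxout layer (illustrative sketch, not the thesis code).

    x:       list of d input values
    weights: weights[j][p] is the d-dimensional weight vector of piece p of unit j
    biases:  biases[j][p] is the bias of piece p of unit j
    Returns a list with one activation per hidden unit.
    """
    outputs = []
    for unit_w, unit_b in zip(weights, biases):
        # Each piece is an affine function of the input: w . x + b
        pieces = [
            sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(unit_w, unit_b)
        ]
        # The maxout activation is the largest piece
        outputs.append(max(pieces))
    return outputs


# Toy example (hypothetical numbers): 2 inputs, 2 units, 2 pieces per unit.
x = [1.0, -3.0]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],    # unit 0: max(x[0], x[1])
    [[1.0, 1.0], [-1.0, -1.0]],  # unit 1: max(s, -s) with s = x[0] + x[1]
]
biases = [[0.0, 0.0], [0.0, 0.0]]
activations = maxout_layer(x, weights, biases)
```

With two pieces per unit, a maxout unit can already reproduce familiar activations: fixing one piece at zero gives a rectifier, and using a piece together with its negation, as in unit 1 above, gives an absolute value.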