Speech recognition is one of the most indispensable technologies for realizing a highly intelligent and fully robotic information society. Thanks to the enormous efforts of many researchers, the past few decades have witnessed significant progress in speech recognition, and some of these technologies have already entered people's daily lives. Small- and medium-vocabulary speech recognition technologies, such as voice command recognition, keyword spotting and continuous digit string recognition, are of great importance in application-oriented speech recognition research. In this paper we focus on Mandarin voice command recognition and keyword spotting. Our main contributions are as follows:
1. We build a high-performance Mandarin voice command recognizer, then design and implement a set of concise but powerful APIs for it. The recognizer uses tonal class-triphones as acoustic models, represents the pronunciation lexicon as a prefix lexical tree, and applies multi-threshold path pruning in the frame-synchronous Viterbi beam search. Our experiments indicate that the recognizer achieves a word error rate (WER) below 2% in an ideal environment.
2. We develop an Intelligent Speech Robot based on the Mandarin voice command recognizer described above. It is a human-machine interactive system used mainly at exhibitions. Since environmental noise, speaker variability and, in some cases, children's speech are the main problems for speech recognition in exhibition environments, we study methods for speaker clustering, children's speech recognition and simple noise rejection with a garbage model.
3. While developing the Intelligent Speech Robot, we collected speech data from children and constructed a corpus of children's speech, which is at present the only one of its kind in China. We also collected a large amount of natural speech from children in noisy exhibition environments in an unsupervised manner.
4. We investigate several aspects of keyword spotting technology and build a basic Mandarin keyword spotting system. We use atonal syllables as filler models for non-keywords and implement a frame-synchronous Viterbi beam search based on a lexical tree.
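The prefix lexical tree used for the pronunciation lexicon can be illustrated with a minimal sketch. It is not the recognizer's actual implementation: the names `LexNode`, `build_lexical_tree` and `count_nodes`, and the three-entry toy lexicon of tonal syllables, are assumptions made for illustration, and syllables stand in for the class-triphone units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexNode:
    """One node of the prefix tree; each edge is labelled with a sub-word unit."""
    children: Dict[str, "LexNode"] = field(default_factory=dict)
    word: Optional[str] = None  # set only on the node where a lexicon entry ends


def build_lexical_tree(lexicon: Dict[str, List[str]]) -> LexNode:
    """Merge all pronunciations so that entries with a common prefix share nodes."""
    root = LexNode()
    for word, units in lexicon.items():
        node = root
        for unit in units:
            node = node.children.setdefault(unit, LexNode())
        node.word = word
    return root


def count_nodes(node: LexNode) -> int:
    """Count the nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon: command name -> tonal syllable sequence.
LEXICON = {
    "kaishi": ["kai1", "shi3"],
    "kaimen": ["kai1", "men2"],
    "guanmen": ["guan1", "men2"],
}

tree = build_lexical_tree(LEXICON)
```

In this toy lexicon, "kaishi" and "kaimen" share the node for their first syllable, so a search over the tree evaluates that shared prefix once rather than once per command. This sharing is the usual motivation for a tree-structured lexicon in a beam search.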