The language model is a crucial component of any speech recognition system. As speech recognition research moves toward broader application domains and more complex tasks, the data sparseness problem becomes more severe. During the five years of my Ph.D. study, I investigated key technologies for data-clustering-based language modeling and language model adaptation. The main research work focused on the following three aspects.

First, I proposed a hierarchical class language model based on Mod-KN smoothing. This model always favors longer contexts; for unseen events, it backs off according to a hierarchical word class tree. It thus combines the power of word n-grams for frequent events with the predictive power of class n-grams for unseen or rare events. The Mod-KN-smoothed hierarchical class language model outperforms its Good-Turing-smoothed counterpart on both frequent and unseen events.

Second, I proposed a shared backoff scheme for random forest language models and applied random forest language models to language identification. Random forest language models use randomness to mitigate the greedy node splitting of decision trees, and the shared backoff method improves model robustness while preserving the randomness of the individual trees. Language identification experiments showed that random forest language models significantly outperform n-gram and binary decision tree language models. Furthermore, for speech recognition tasks we combined the random forest language model with the Mod-KN-based hierarchical class language model and obtained further improvements in both perplexity and recognition accuracy.

Third, broadcast news recognition has become a focal area of speech recognition research.
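The class-based backoff described above can be illustrated with a minimal sketch. This is not the thesis implementation: the toy bigram tables, class assignments, and probability values below are invented for illustration, and real Mod-KN smoothing with a multi-level class tree is far more involved.

```python
# Illustrative sketch, not the thesis implementation: use the word n-gram
# for seen events and back off to a class n-gram for unseen ones.
# All tables below are toy assumptions.
word_bigram = {("stock", "prices"): 0.4}            # P(w2 | w1), seen events
word_to_class = {"stock": "FINANCE", "prices": "FINANCE",
                 "share": "FINANCE", "values": "FINANCE"}
class_bigram = {("FINANCE", "FINANCE"): 0.3}        # P(c2 | c1)
word_given_class = {"prices": 0.2, "values": 0.1}   # P(w | c(w))

def prob(w1, w2):
    """Word-level probability if the bigram was seen; otherwise back off
    to the class level: P(c(w2) | c(w1)) * P(w2 | c(w2))."""
    if (w1, w2) in word_bigram:
        return word_bigram[(w1, w2)]
    c1, c2 = word_to_class[w1], word_to_class[w2]
    return class_bigram.get((c1, c2), 0.0) * word_given_class.get(w2, 0.0)

print(prob("stock", "prices"))  # seen event: word bigram probability
print(prob("share", "values"))  # unseen event: class-level backoff
```

In the hierarchical model, a failed lookup would continue backing off up the class tree rather than stopping at a single class level as this flat sketch does.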
This thesis presents a unified language model adaptation framework for broadcast news recognition that combines a non-iterative new-word extraction approach, a novel open-vocabulary Chinese language model, a perplexity-based corpus selection approach, and an n-gram distribution adaptation module. In our experiments, this framework achieved a 10% relative error reduction.

Finally, I proposed a new template-based method for correcting recognition errors. The method requires no hard error-detection decisions, so it avoids the mistakes a detection module would introduce. It segments the speech recognition output into small parts, which are easier to correct, and it uses edit distance and acoustic confusion scores to select among templates, which improves the robustness of the correction results. Experiments showed that recognition accuracy improved on both a well-covered test set and a normally covered test set.
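The edit-distance-based template selection can be sketched as follows. This is a simplified illustration under assumptions not in the abstract: it matches at the character level, ignores the acoustic confusion scores that the actual method combines with edit distance, and uses invented example templates.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                                # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def select_template(segment, templates):
    """Pick the template closest to a recognized segment; the real method
    would also weigh acoustic confusion scores here."""
    return min(templates, key=lambda t: edit_distance(segment, t))

templates = ["good morning everyone", "good evening everyone"]
print(select_template("good mornin everyone", templates))
```

Because each small segment is matched independently, a misrecognized word in one segment cannot derail the correction of its neighbors, which is part of what makes the soft, detection-free approach robust.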