With the development of speech recognition techniques, the study of language modeling is becoming more important and is currently one of the most active topics in speech and language processing. During three years of Ph.D. study, the author systematically investigated problems in Chinese language modeling for speech recognition and presents several original approaches that address these problems and improve speech recognition accuracy. The main contributions are as follows:
1. An engineering approach to corpus processing and language information acquisition, oriented toward speech recognition, is introduced. Some practical problems are also discussed.
2. Using a corpus drawn from the full text of the 1993 "People's Daily", about 20,000,000 Chinese characters, we took the lead in building a word trigram Chinese language model in 1995. It has been applied successfully in our Chinese dictation system with a 32K-word vocabulary and has drastically reduced the error rate.
3. An improved trigram model that uses word similarity information and local POS knowledge is presented for further data smoothing.
4. Taking into account the drawbacks of most current statistical language models, the author proposes an integrated language modeling approach that combines multi-KS (knowledge sources and probability sources) across multiple models, including a baseline improved trigram model, a word association (WA) model, a practical stochastic phrase grammar (SCFG) model, and a dynamic adaptive model. Preliminary experiments show that the integration of multi-KS is an effective strategy for the self-organization of linguistic units.
5. Some detailed problems in language processing and modeling are also investigated systematically.
6. A more general and flexible self-organization approach to language modeling is tentatively analyzed.
This dissertation is a survey of the author's work. It consists of eleven chapters.
Chapter 1 reviews the development of, and representative work in, natural language processing and speech recognition. Chapter 2 describes the current status and main approaches of language modeling. Chapters 3 and 4 follow the basic process of acquiring language information from a corpus and address several specific problems and their solutions, including:
· A set of principles for Chinese word definition and segmentation for use in speech recognition.
· An approach to finding new words and smoothing statistical errors.
· A novel algorithm for language information acquisition without the help of a prime dictionary.
· A practical word class learning approach.
Chapter 5 introduces Chinese interpolation trigram modeling and its application in our speaker-independent Chinese dictation system with a 32K vocabulary. Chapter 6