Human-computer dialog technology makes human-computer interaction simpler and more natural. However, building a human-computer dialog system requires a great deal of manpower and resources. Two challenges in current research are how to automatically collect a training corpus for a language model in a restricted domain, and how to build the knowledge base of a human-computer dialog system. Focusing on the domain of daily chat, we propose approaches for semi-automatically extending the training corpus of the language model and for building the dialog knowledge base. The main contents and contributions are as follows:
1. At the word level, a word class expansion method is introduced. Experiments illustrate the contribution of the method to the speech recognition system.
2. A semi-automatic method for generating first-order predicate knowledge is proposed, based on dependency parsing. We first remove stop words from the sentences, then analyze the sentences with a dependency parser, next convert the sentences into first-order predicate logic form according to the parsing result and a keyword list, and finally generate the predicate logic knowledge base. Experimental results show that the method is adequate for practical application.
3. The idea of word classes is applied to the knowledge base of the dialog system. We classify the text according to sentence structure and keep only one sentence for each structure; the content of the other sentences is saved into a list of word classes, which is then further expanded. This method can greatly reduce the size of the knowledge base and improve processing speed.
4. Applying the word class expansion and first-order predicate logic knowledge representation methods, we improve the speech globe system.
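The word class idea in contribution 3 can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes that sentence structure can be approximated by replacing every word that belongs to a known class with the class label, and that class membership comes from a hand-written dictionary. The names `compress`, `expand`, and the `<FRUIT>` class are hypothetical and chosen only for illustration.

```python
from itertools import product


def compress(sentences, word_classes):
    """Keep one template per sentence structure.

    Assumption: two sentences share a structure when they are identical
    after each class member is replaced by its class label.
    """
    word_to_class = {w: label for label, words in word_classes.items() for w in words}
    templates, seen = [], set()
    for sentence in sentences:
        template = tuple(word_to_class.get(tok, tok) for tok in sentence.split())
        if template not in seen:
            seen.add(template)
            templates.append(template)
    return templates


def expand(template, word_classes):
    """Regenerate every sentence a template covers, in sorted class order."""
    slots = [sorted(word_classes[tok]) if tok in word_classes else [tok] for tok in template]
    return [" ".join(choice) for choice in product(*slots)]


# Hypothetical toy data: three sentences, two distinct structures.
word_classes = {"<FRUIT>": {"apples", "pears"}}
sentences = ["i like apples", "i like pears", "do you like apples"]

templates = compress(sentences, word_classes)

# Expanding the word class adds coverage without adding templates.
word_classes["<FRUIT>"].add("grapes")
expanded = expand(templates[0], word_classes)
```

In this toy setting the three sentences collapse into two templates, and adding one word to the class extends every template that uses it, which is the size reduction the abstract describes.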