Question answering is an important research topic in natural language processing and information retrieval. However, limited by the current state of natural language processing and artificial intelligence techniques, automatic question answering systems can handle only a few kinds of questions and struggle to meet the personalized and complex information needs of real users. With the development of Web 2.0, web services based on user-generated content have become increasingly popular, and community question answering (cQA) has emerged. Unlike traditional question answering technologies, cQA lets users ask any kind of question and answer the questions posed by other users. The emergence of cQA provides new ways and platforms for web-based knowledge sharing and also brings new opportunities for the development of question answering technologies.

This paper focuses on efficient data processing and semantic understanding. We systematically study the key technologies of community question answering from two aspects: content analysis and user behavior modeling. Our main contributions are as follows:

1. Context-Aware Large-Scale Question Retrieval

One of the key tasks in the semantic analysis of cQA data is question retrieval: finding questions in a large cQA archive that are semantically similar to a user's query, so that the answers to those similar questions can be used to answer the query. Traditional methods treat question retrieval as a word-based statistical translation task. However, a word-based translation model ignores local context, which can lead to word ambiguity and thus hurt retrieval performance. This paper proposes two context-aware question retrieval methods:

· Phrase-based translation model for question retrieval.
This method models sequences of words as a whole rather than modeling each word in isolation. Compared with the word-based translation model, the proposed phrase-based translation model takes local context into account, which can effectively resolve the word ambiguity problem. Based on this motivation, we propose a statistical phrase-based translation model, a parameter estimation strategy, and a linear ranking algorithm that combines multiple features. Experimental results show that the proposed method significantly outperforms ...
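
The contrast between word-level and phrase-level translation scoring can be illustrated with a small sketch. This is not the model proposed in the paper: the translation tables, their probabilities, the use of bigrams as "phrases", the smoothing floor, and the equal feature weights are all assumptions made up for illustration. It shows only how a linear combination of a word feature and a phrase feature can break a tie that the word feature alone cannot.

```python
import math

# Hypothetical toy tables: P(query unit | archived-question unit).
# All values are invented for illustration, not estimated from data.
WORD_TABLE = {
    ("apple", "fruit"): 0.3,
    ("apple", "mac"): 0.3,   # "apple" is ambiguous at the word level
    ("juice", "drink"): 0.5,
}
PHRASE_TABLE = {
    (("apple", "juice"), ("fruit", "drink")): 0.6,
}


def bigrams(tokens):
    """Adjacent word pairs, used here as a stand-in for phrases."""
    return list(zip(tokens, tokens[1:]))


def translation_log_prob(query_units, cand_units, table, floor=1e-6):
    """Log-probability that the candidate 'translates' into the query.

    Each query unit is scored by its average translation probability
    over the candidate units; unseen pairs fall back to `floor`.
    """
    log_prob = 0.0
    for q in query_units:
        if cand_units:
            p = sum(table.get((q, c), floor) for c in cand_units) / len(cand_units)
        else:
            p = floor
        log_prob += math.log(p)
    return log_prob


def rank(query, archive, w_word=0.5, w_phrase=0.5):
    """Rank archived questions by a weighted sum of the two features."""
    scored = []
    for cand in archive:
        word_lp = translation_log_prob(query, cand, WORD_TABLE)
        phrase_lp = translation_log_prob(bigrams(query), bigrams(cand), PHRASE_TABLE)
        scored.append((w_word * word_lp + w_phrase * phrase_lp, cand))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


query = ["apple", "juice"]
archive = [["mac", "drink"], ["fruit", "drink"]]
ranked = rank(query, archive)
```

In this toy setup the word-level feature scores both archived questions identically, because "apple" translates equally well to "fruit" and "mac". Only the phrase-level feature, which sees "apple juice" as a unit, places the fruit-related question first.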