|Other Title||Research on Collaborative Question Answering System
Collaborative Question Answering
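|Chinese Abstract||Collaborative question answering (CQA) systems are web-based question answering systems that meet users' needs for information exchange and knowledge sharing. In recent years, CQA-related work has attracted growing attention, but many key technical problems remain unsolved. This thesis studies CQA systems in some depth using large-scale real question-answer data. The main work is as follows: (1) Similar question retrieval: A word relevance model is proposed based on a large collection of real questions. On top of this model, both the query question and the candidate questions are expanded with related words, and a method for computing the semantic similarity between questions is proposed and applied to similar question retrieval. (2) Answer ranking: A statistical translation model is used to measure the relevance between an answer and the question, and a relevance assumption among different candidate answers is proposed. The manifold ranking method is improved so that the relations between answers and the question, as well as those among candidate answers, are incorporated into the manifold ranking framework, which then ranks the candidate answers within a question-answer page. (3) Answer quality determination: The thesis considers shallow textual features of the answer, features of the relationship between the answer and the question, features of the answer provider, and answer context features. These four types of features are incorporated into a linear regression model to determine whether an answer is a high-quality answer to the question. (4) Best answerer search: The thesis combines a language model and the LDA topic model to model user interests, and analyzes user prior information such as user authority and user activity. These are integrated into a unified probabilistic framework to search for the best answerers for new questions in a CQA system. (5) Collaborative question answering prototype system: The thesis designs a new CQA prototype system, studies question classification methods, and uses large-scale question-answer data to implement two modules of the prototype: question-answer search and user search. Keywords: collaborative question answering, similar question retrieval, answer ranking, answer quality determination, user modeling, user search, question classification, prototype system|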
|English Abstract||Collaborative question answering (CQA) services such as Yahoo! Answers and Sina Iask have become increasingly popular in recent years as platforms where people share knowledge and search for information online. However, relatively little work has been done on CQA systems compared with other information retrieval systems. In this thesis, we investigate several key problems in CQA systems. The main contributions are as follows: (1) Similar question retrieval: A word relevance model is trained on the whole question archive, which is made up of millions of natural language questions posted by users on the web. A novel method for calculating similarities between questions is then proposed, which uses the word relevance model to expand the questions. (2) Answer ranking within a question-answering thread: Relations between a question and its candidate answers are built on a statistical translation model, and inter-answer similarities are also calculated. Manifold ranking is used to propagate ranking scores among the question and its answers. After propagation, each answer has a ranking score, and the candidate answers are sorted by these scores. (3) Quality determination of user-generated answers: Four types of features are extracted to describe answers: surface linguistic patterns, question-answer relationships, features of the answer provider, and structural context features. These features are incorporated into a linear regression model to determine the quality of the answers. (4) User search for new questions: The interests of answerers are modeled by tracking users’ answering history. The relationship between an answerer and a new question is measured by a language model and the LDA topic model. User authority and user activity are also taken into consideration. A probabilistic framework combines all of this information about users to predict the best answerers for new questions. 
(5) A new CQA prototype system: A CQA prototype system is designed, and several key modules of the system are implemented, including Q&A retrieval and user search for new questions. We also study the problem of question categorization by comparing two classification models. Key Words: collaborative question answering, question retrieval, answer ranking, answer quality, user modeling, user search, question categorization, prototype system|
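Liu Mingrong (刘明荣). 协作式问答系统关键技术研究 [Research on Key Technologies of Collaborative Question Answering Systems] [D]. Institute of Automation, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences, 2010.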
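
To make the answer-ranking idea in item (2) of the abstract more concrete, the following is a minimal sketch of standard manifold ranking over a small question-answer graph. It is not the improved variant developed in the thesis, whose details the abstract does not give. The function names, the damping parameter `alpha`, and the edge weights in `example_weights` are all hypothetical; in the thesis the question-answer weights would come from a statistical translation model and the answer-answer weights from inter-answer similarity.

```python
import math


def manifold_rank(weights, query_index=0, alpha=0.85, iterations=200):
    """Standard manifold ranking: iterate f <- alpha * S f + (1 - alpha) * y.

    `weights` is a symmetric affinity matrix with a zero diagonal.
    S = D^(-1/2) W D^(-1/2) is its symmetric normalisation, and y marks
    the query node (here, the question). All names are illustrative.
    """
    n = len(weights)
    # Degree of each node: the sum of its edge weights.
    degrees = [sum(row) for row in weights]
    # Symmetrically normalised affinities; isolated nodes get zero rows.
    s = [
        [
            weights[i][j] / math.sqrt(degrees[i] * degrees[j])
            if degrees[i] > 0 and degrees[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iterations):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


def rank_answers(weights):
    """Return answer node indices (1..n-1) sorted by descending score."""
    scores = manifold_rank(weights)
    return sorted(range(1, len(weights)), key=lambda i: scores[i], reverse=True)


# Hypothetical graph: node 0 is the question, nodes 1-3 are candidate answers.
# Row 0 holds made-up question-answer relevance; the rest hold made-up
# inter-answer similarities.
example_weights = [
    [0.0, 0.9, 0.1, 0.5],
    [0.9, 0.0, 0.2, 0.6],
    [0.1, 0.2, 0.0, 0.1],
    [0.5, 0.6, 0.1, 0.0],
]

if __name__ == "__main__":
    print(rank_answers(example_weights))
```

In this toy graph, an answer gains score both from its direct link to the question and from its links to other well-scored answers, which is the propagation effect the abstract describes.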