Research on Key Technologies for Community Question Answering Based on Content Analysis and Behavior Modeling

	Research on Key Technologies for Community Question Answering Based on Content Analysis and Behavior Modeling
Alternative title	Research on Content Analysis and Behavior Modeling for Community
	周光有
	2012-12-03
Degree type	Doctor of Engineering
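Chinese abstract	Question answering is an important research topic in natural language processing and information retrieval. However, limited by the current state of natural language processing and artificial intelligence technology, automatic question answering systems can handle only a very narrow range of question types and struggle to meet the personalized, complex needs of real users. With the rise of Web 2.0, Internet services based on user-generated content (UGC) have become increasingly popular, and community question answering has emerged in response. Unlike traditional question answering, community question answering lets users ask questions of any type and answer any type of question posed by other users. Its emergence provides a new channel and platform for sharing knowledge on the Internet and brings new opportunities for the development of question answering technology. With efficient processing and semantic understanding of community question answering data as its overall goal, this thesis systematically studies the key technologies of community question answering from two perspectives: textual content analysis and user behavior modeling. The main research contents are as follows:
1. Large-scale similar question retrieval based on contextual information. Similar question retrieval is one of the core tasks in the semantic understanding of community question answering data. Given a query question submitted by a user, the task is to find questions in a massive community question answering archive that are similar or related to the query, and to return the answers to those questions as answers to the query. The main difficulties are the "lexical gap" and "lexical ambiguity" problems. Traditional methods treat similar question retrieval as a word-based translation task. However, word-based translation models ignore contextual information and cannot resolve lexical ambiguity, which seriously degrades retrieval performance. This thesis proposes similar question retrieval methods based on contextual information, comprising two parts:
· Similar question retrieval based on a statistical phrase translation model: this method uses the phrase as the basic unit of translation, where a phrase is not a phrase in the linguistic sense but a contiguous sequence of words. Compared with word-based translation models, the phrase translation model can take local context into account and can effectively resolve lexical ambiguity. On this basis, the thesis proposes a statistical phrase translation model based on question-answer pairs, a parameter estimation strategy, and a linear ranking algorithm that fuses multiple features. Experimental results show that, compared with word-based translation models, the proposed method further improves the performance of similar question retrieval.
· Similar question retrieval based on a latent variable model: this method treats semantically related words as global context, which can to some extent compensate for the shortcomings of word-based translation models. On this basis, the thesis proposes a latent variable model based on question-answer pairs and a parameter estimation strategy. Experimental results show that, compared with word-based translation models, the latent variable method further improves the performance of similar question retrieval.
2. Large-scale similar question retrieval based on multilingual association learning. To further address the "lexical gap" and "lexical ambiguity" problems, this thesis takes a different perspective and proposes a similar question retrieval method based on multilingual association learning. The method uses existing commercial translation tools to translate the original question collection into a semantically equivalent collection expressed in another language, and the translated collection then assists retrieval over the original collection. First, if a word in an original question is ambiguous, its translation given the surrounding context is unique, so lexical ambiguity is alleviated during translation. In addition, several related words with different surface forms in the original questions may receive the same translation, so the lexical gap can be addressed through the translated words. Experimental results show that, compared with the methods based on contextual information, the method based on multilingual association learning effectively improves the performance of similar question retrieval.
3. Fast and robust large-scale similar question retrieval. Community question answering is a ...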
English abstract	Question answering is an important research topic in natural language processing and information retrieval. However, limited by the current level of natural language processing and artificial intelligence techniques, automatic question answering systems can handle only a few kinds of questions and struggle to meet the personalized and complex information needs of real users. With the development of Web 2.0, web services based on user-generated content have become more and more popular, and community question answering (cQA) has emerged. Unlike traditional question answering technologies, cQA lets users ask any kind of question and answer other users' questions of any kind. The emergence of cQA provides new ways and platforms for web-based knowledge sharing and also brings new opportunities for the development of question answering technologies. This thesis aims at efficient processing and semantic understanding of cQA data, and systematically studies the key technologies of community question answering from the aspects of content analysis and user behavior modeling. Our main contributions are as follows:
1. Context-Aware Large-Scale Question Retrieval. One of the key tasks in the semantic analysis of cQA data is question retrieval. Question retrieval aims to find semantically similar questions in large cQA archives; the answers to these similar questions are then used to answer the user's query. Traditional methods regard question retrieval as a word-based statistical translation task. However, the word-based translation model ignores local context information and therefore cannot resolve word ambiguity, which hinders the performance of question retrieval. This thesis proposes two context-aware question retrieval methods:
· Phrase-based translation model for question retrieval. This method models sequences of words as a whole rather than modeling words in isolation. Compared with the word-based translation model, the proposed phrase-based translation model can take local context information into consideration, which can effectively resolve the word ambiguity problem. Based on this motivation, we propose a statistical phrase-based translation model, a parameter estimation strategy, and a linear ranking algorithm that combines multiple features. Experimental results show that the proposed method significantly outperforms ...
Keywords	Natural Language Processing; Community Question Answering System; Question Answering System; Semantically Similar Question Retrieval; Expert Finding
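Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6488
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	周光有. Research on Key Technologies for Community Question Answering Based on Content Analysis and Behavior Modeling[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2012.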

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20101801462909 (3221KB)			Not yet open	CC BY-NC-SA