Research on Community Question Tagging Based on Multi-Perspective Semantic Learning (基于多视角语义学习的社区问题标注研究)
许诺佳
2024-05-19
Pages: 60
Subtype: Master's
Abstract

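Community question answering (CQA) websites have become widely popular as reliable sources of information for users seeking answers to all kinds of questions. The questions on a CQA website may span a broad range of knowledge domains, or the site may focus on a specific domain. To help users quickly locate and retrieve the content they need, questions are usually annotated with multiple tags. These tags summarize the main information in a question; they help users find relevant information on the site and also strengthen the various applications built around CQA websites. Tagging questions on CQA websites therefore has significant practical value.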

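Questions and tags on CQA websites exhibit complex relational characteristics and semantic information, which makes the community question tagging task challenging. Traditional question tagging methods struggle to make full use of the relational information among questions and tags; they rely only on the semantics of questions and tags and therefore cannot tag questions comprehensively. This thesis addresses two challenges: the complex internal structure and the deep meaning of questions and tags. Based on multi-perspective semantic learning, it analyzes the multi-faceted semantic information of questions from an internal perspective and an external perspective, and studies the question tagging task. The main work of the thesis is as follows: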

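(1) A community question tagging method based on a tri-relational multi-faceted graph neural network. The structural relations between questions and tags on CQA websites are complex. Traditional question tagging methods mainly model the text of questions and tags separately; they do not make full use of the various relations among questions and tags, nor do they explore the multi-faceted semantics of questions. This thesis studies question and tag information from the internal perspective and proposes a tri-relational multi-faceted graph neural network model, which uses a heterogeneous graph to model three kinds of relations on the website: tag-question relations, parent tag-child tag relations, and child tag-parent tag relations. The thesis also proposes a tri-relational question-tag graph neural network to obtain hidden representations of questions and tags. Within it, a multi-faceted question graph neural network captures the multi-faceted semantics of a question from its related tags and thus characterizes the question's semantics better. A tag-tag relation graph neural network passes messages in both directions among all tag nodes to obtain hidden representations of related nodes. A multi-faceted matching module is then built to retrieve suitable tags for a question based on its multi-faceted semantic information. The thesis constructs three real-world datasets and runs experiments on them that verify the effectiveness of the model.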
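The heterogeneous graph described in item (1) can be pictured with a small sketch. This is only an illustration of the three relation types, not the thesis's implementation: the function name `build_hetero_graph`, the relation labels, and the input formats (a question-to-tags mapping and a child-to-parent tag mapping) are all assumptions made here, and the abstract does not say how the tag hierarchy is obtained.

```python
from collections import defaultdict

# Relation labels are illustrative names, not identifiers from the thesis.
TAG_QUESTION = "tag-question"
PARENT_CHILD = "parent-child"
CHILD_PARENT = "child-parent"


def build_hetero_graph(question_tags, tag_parents):
    """Build adjacency lists for three relation types (hypothetical helper).

    question_tags: {question_id: [tag, ...]}  (assumed input format)
    tag_parents:   {child_tag: parent_tag}    (assumed tag hierarchy)
    Returns {relation: {source_node: [target_node, ...]}}.
    """
    graph = {
        TAG_QUESTION: defaultdict(list),
        PARENT_CHILD: defaultdict(list),
        CHILD_PARENT: defaultdict(list),
    }
    # Tag -> question edges: each tag points to the questions it annotates.
    for question, tags in question_tags.items():
        for tag in tags:
            graph[TAG_QUESTION][tag].append(question)
    # Each hierarchy link yields two directed edges, one per direction,
    # so that messages can flow both ways between tag nodes.
    for child, parent in tag_parents.items():
        graph[PARENT_CHILD][parent].append(child)
        graph[CHILD_PARENT][child].append(parent)
    return graph


graph = build_hetero_graph(
    {"q1": ["python", "pandas"], "q2": ["python"]},
    {"pandas": "python"},
)
print(graph[TAG_QUESTION]["python"])  # questions annotated with "python"
```

Storing the parent-child and child-parent edges as separate relation types is one simple way to let a relation-aware graph neural network treat the two directions differently.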

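(2) A community question tagging method enhanced by large language models. Questions on CQA websites cover a wide range of knowledge domains, and some of them may require external professional knowledge to understand. Traditional manual tagging is inefficient and overly subjective. Targeting the deep and complex meaning of question and tag text, the thesis analyzes the professional knowledge in questions from the external perspective and proposes a large language model enhanced question tagging method. The method first uses a traditional question tagging method to pre-retrieve suitable tags for a question from the CQA website's database. Prompts are then designed so that the large language model understands the task requirements, brings in external knowledge to help interpret the question's semantics, and selects from the candidate tags those that better match the question. This guarantees that the final retrieved tags are usable tags that exist in the CQA website's tag library. Experimental results on real-world datasets verify the effectiveness of using large language models to introduce external knowledge into the question tagging task.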

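In summary, the thesis develops effective question tagging methods built around the characteristics of questions and tags on CQA websites. By fully mining the textual and structural features of the question and tag text itself and by introducing external real-world knowledge, it makes effective use of the valuable information on CQA websites and contributes basic research support for upper-level applications such as recommender systems.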

Other Abstract

Community Question Answering (CQA) websites have emerged as significant repositories of information for users seeking answers to diverse questions. These websites cover a broad spectrum of knowledge domains and cater to both general and specialized questions. To streamline users' access to relevant content, questions are typically associated with multiple tags. These tags summarize the core subject matter of each question and make retrieval of relevant information more efficient. Tags also improve search within CQA websites and enhance the various applications built around them. Consequently, question tagging on CQA websites has considerable practical significance.

The questions and tags on CQA websites exhibit complex relations and semantics, which makes question tagging challenging. Traditional question tagging methods often fall short of adequately utilizing the relations between questions and tags and rely primarily on the semantics of questions and tags. This thesis addresses two major challenges inherent in questions and tags: complex relations and deep semantics. Based on multi-perspective semantic learning, it analyzes the multi-faceted semantic information of questions from both internal and external perspectives and studies the question tagging task. The contributions of the thesis are as follows:

(1) A Question Tagging Method Based on Tri-Relational Multi-Faceted Graph Neural Networks. Relations between questions and tags in CQA websites are complex. Traditional question tagging methods model the semantics of questions and tags separately, without fully utilizing the various relations between questions and tags. They also fail to explore the multi-faceted semantics of questions. This thesis studies question and tag information from an internal perspective and proposes a Tri-Relational Multi-Faceted Graph Neural Network, which uses a heterogeneous graph to model three kinds of relations in CQA websites: tag-question relations, parent-child tag relations, and child-parent tag relations. The thesis also proposes a Tri-Relational Question-Tag GNN for learning informative node features of questions and tags. Specifically, a Multi-Faceted Question GNN is designed to extract multiple facets of semantics for each question from its related tags. The Tag-Tag Relations GNN performs bi-directional message passing among all tag nodes to learn hidden features. A Multiple Matching Component is then constructed to retrieve appropriate tags for each question based on the diverse facets of its semantics. The thesis constructs three real-world datasets and conducts experiments on them that verify the effectiveness of the model.
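To make the idea of matching tags against several facets of a question concrete, here is a minimal sketch. The abstract does not specify how facet scores are computed or combined, so the dot-product similarity, the max-over-facets aggregation, and the names `dot` and `rank_tags` are assumptions chosen for illustration only.

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def rank_tags(facets, tag_vectors, k=2):
    """Rank tags for one question from its facet vectors (hypothetical helper).

    facets:      list of facet vectors for the question
    tag_vectors: {tag: vector}
    A tag's score is its best match over all facets (an assumed
    aggregation, not necessarily the one used in the thesis).
    """
    scores = {
        tag: max(dot(facet, vec) for facet in facets)
        for tag, vec in tag_vectors.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


facets = [[1.0, 0.0], [0.0, 1.0]]  # two toy facets of one question
tag_vectors = {
    "python": [0.9, 0.1],
    "sql": [0.1, 0.8],
    "java": [0.3, 0.3],
}
print(rank_tags(facets, tag_vectors))  # ['python', 'sql']
```

With a single averaged question vector, a tag that matches only one facet strongly could be outranked by a tag that matches every facet weakly; scoring per facet avoids that in this toy setting.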

(2) A Question Tagging Method Based on Prompting Large Language Models. Questions posted on CQA websites span multiple knowledge domains, and some of them require external professional knowledge to be fully understood. Manual question tagging is inefficient and overly subjective. This thesis focuses on the complex meaning of questions and tags, analyzes the professional knowledge in questions from an external perspective, and proposes a Large Language Model Enhanced Question Tagging method. It first uses traditional question tagging models to pre-retrieve candidate tags for each question from the database of the CQA website. Prompts are then designed so that the Large Language Model understands the task, draws on external knowledge to interpret the question, and selects from the candidates the tags that better match the question semantics. This ensures that the final retrieved tags exist in the tag set of the CQA website. Experimental results on real-world datasets verify the effectiveness of introducing external knowledge into question tagging through large language models.
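The two-stage flow in item (2), pre-retrieval followed by selection with a large language model, can be sketched as below. The prompt wording, the function names `build_prompt` and `select_tags`, the comma-separated reply format, and the `llm` callable are all hypothetical; the thesis's actual prompts and model interface are not given in the abstract. The sketch only shows how filtering the reply against the candidate list keeps the output inside the site's existing tag set.

```python
def build_prompt(question, candidates):
    """Assemble a selection prompt (illustrative wording, not the thesis's)."""
    lines = [
        "You are tagging a question from a community question answering site.",
        "Choose the tags that best match the question.",
        "Only choose from the candidate list and reply with a "
        "comma-separated list of tags.",
        f"Question: {question}",
        "Candidates: " + ", ".join(candidates),
    ]
    return "\n".join(lines)


def select_tags(question, candidates, llm, max_tags=5):
    """Ask an LLM to pick tags, keeping only tags from the candidate list.

    llm: any callable mapping a prompt string to a reply string
         (an assumed interface; plug in a real client here).
    """
    reply = llm(build_prompt(question, candidates))
    allowed = {c.lower(): c for c in candidates}
    chosen = []
    for piece in reply.split(","):
        tag = allowed.get(piece.strip().lower())
        # Drop anything the model invented and skip duplicates.
        if tag is not None and tag not in chosen:
            chosen.append(tag)
    return chosen[:max_tags]


def fake_llm(prompt):
    """Stand-in for a real model; returns one tag that is not a candidate."""
    return "pandas, dataframe-magic, Python"


print(select_tags("How do I merge two DataFrames?",
                  ["python", "pandas", "sql"], fake_llm))
# ['pandas', 'python']
```

Passing the model in as a callable keeps the sketch independent of any particular LLM provider and makes the filtering step easy to test with a stub.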

In summary, the thesis develops effective question tagging methods based on the characteristics of questions and tags on CQA websites. By thoroughly examining the semantics and structures inherent in questions and tags and by integrating external real-world knowledge, it makes effective use of the valuable information on CQA websites. The work contributes foundational research that supports higher-level applications such as recommendation systems.
 

Keywords: community question answering; representation learning; graph neural networks; prompt learning
Language: Chinese
Sub direction classification: Data mining
Document Type: Thesis
Identifier: http://ir.ia.ac.cn/handle/173211/57154
Collection: Graduates_Master's Theses
Recommended Citation
GB/T 7714
许诺佳. 基于多视角语义学习的社区问题标注研究[D],2024.
Files in This Item:
File Name/Size | DocType | Version | Access | License
毕业论文_601.pdf (2773KB) | Thesis | | Restricted access | CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.