Research on News Recommendation Methods Based on Enhanced Text Representation and Their Domain-Specific Application


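	Research on News Recommendation Methods Based on Enhanced Text Representation and Their Domain-Specific Application
	Sun Ying (孙颖)
	2022-05-22
Pages	82
Degree type	Master's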
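Chinese abstract (English translation)	With the rapid development of information technology, the public obtains news and updates about the world by visiting news websites, and online news has become one of the most important basic Internet services. The massive volume of news on online platforms gives users diverse reading choices, but the unprecedented growth in information also causes information overload, making it hard for users to find the content they are genuinely interested in. This not only seriously harms the reading experience but also wastes information resources. News recommendation aims to present users with news that meets their reading needs, addressing information overload by making it more efficient for users to reach the news they care about. News recommendation is an important research topic at the intersection of recommender systems and natural language processing, and it also has considerable application value. Unlike other recommendation tasks, news recommendation is characterized by rich article content and sparse user-news historical interactions, and therefore faces two challenges: how to make full use of the semantic information in news, and how to mine the relationships among news items in historical data. In addition, news recommendation plays an important role in practical scenarios such as public opinion analysis, so making targeted recommendations that account for users' interest in a specific domain is a further applied challenge. When mining news semantics, existing methods make little effective use of information such as the news category hierarchy and ignore the interactions among candidate news in historical data; they also do not yet model users' reading interests with respect to a specific domain in practical scenarios.
To address these problems and challenges, this thesis uses the category hierarchy of news and the content interactions among candidate news to enhance the semantic representation of news text during news representation learning, and builds news recommendation methods based on enhanced text representation: a multi-level news recommendation method that combines text and category information, and a news recommendation enhancement method based on content interaction among candidate texts. On this basis, the thesis further builds a domain-specific news recommendation method that meets users' need to obtain information in a specific domain quickly and better serves practical applications such as public opinion analysis. The main work and contributions are summarized as follows:
1. To make effective use of semantic information at different granularities in news, the thesis proposes a multi-level news recommendation method that combines text and category information. The method first uses category-level, subcategory-level, and text-level information separately for news representation learning, user preference modeling, and click probability prediction, and then fuses the click probabilities from the different levels into the final result. By using the category hierarchy of news explicitly, the method can fully mine the associations between users and candidate news at different semantic levels and determine recommendations from multiple perspectives. Experimental results on benchmark datasets verify the effectiveness of the method and confirm the importance of news categories and subcategories for recommendation.
2. To model the competitive relationship within a set of candidate news items, the thesis proposes a news recommendation enhancement method based on content interaction among candidate texts. The method learns candidate news representations through multi-text interaction modeling and generates interaction-enhanced candidate news representations. As an enhancement, it builds on the multi-level news recommendation method proposed earlier and applies multi-text interaction modeling at the text level. Experimental results on benchmark datasets show that the method is effective and clearly improves on the original multi-level news recommendation method.
3. Considering users' need for news in a specific domain in practical scenarios such as public opinion analysis, the thesis focuses on domain-oriented news recommendation and proposes a domain-specific news recommendation method. In the user preference modeling stage, the method learns a user's general preference and specific preference, which represent the interest patterns reflected in the user's historical behavior and the user's need for information in a specific domain, respectively; it then computes click probabilities from the two preferences and fuses them into the final prediction. Three security-related domain datasets are constructed from a large-scale public dataset, and experimental results show that the proposed method achieves the best performance.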
English abstract	With the rapid development of information technology, the public obtains worldwide information by visiting news websites, and online news has become one of the most important basic Internet services. While the massive volume of news on online platforms gives users diverse reading options, the unprecedented increase in information also causes information overload and makes it difficult for users to find the content they are really interested in. This not only seriously degrades the reading experience but also wastes information resources. News recommendation, which aims to provide users with news consistent with their reading interests, can alleviate information overload by making it more efficient for users to reach the news they care about. News recommendation lies at the intersection of recommender systems and natural language processing and also has significant application value. Unlike other recommendation tasks, news recommendation has two distinctive characteristics: the richness of news content and the sparsity of user-news interactions. These characteristics pose two challenges: how to make full use of the semantic information in news, and how to mine the relationships among news items in historical data. In addition, news recommendation plays an important role in practical applications such as public opinion analysis, so making targeted recommendations based on users' interests in a specific domain is a further applied challenge. When mining news semantic information, existing news recommendation methods do not make effective use of the news category hierarchy and ignore the interactions among candidate news in historical data. They also have not considered domain-oriented modeling of users' reading interests in practical application scenarios.
To address these problems and challenges, this thesis exploits the category hierarchy and the interactions among candidate news to enhance the semantic representation of news text during news representation learning, and proposes news recommendation methods based on enhanced text representation: multi-level news recommendation with text and category information, and candidate-interaction-enhanced news recommendation. On this basis, the thesis further proposes a domain-oriented news recommendation method that meets users' need to obtain information in a specific domain quickly, thus serving practical applications such as public opinion analysis. The major work and contributions of this thesis are summarized as follows:
1. To exploit semantic information at different granularities in news effectively, this thesis proposes a multi-level news recommendation method with text and category information. The method first uses category-level, subcategory-level, and text-level information separately for news representation learning, user preference modeling, and click probability prediction. The click probabilities at the different levels are then aggregated into the final result. By using the category hierarchy of news explicitly, the method can fully mine the connections between users and candidate news at different semantic levels and determine the recommendation results from multiple perspectives. Experimental results on benchmark datasets verify the effectiveness of the proposed method and show the importance of news categories and subcategories for recommendation.
2. To model the competitive relationship within a set of candidate news items, this thesis proposes a candidate-interaction-enhanced news recommendation method. The method learns candidate news representations through interactive multi-text modeling, generating interaction-enhanced candidate news representations. As an enhancement, it takes the multi-level news recommendation method proposed in the previous chapter as its underlying framework and applies interactive multi-text modeling at the text level. Experimental results on benchmark datasets demonstrate that the proposed method is effective and clearly improves on the original multi-level news recommendation method.
3. Considering users' need to obtain news in a specific domain in applications such as public opinion analysis, this thesis focuses on the problem of domain-oriented news recommendation and proposes a domain-oriented news recommendation method. During user modeling, the method learns general and specific preferences, which represent the interest patterns reflected in the user's historical behavior and the user's demand for information in a specific domain, respectively. Click probabilities are then computed from the two types of preferences and aggregated into the final prediction. Three security-related domain datasets are constructed from a large-scale public dataset, and experimental results show that the proposed method achieves the best performance.
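Keywords	news recommendation; enhanced text representation; category hierarchy; candidate news interaction; domain-oriented recommendation
Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/48714
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems_Internet Big Data and Information Security; Graduates_Master's Theses
Recommended citation (GB/T 7714)	Sun Ying. Research on News Recommendation Methods Based on Enhanced Text Representation and Their Domain-Specific Application[D]. Institute of Automation, Chinese Academy of Sciences, 2022.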

Files in this item
File name/size	Document type	Version type	Access type	License
孙颖-硕士学位论文（打印）.pdf (1995 KB)	Thesis		Open access	CC BY-NC-SA
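
Illustrative note on contribution 1 of the abstract: the abstract says only that click probabilities are predicted separately at the category, subcategory, and text levels and then fused. The sketch below shows one way such a fusion could look. The dot-product scorer, the sigmoid, the weighted-average fusion rule, and every name in it (`level_click_prob`, `multi_level_click_prob`, the example vectors and weights) are assumptions made for illustration and are not taken from the thesis.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def level_click_prob(user_vec, news_vec):
    """Click probability at one level (assumed scorer: sigmoid of a dot product)."""
    return sigmoid(dot(user_vec, news_vec))


def multi_level_click_prob(user, news, weights):
    """Fuse per-level click probabilities with a weighted average (assumed rule).

    user, news: dicts mapping a level name to that level's representation.
    weights:    dict mapping the same level names to non-negative weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = sum(
        w * level_click_prob(user[level], news[level])
        for level, w in weights.items()
    )
    return fused / total


# Hypothetical representations for one user and one candidate news item.
user = {"category": [0.9, 0.1], "subcategory": [0.2, 0.8], "text": [0.5, 0.5, 0.1]}
news = {"category": [1.0, 0.0], "subcategory": [0.0, 1.0], "text": [0.4, 0.6, 0.0]}
weights = {"category": 1.0, "subcategory": 1.0, "text": 2.0}

print(multi_level_click_prob(user, news, weights))
```

In this sketch a candidate that matches the user at the category level still receives a raised score when its text-level match is weak, which is the kind of multi-perspective decision the abstract describes. In the thesis the representations at each level are learned rather than fixed by hand.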