Research on News Event Ranking Methods Incorporating User Information (融合用户信息的新闻事件排序方法研究)
Kong Xiangfei (孔祥飞)1,2
2018-05-27
Degree type: Master of Engineering
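Abstract: In recent years, with the rapid development and spread of the Internet, online news browsing has become an important way for users of social media to obtain information. When a social event occurs, people also participate actively by posting opinions and expressing their wishes, which produces massive amounts of user comment data. News event ranking takes event-related news documents as its object of study and generates a ranked list of events according to the relevance between an input query and each event. It is an important research topic, particularly in security-related applications such as public opinion monitoring, retrieval, detection, and mining. To better meet application needs and provide richer event ranking results, we introduce the user information of events into the event query and establish news event ranking methods that incorporate user information. This thesis targets online news data that includes user comments. Using intelligent analysis techniques, it extracts objective event attributes from news reports and subjective user attributes from comments, and builds event ranking models from the relevance between the query and each attribute of an event. Finally, the proposed event ranking models are validated on online news datasets. The main work and contributions of the thesis are summarized as follows:

       1. An event description that incorporates user information, and the construction of an event ranking dataset. Existing event description methods are limited to objective event attributes. To address this, subjective user information is extracted from the user comments on an event, events are described by combining objective event attributes with subjective user attributes, and an extraction procedure is established for each attribute. To address the incomplete document similarity measures and inconsistent criteria for choosing clustering thresholds that arise when clustering massive document collections, events are extracted from massive news collections using an improved document similarity algorithm together with hierarchical clustering.
       2. A subjective event ranking method based on support vector machines, SVM-ERanker. The method combines the relevance between the objective event information in the query and the event's news reports with the relevance between the subjective user information in the query and the event's news comments. It builds training samples from the partial-order relations between events in the ranked list and solves the event ranking problem with the support vector machine optimization strategy. Experiments verify that the proposed pairwise event ranking method can handle the ranking problem posed by event queries that incorporate user information.
       3. A subjective event ranking algorithm based on deep neural networks, DNER. Given the difficulty of feature engineering in existing event ranking algorithms, and building on the strengths of deep neural networks in feature construction and extraction, an event ranking algorithm is proposed that constructs relevance features automatically with a deep neural network. Operating on the semantic vector layer of the event query and the event, it uses several kinds of convolution operations to extract relevance indicators such as query term importance, term frequency relevance, and BM25 relevance, and fuses these indicators with a long short-term memory (LSTM) recurrent neural network. Experiments verify that DNER extracts the relevance features between the query and the subjective and objective attributes of an event well.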
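The pairwise idea in contribution 2 (building training samples from the partial order between events in a ranked list, then optimizing an SVM-style objective) can be sketched as follows. This is a minimal illustration, not the thesis's SVM-ERanker: the two-dimensional feature vectors (objective relevance, subjective relevance), the function names, and the plain subgradient-descent trainer are all assumptions made for the example.

```python
from itertools import combinations

def pairwise_samples(features, ranks):
    """Turn one ranked list into (difference vector, label) pairs.

    features[i] is a hypothetical relevance feature vector for event i;
    ranks[i] is its position in the list (smaller means ranked higher).
    """
    samples = []
    for i, j in combinations(range(len(features)), 2):
        if ranks[i] == ranks[j]:
            continue  # ties carry no preference information
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if ranks[i] < ranks[j] else -1
        samples.append((diff, label))
    return samples

def train_linear_ranker(samples, dim, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss (a linear SVM objective)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for diff, label in samples:
            margin = label * sum(wk * dk for wk, dk in zip(w, diff))
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1:  # pair is misordered or inside the margin
                    grad -= label * diff[k]
                w[k] -= lr * grad
    return w

def score(w, x):
    """Ranking score of one event: larger means more relevant."""
    return sum(wk * xk for wk, xk in zip(w, x))

# Hypothetical features: [objective relevance, subjective relevance]
events = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2]]
ranks = [1, 2, 3]
weights = train_linear_ranker(pairwise_samples(events, ranks), dim=2)
ranked = sorted(range(len(events)), key=lambda i: -score(weights, events[i]))
```

Training on feature differences means the learned weight vector only has to order pairs correctly, which is what makes the approach pairwise rather than pointwise.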
In recent years, with the rapid development and popularization of the Internet, online news browsing has become one of the most important ways for users to obtain the latest event information on social media platforms. Meanwhile, when a social event occurs, people tend to take part in the discussion actively, commenting to express their wishes and viewpoints, and thus produce tremendous amounts of user review data. News event ranking, which takes event-related news documents as its research object and ranks news events by the relevance between an input query and each event, is an essential research issue, especially in security-oriented applications such as public event monitoring, retrieval, detection, and mining. To better satisfy application requirements and provide more valuable event ranking results, we embed an event's user information into the event query and establish news event ranking methods fused with user information. This thesis focuses on online news datasets that contain user review data. Using intelligent analysis technology, we extract objective event aspects from news documents and subjective event aspects from news reviews, and then rank news events based on the relevance between query and event. We evaluate the effectiveness of the proposed approach through a series of experiments on online news datasets. The main work and innovation points of this thesis are summarized as follows:
       1. We propose an event description method fused with user information and, based on it, construct news datasets for the news event ranking problem. Because existing event description methods are confined to objective event aspects, we extract subjective user information from news reviews and describe each event by merging user information with objective event aspects. We also improve the document similarity calculation to address the incomplete measurement of existing similarity methods, and improve the hierarchical clustering method to extract events from massive news datasets.
       2. We present a subjective event ranking method based on the support vector machine. The proposed method combines two kinds of relevance metrics: the relevance between the objective event information in the event query and the event's news reports, and the relevance between the subjective user information in the event query and the event's news reviews. We use the partial-order relationships among events in the event ranking list to construct training samples, and solve the event ranking problem by adopting the optimization strategy of the support vector machine. Experiments on a large news event dataset show that our approach merges the two kinds of query-event relevance effectively.
       3. We put forward a subjective news event ranking model based on a deep neural network. The model uses a convolutional neural network to extract the correlation between the event query and the document formed by event reports and event reviews. By applying different network structures to the semantic vectors of query and event, the model captures term importance relevance, term frequency relevance, and BM25-like relevance. It then integrates these relevance metrics with a Long Short-Term Memory network. Experimental results show that the model achieves good results on the news event ranking task.
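For readers unfamiliar with the "BM25-like relevance" mentioned in contribution 3, the classical Okapi BM25 score that such a feature resembles can be sketched as below. This is the standard hand-crafted formula, not the thesis's neural model, which is described as learning comparable signals through convolutions. The parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are assumptions chosen for illustration; the function expects a non-empty collection of tokenized documents.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score every tokenized document in `docs` against `query_terms` with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        total = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF (the +1 keeps the value non-negative)
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            total += idf * tf[term] * (k1 + 1) / norm
        scores.append(total)
    return scores

# Toy "events": each is the token list of its reports and reviews combined
docs = [
    ["flood", "rescue", "flood"],
    ["election", "vote"],
    ["flood", "anger", "comment"],
]
scores = bm25_scores(["flood", "anger"], docs)
```

In this toy collection the third document scores highest because it matches both query terms and "anger" is rare, while the second document matches neither term and scores zero.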
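Keywords: event extraction; user information; event ranking; deep neural networks
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/21058
Collection: Graduates_Master's Theses
Author affiliations: 1. University of Chinese Academy of Sciences
2. Institute of Automation, Chinese Academy of Sciences
First author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation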
GB/T 7714
孔祥飞. 融合用户信息的新闻事件排序方法研究[D]. 北京. 中国科学院研究生院,2018.
Files in this item
File name/size; Document type; Version type; Access type; License
Thesis-with_sign.pdf (2841KB); Thesis; (not given); Restricted access; CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.