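Research on Media Reprint Pattern Analysis Based on Cross-Platform Topic Association Mining (基于跨平台话题关联挖掘的媒体转载模式分析研究)
王林子
2023-05-21
Pages: 94
Degree type: Master's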
Chinese Abstract

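Advances in the mobile Internet have given rise to new media such as Weibo and WeChat official accounts, which, together with traditional portal websites, form a key part of the composite platform for publishing and disseminating news in the era of converged media. A thorough, fine-grained, and comprehensive analysis of cross-platform media reprint patterns helps explain why media outlets reprint content, anticipate how public attention to social hotspots will develop, and assess the communication influence of media outlets, thereby providing decision support to the relevant management departments. Building on existing research on media reprint analysis and drawing on results from pre-trained text representation learning and heterogeneous network representation learning, this thesis analyzes the topics that cross-platform media follow in order to establish topic associations, and studies media reprint patterns in depth through hotspot reprint paragraph prediction and media reprint type identification. The main contributions are summarized as follows: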

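1. A site-association-enhanced method for predicting topic reprints by cross-platform media. The historical reprint interactions among cross-platform media sites reflect how media track the topics they follow. To effectively fuse media interaction information with historically published content, and to generate semantically diverse, heterogeneous representations of media and topics, this thesis proposes a site-association-enhanced method for predicting topic reprints by cross-platform media. The method first builds a heterogeneous association network from the interactions among cross-platform media sites and learns vector representations of the sites. It then uses these representations as the source of an attention mechanism that guides the encoding of historically published content, producing media news content representations that carry both cross-platform association attributes and deep semantic information. Finally, an interactive attention mechanism fuses the media news content into semantically diverse media-topic fusion representations, which are used to predict topic reprints. Experiments show that the hierarchical-interactive attention mechanism guided by site node representations makes full use of the network association relations and fuses a media outlet's historical news content from multiple angles and with selective emphasis, which improves the prediction of future media-topic reprint relations.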

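2. A method for predicting hotspot reprint paragraphs based on style enhancement and multi-perspective relevance modeling. Hotspot reprint paragraphs in news are usually closely related to the news title in meaning and highly consistent with it in writing style. To address the challenge of using writing style features to help mine deep semantic associations between titles and paragraphs, this thesis proposes a method for predicting hotspot reprint paragraphs based on style enhancement and multi-perspective relevance modeling. The method uses a multi-head Transformer layer to fuse writing style labels with the pre-trained semantic representations of the news title and the candidate paragraphs, making the representations more context-aware. It then introduces a dimension conversion layer that computes relevance scores between the title and each candidate paragraph's semantic vector from multiple latent perspectives, and predicts the probability that each paragraph will be reprinted. Finally, cross-entropy and ranking loss functions are used jointly to guide the model toward accurate prediction of hotspot reprint paragraphs. Experiments show that writing style features help mine multi-perspective associations between news titles and candidate hotspot reprint paragraphs, and help identify the latent features of hotspot reprint paragraphs so that they can be predicted accurately.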

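3. A method for identifying media reprint types based on dual encoding and multi-granularity semantic element interaction. News reprinting involves several reprint types, such as word variation, sentence rewriting, and document-level restatement, which are closely tied to the excerpting and paraphrasing of news at the granularity of words, sentences, and paragraphs. To address the challenge of effectively modeling and fusing the semantic element interactions of a reprinted news pair at different levels of granularity, this thesis proposes a method for identifying media reprint types based on dual encoding and multi-granularity semantic element interaction. The method first combines word2vec and BERT as dual encoders to learn semantic representations of the news at the word, sentence, and paragraph levels. It then models multi-granularity semantic element interaction matrices for the news pair and applies pooling to obtain an interaction representation at each granularity. Finally, it fuses the interaction information from all granularities into an association representation vector for the reprinted news pair, which captures deep semantic associations and is used to identify the media reprint type. Experiments show that mining the semantic element interactions between reprinted news pairs at multiple granularities captures reprint behavior comprehensively and in fine detail, and thereby improves the performance of media reprint type identification.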
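To make the attention step of the first contribution more concrete, the following is a minimal sketch, not the thesis's implementation. It assumes that a site embedding acts as the query of a scaled dot-product attention over the encodings of that site's previously published news; the function names, the vectors, and the choice of scaled dot-product scoring are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def site_guided_attention(site_vec, news_vecs):
    """Pool a site's historical news vectors with the site embedding as query.

    site_vec  : hypothetical site embedding learned from the heterogeneous network
    news_vecs : hypothetical encodings of the site's previously published news
    Returns (weights, pooled), where pooled is the attention-weighted sum.
    """
    dim = len(site_vec)
    # Scaled dot-product score between the site embedding and each news vector.
    scores = [
        sum(s * h for s, h in zip(site_vec, vec)) / math.sqrt(dim)
        for vec in news_vecs
    ]
    weights = softmax(scores)
    # Weighted sum of the news vectors, one output value per dimension.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, news_vecs))
        for i in range(dim)
    ]
    return weights, pooled


# Toy data (assumed): one 2-d site embedding and three 2-d news encodings.
site = [1.0, 0.0]
news = [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
weights, pooled = site_guided_attention(site, news)
print(weights, pooled)
```

In this toy case the first news vector is the most aligned with the site embedding, so it receives the largest weight. A full model would learn the embeddings and add the interactive attention between media and topic representations described above, which this sketch omits.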

English Abstract

The innovation of the mobile Internet has given rise to new media such as Weibo and WeChat official accounts, which, together with traditional portal websites, form a composite platform for releasing and disseminating news in the converged-media era. An in-depth, detailed, and comprehensive analysis of cross-platform media reprint patterns helps in understanding the motivations for media reprinting, predicting how attention to social hotspots will develop, and evaluating the communication influence of media, and can thus provide decision support for the relevant management departments. Building on existing research on media reprint analysis, this thesis draws on results from pre-trained text representation learning and heterogeneous network representation learning to predict the topics that cross-platform media follow and to establish topic associations. It then studies media reprint patterns in depth by predicting hotspot reprint paragraphs and identifying media reprint types. The main contributions of this thesis are summarized as follows:

1. A site-association-enhanced method for cross-platform media topic reprint prediction. The historical reprint interactions between cross-platform media sites reflect how media track the topics they follow. To effectively integrate media interaction information with historically published content, and to generate semantically diverse and heterogeneous media and topic representations, this thesis proposes a site-association-enhanced method for cross-platform media topic reprint prediction. The method first constructs a heterogeneous network from the interactions between cross-platform media sites and learns vector representations of the sites. The site node representations then serve as the source of an attention mechanism that guides the encoding of historically published content. Finally, an interactive attention mechanism fuses the media news content into semantically diverse media-topic fusion representations, which are used to predict media-topic reprint relationships. Experiments show that the hierarchical-interactive attention mechanism guided by site node representations makes full use of the network association relations and fuses a media outlet's historical news content from multiple views with selective emphasis, which improves the prediction of future media-topic reprint relationships.

2. A method for hotspot reprint paragraph prediction based on style enhancement and multi-perspective relevance modeling. Hotspot reprint paragraphs in news usually have close semantic relevance to, and high writing style consistency with, the news title. To address the challenge of using writing style features to help mine deep semantic associations between titles and paragraphs, this thesis proposes a method for hotspot reprint paragraph prediction based on style enhancement and multi-perspective relevance modeling. To make the representations more context-aware, the method uses a multi-head Transformer layer to fuse the writing style label with the pre-trained semantic representations of the news title and the candidate paragraphs. A dimension conversion layer is then proposed to compute relevance scores between the title and each candidate paragraph's semantic vector from multiple latent perspectives, yielding the probability that each paragraph will be reprinted. Finally, cross-entropy and ranking loss functions are used jointly to guide the model toward accurate prediction of hotspot reprint paragraphs. Experiments show that writing style features help mine multi-perspective correlations between news titles and candidate hotspot reprint paragraphs, and help identify the latent features of hotspot reprint paragraphs so that they can be predicted accurately.

3. A method for media reprint type identification based on dual encoders and multi-granularity semantic element interaction. News reprinting involves a variety of reprint types, such as word variation, sentence rewriting, and document-level restatement, which are closely related to the excerpting and paraphrasing of news at the granularity of words, sentences, and paragraphs. To address the challenge of effectively modeling and integrating the semantic element interactions of a reprinted news pair at different levels of granularity, this thesis proposes a method for media reprint type identification based on dual encoders and multi-granularity semantic element interaction. The method first combines word2vec and BERT as dual encoders to learn semantic representations of the news at the word, sentence, and paragraph levels. It then models the multi-granularity semantic element interaction matrices of the news pair and applies pooling to obtain the interaction representation at each granularity. Finally, the interaction information at each granularity is fused into an association representation vector for the reprinted news pair, which captures deep semantic associations and is used to identify the media reprint type. Experiments show that mining the semantic element interactions between reprinted news pairs at multiple granularities captures reprint behavior features comprehensively and in fine detail, thereby improving the performance of media reprint type identification.
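The interaction-and-pooling step of the third contribution can be illustrated with a small sketch. It is written under stated assumptions and is not the thesis's model: cosine similarity stands in for the interaction function, mean and minimum of the row-wise maxima stand in for the pooling operation, and the toy vectors stand in for word2vec and BERT outputs. All function and variable names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def interaction_matrix(src_units, tgt_units):
    """Similarity of every source unit against every target unit."""
    return [[cosine(u, v) for v in tgt_units] for u in src_units]


def pool(matrix):
    """Best match per source unit, summarised by its mean and its minimum."""
    row_max = [max(row) for row in matrix]
    return [sum(row_max) / len(row_max), min(row_max)]


def pair_representation(src, tgt):
    """Concatenate pooled interaction features over the three granularities."""
    features = []
    for granularity in ("word", "sentence", "paragraph"):
        matrix = interaction_matrix(src[granularity], tgt[granularity])
        features.extend(pool(matrix))
    return features


# Toy vectors (assumed) for an original article and a reprinted version.
original = {
    "word": [[1.0, 0.0], [0.0, 1.0]],
    "sentence": [[1.0, 1.0]],
    "paragraph": [[1.0, 0.0]],
}
reprint = {
    "word": [[1.0, 0.0], [1.0, 0.0]],
    "sentence": [[1.0, 1.0]],
    "paragraph": [[0.0, 1.0]],
}
features = pair_representation(original, reprint)
print(features)
```

The resulting six-value vector has two pooled features per granularity. In the method described above, a fused vector of this kind would be passed to a classifier that outputs the reprint type; that classifier is omitted here.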

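Keywords: cross-platform media reprint patterns; pre-trained encoding; topic reprint; hotspot reprint paragraphs; reprint types
Language: Chinese
Seven major directions, sub-direction classification: Social computing
State Key Laboratory planned direction classification: Social system modeling and computing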
Thesis-associated dataset to be deposited:
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/51855
Collection: Graduates_Master's Theses
Recommended citation
GB/T 7714
王林子. 基于跨平台话题关联挖掘的媒体转载模式分析研究[D],2023.
Files in this item
File name/size: 基于跨平台话题关联挖掘的媒体转载模式分析 (2983KB)
Document type: Thesis
Access type: Restricted access
License: CC BY-NC-SA