Research on User Interest Profiling Based on Deep Learning

Author	刘若然 (Liu Ruoran)
Date	2019-05-25
Pages	102
Degree type	Master's
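Abstract (Chinese)	With the rapid development of the mobile internet, social media has gradually become an important platform for people to share experiences and obtain information. Studying user interest modeling from the perspectives of multi-granularity sentiment, content semantics, and interactive relationships helps mine user interests more comprehensively and accurately, so that users can obtain better information services. Drawing on research advances in deep learning, natural language processing, and data mining, this thesis studies interpretable user interest modeling methods based on multi-task learning. The main contributions are as follows:
(1) A multi-task-learning method for users' multi-granularity sentiment analysis. Modeling each type of sentiment independently cannot capture the correlations between sentiment types and makes it difficult to mine users' sentiment characteristics comprehensively from multiple perspectives. To address this, the thesis proposes a multi-task-learning method that simultaneously predicts a user's overall sentiment and fine-grained sentiment toward each attribute. The method first encodes the contextual information of words in user reviews with a bidirectional long short-term memory (BiLSTM) network to generate deep semantic representations of words. It then uses an attention mechanism to measure the relevance of each word to each fine-grained attribute and takes the weighted sum of word representations as the representation of that attribute. Next, it computes the influence of each attribute's sentiment on the user's overall sentiment and takes the weighted sum of attribute representations as the overall sentiment representation. Finally, it predicts the overall sentiment and the fine-grained attribute sentiments simultaneously from the overall and attribute representations.
(2) A user interest modeling method that integrates content semantics and interactive relationships. Because the text users post on social media is sometimes semantically sparse, the thesis exploits the interactive nature of social media and designs a method that mines user interests from both content semantics and the user association network. The method first analyzes the user association network with a network representation learning method to generate representations of users' interactive behavior. It then concatenates each user's interactive representation with word vectors to obtain integrated word representations, and uses a convolutional neural network to extract important interest features from these integrated representations, generating deep semantic representations of posts; this explores how users' interactive behavior affects content semantics. Next, a BiLSTM encodes the dependencies between posts to generate the user's content-semantic representation. Finally, the method predicts user interest preferences from a comprehensive representation that fuses content semantics and interactive behavior.
(3) A user interest summary extraction method based on a gating mechanism and word mover's distance. To improve the interpretability of user interest modeling, and to address the challenging problem of automatically extracting words closely associated with user interests, the thesis proposes a method that extracts representative words and sentences to intuitively explain the persons associated with each interest category. The method first uses a gating mechanism together with convolution to fuse contextual (scenario) information and control how word information flows through the network. It then uses max pooling to select important word features, generates semantic representations, and predicts interest categories, thereby establishing associations between word features and interest categories. Next, the most important words identified by max pooling for each category are taken as that category's interest features. Finally, word mover's distance is used to measure each sentence's relevance to the interest features and the semantic redundancy between sentences, and sentences with high relevance and low redundancy are extracted as the interest summary.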
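The pooling steps of the first method (attention-weighted attribute representations, then an attribute-weighted overall representation, trained jointly on all sentiment tasks) can be illustrated with a minimal, dependency-free sketch. It is not the thesis's implementation: the BiLSTM encoder is replaced by precomputed contextual vectors, attention scores are assumed to be dot products with learned query vectors (`queries` and `overall_query` are hypothetical names), and the joint loss assumes equal weights for all tasks.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def attribute_representations(hidden: List[Vector], queries: List[Vector]) -> List[Vector]:
    """Attention-pool the word vectors once per attribute.

    hidden:  contextual word vectors (stand-ins for BiLSTM outputs), one per word.
    queries: one hypothetical learned query vector per fine-grained attribute.
    """
    reps = []
    for q in queries:
        alpha = softmax([dot(h, q) for h in hidden])  # word-to-attribute relevance
        reps.append(weighted_sum(alpha, hidden))
    return reps


def overall_representation(attr_reps: List[Vector], overall_query: Vector) -> Vector:
    """Weight each attribute representation by its estimated impact on overall sentiment."""
    beta = softmax([dot(r, overall_query) for r in attr_reps])
    return weighted_sum(beta, attr_reps)


def cross_entropy(logits: List[float], label: int) -> float:
    """Cross-entropy of one example, computed with log-sum-exp for stability."""
    m = max(logits)
    return math.log(sum(math.exp(x - m) for x in logits)) + m - logits[label]


def multitask_loss(overall_logits: List[float], overall_label: int,
                   attr_logits: List[List[float]], attr_labels: List[int]) -> float:
    """Joint objective: overall-sentiment loss plus every per-attribute loss.

    The logits would come from per-task classifiers over the overall and
    attribute representations (not shown); equal task weights are assumed.
    """
    loss = cross_entropy(overall_logits, overall_label)
    for logits, label in zip(attr_logits, attr_labels):
        loss += cross_entropy(logits, label)
    return loss


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "word" vectors
    queries = [[5.0, 0.0], [0.0, 5.0]]             # two toy attributes
    attrs = attribute_representations(hidden, queries)
    overall = overall_representation(attrs, [1.0, 1.0])
    print(attrs, overall)
```

Sharing the same word vectors across all attribute heads and the overall head is what lets the joint loss couple the tasks; independent models would each learn their own encoder and miss the correlations between sentiment types.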
Abstract (English)	With the rapid development of the mobile internet, social media has become an important platform where people share experiences and obtain information. Research on user interest modeling from the perspectives of multi-granularity sentiment, content semantics, and interactive relationships helps mine users' interests more comprehensively and accurately, so that users can obtain better information services. Drawing on progress in deep learning, natural language processing, and data mining, this thesis studies explainable user interest modeling methods based on multi-task learning. The main contributions are summarized as follows:
(1) Users' multi-granularity sentiment analysis based on multi-task learning. Modeling each sentiment independently cannot capture the associations between sentiments and makes it difficult to mine users' sentiment characteristics comprehensively from multiple perspectives. To solve this problem, this thesis proposes a multi-granularity sentiment analysis method based on multi-task learning, which predicts users' overall sentiment and their fine-grained sentiment toward each attribute simultaneously. The method first encodes the contextual information of words with a bidirectional long short-term memory network and generates deep semantic representations of words. Then, it adopts an attention mechanism to measure each word's relevance to each attribute and takes the weighted sum of word representations as the corresponding attribute representation. Next, it computes each attribute's impact on the user's overall sentiment and takes the weighted sum of attribute representations as the overall representation. Finally, users' overall sentiment and fine-grained attribute sentiments are predicted from the overall and attribute representations.
(2) User interest modeling integrating content semantics and interactive relationships. Since the content users publish on social media sometimes suffers from semantic sparsity, this thesis exploits the interactive characteristics of social media and proposes a user interest modeling method integrating content semantics and interactive relationships, which mines user interests from both content and the interaction network. The method first analyzes the interaction network with a network embedding method to generate users' interactive representations. Then, it obtains integrated word representations by concatenating word vectors with the user's interactive representation. A convolutional neural network extracts important features from the integrated representations and generates deep semantic representations of posts, thereby exploring the impact of interactive behavior on content semantics. Next, a bidirectional long short-term memory network encodes the dependencies between posts and generates the user's semantic representation. Finally, users' interests are predicted from a synthetic representation integrating semantic and interactive information.
(3) User interest summary extraction based on a gating mechanism and word mover's distance. To improve the interpretability of user interest modeling, and to address the challenge of automatically recognizing words closely related to user interests, this thesis proposes a user interest summary extraction method based on a gating mechanism and word mover's distance, which extracts representative words and sentences to explain the persons associated with each kind of interest. The method first adopts convolution and a gating mechanism to integrate contextual information, controlling the flow of word semantics through the neural network. Then, it uses max pooling to select important word features, generates semantic representations, and predicts the interest category, thereby establishing the correlation between words and interest categories. Next, the most important words identified by max pooling are regarded as the interest features of each interest category. Finally, word mover's distance is adopted to measure each sentence's relevance to the interest features and the semantic redundancy between sentences; sentences with high relevance and little redundancy are extracted as the interest summary.
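The final sentence-selection step of the third method (score sentences by word mover's distance to the interest features, penalize redundancy with sentences already chosen) can be sketched as below. This is a hedged illustration under several assumptions, not the thesis's procedure: exact WMD requires solving an optimal-transport problem, so the sketch swaps in the relaxed WMD lower bound of Kusner et al. (2015); distances are mapped to similarities as 1/(1 + d); relevance and redundancy are combined with an MMR-style trade-off parameter `lam`; and the interest words are passed in directly rather than obtained from max pooling. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Embeddings = Dict[str, Vector]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nbow(words: Sequence[str]) -> Dict[str, float]:
    """Normalized bag-of-words: each word's share of the text's total mass."""
    weights: Dict[str, float] = {}
    for w in words:
        weights[w] = weights.get(w, 0.0) + 1.0 / len(words)
    return weights


def relaxed_wmd(a: Sequence[str], b: Sequence[str], emb: Embeddings) -> float:
    """Relaxed WMD: a lower bound on exact WMD (no transport problem solved).

    Each word sends all its mass to its nearest word on the other side;
    the bound is the larger of the two one-sided costs.
    """
    def one_side(src: Sequence[str], dst: Sequence[str]) -> float:
        targets = set(dst)
        return sum(weight * min(euclidean(emb[w], emb[v]) for v in targets)
                   for w, weight in nbow(src).items())
    return max(one_side(a, b), one_side(b, a))


def similarity(a: Sequence[str], b: Sequence[str], emb: Embeddings) -> float:
    """Map a distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + relaxed_wmd(a, b, emb))


def select_summary(sentences: List[List[str]], interest_words: List[str],
                   emb: Embeddings, k: int = 2, lam: float = 0.7) -> List[int]:
    """Greedily pick up to k sentence indices that are relevant to the interest
    words, penalized by similarity to sentences already picked (MMR-style)."""
    candidates = [i for i, s in enumerate(sentences) if s]  # skip empty sentences
    chosen: List[int] = []

    def score(i: int) -> float:
        relevance = similarity(sentences[i], interest_words, emb)
        redundancy = max((similarity(sentences[i], sentences[j], emb) for j in chosen),
                         default=0.0)
        return lam * relevance - (1.0 - lam) * redundancy

    while candidates and len(chosen) < k:
        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    emb = {"football": [1.0, 0.0], "match": [0.9, 0.1], "goal": [1.0, 0.1],
           "cooking": [0.0, 1.0], "recipe": [0.1, 0.9]}  # toy 2-d embeddings
    sents = [["football", "match"], ["football", "goal"], ["cooking", "recipe"]]
    print(select_summary(sents, ["football", "goal"], emb, lam=0.7))  # [1, 0]
    print(select_summary(sents, ["football", "goal"], emb, lam=0.3))  # [1, 2]
```

In the toy run, a high `lam` favors relevance and keeps the second football sentence, while a low `lam` weights redundancy more heavily and swaps in the dissimilar cooking sentence; the thesis does not state how the two criteria are balanced, so `lam` is purely illustrative.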
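Keywords	social media; deep learning; multi-task learning; semantic information; interactive relationships
Language	Chinese
Classification (seven major directions / sub-direction)	Recommender systems
Document type	Degree thesis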
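Item identifier	http://ir.ia.ac.cn/handle/173211/23798
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems / Internet Big Data and Information Security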
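Recommended citation (GB/T 7714)	刘若然 (Liu Ruoran). 基于深度学习的用户兴趣画像研究 [Research on User Interest Profiling Based on Deep Learning][D]. Beijing: University of Chinese Academy of Sciences, 2019.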

Files in this item
File name/size	Document type	Version type	Access type	License
Thesis.pdf (13345 KB)	Degree thesis		Open access	CC BY-NC-SA