Inferring User Demographic Attributes from Social Media Behavior

	Inferring User Demographic Attributes from Social Media Behavior
	Xiang Liancheng (项连城)
	2017-05-31
Degree type	Master of Engineering
Abstract	With the development of the mobile Internet and the spread of social media networks, more and more people use a wide variety of online social media services. This produces massive amounts of social media content and leaves users facing information overload. How to better analyze and understand users, and how to provide them with personalized information services, has therefore become a central task and challenge for social media research. User demographic attributes, including age, gender, marital status, and occupation, are the basis for understanding users and building user profiles. The massive multimedia content and rich behavioral information that users generate in social media networks implicitly reveal important clues about their personal information, and offer a way to address the missing and sparse demographic attributes in social networks. This thesis therefore studies how to infer users' demographic attributes from their social media behavior. Starting from two characteristics of demographic attributes, relevance (the correlation among attributes) and steadiness (their relative stability), the work covers the following three aspects:
1. We propose a relational method for inferring user demographic attributes based on hypergraph learning. A user's different demographic attributes are correlated. On top of the user's social media behavior, making appropriate use of the known demographic attributes and their correlations helps infer the unknown ones. In the hypergraph, each vertex represents a user in the social media network, and the hyperedges capture the similarity of user-generated content and the relations between attributes. With this model, attribute inference is formalized as a regularized label similarity propagation problem on the constructed hypergraph, which effectively infers users' various demographic attributes.
2. We propose a cross-OSN (online social network) approach based on coupled projection matrices for inferring user demographic attributes, which resolves the conflict between dynamic social media behavior and relatively stable demographic attributes. The underlying assumption is that a user's unique and stable demographic attributes give rise to different dynamic behaviors in different social media networks; in other words, the same user's cross-OSN behaviors reflect his or her demographic attributes in different scenarios. Based on this, the user's behavior features in different social media networks are jointly projected into a common space for demographic attribute inference. Experiments on a self-collected real-world Google+ and Twitter dataset demonstrate the effectiveness of the proposed approach.
3. We propose a cross-OSN method based on a multi-source autoencoder for inferring user demographic attributes. Based on the steadiness of demographic attributes, the method finds users' shared behavior patterns across different social media networks, resolves the conflict between relatively stable demographic attributes and dynamic social media behavior, and addresses the difficulty of obtaining labeled user data. It adopts a hierarchical learning model: behavior data from users without demographic labels, drawn from more social media networks, is used to find shared behavior patterns and obtain a stable user feature representation, and demographic attributes are then inferred for users with demographic labels. By making full use of large amounts of unlabeled user data to find the behavior patterns shared across social media networks, the method effectively improves the accuracy of user demographic attribute inference.
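The hypergraph formulation in the first contribution can be illustrated with a small sketch. It is not the thesis's implementation: the record does not give the objective function, so the code assumes the commonly used normalized hypergraph propagation operator (vertex degrees from hyperedge weights, hyperedge degrees from hyperedge sizes) and a simple iterative update. The function name `hypergraph_propagate`, the parameters `alpha` and `iters`, and the toy data are all hypothetical.

```python
import math


def hypergraph_propagate(n_users, hyperedges, weights, labels, alpha=0.9, iters=200):
    """Propagate known binary labels (+1 / -1) over a hypergraph of users.

    Assumed formulation (not taken from the thesis): iterate
        f <- alpha * Theta * f + (1 - alpha) * y
    with Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2.
    """
    # Vertex degree: total weight of the hyperedges containing the user.
    d_v = [0.0] * n_users
    for edge, w in zip(hyperedges, weights):
        for v in edge:
            d_v[v] += w

    # Build Theta entry by entry; the hyperedge degree is the edge size.
    theta = [[0.0] * n_users for _ in range(n_users)]
    for edge, w in zip(hyperedges, weights):
        delta = len(edge)
        for u in edge:
            for v in edge:
                theta[u][v] += w / (delta * math.sqrt(d_v[u] * d_v[v]))

    # Initial label vector: 0 for users whose attribute is unknown.
    y = [float(labels.get(v, 0)) for v in range(n_users)]
    f = y[:]
    for _ in range(iters):
        f = [
            alpha * sum(theta[u][v] * f[v] for v in range(n_users))
            + (1 - alpha) * y[u]
            for u in range(n_users)
        ]
    return f


# Toy data (hypothetical): six users, two content-similarity hyperedges,
# and one weaker hyperedge linking users 2 and 3 through a shared attribute.
hyperedges = [[0, 1, 2], [3, 4, 5], [2, 3]]
weights = [1.0, 1.0, 0.2]
labels = {0: +1, 5: -1}  # e.g. a known binary attribute for users 0 and 5
scores = hypergraph_propagate(6, hyperedges, weights, labels)
predicted = [1 if s > 0 else -1 for s in scores]
```

In this toy setup the unlabeled users take the sign of the labeled user they share a content hyperedge with. A multi-class attribute such as occupation could be handled by running one propagation per class and picking the highest score, which is again an assumption rather than something stated in the abstract.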
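Keywords	social media behavior; demographic attributes; relevance; steadiness
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14643
Collection	Graduates_Master's theses
Author affiliation	National Laboratory of Pattern Recognition (Institute of Automation, Chinese Academy of Sciences), Beijing 100190
First author affiliation	National Laboratory of Pattern Recognition
Recommended citation (GB/T 7714)	Xiang Liancheng. Inferring User Demographic Attributes from Social Media Behavior [D]. Beijing: Graduate University of the Chinese Academy of Sciences, 2017.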

Files in this item
File name/size	Document type	Version type	Access type	License
XLC.pdf (4953KB)	Thesis		Restricted access	CC BY-NC-SA