Influence Analysis Methods for Web Communities Based on Interaction Relationships (基于交互关系的网络社区影响力分析方法)

	Influence Analysis Methods for Web Communities Based on Interaction Relationships
	游强
	2016-05-31
Degree type	Doctor of Engineering
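Chinese abstract (English translation)	The rapid growth of the Internet means that distance is no longer a barrier to people getting to know one another and exchanging information. Web communities, formed where factors such as region, interests, and ideals intersect, have gradually become a more convenient means of obtaining information. Virtual web communities continue and greatly extend the concept of physical communities, so that information circulates ever faster and reaches ever farther. Influence is an effect that arises naturally as people communicate; it can sway, control, or manipulate people or events, leading those affected to change their minds or make decisions, so that a population comes to behave in a consistent way. Social influence analysis is a very important part of social computing research; it studies the mechanisms by which such consistent behavior forms in a population and the patterns by which it develops. Traditional social influence analysis mostly starts from disciplines such as social psychology, cognitive science, and marketing, combined with statistical surveys, and has reached some basic conclusions that have shown great value in traditional marketing, advertising, and public decision-making. With the rapid development of virtual web communities in recent years, traditional fields are due to be transformed or even overturned by the Internet. Success stories in viral marketing and online word-of-mouth, typified by the rise of social networks, are increasingly common, and social influence analysis of web communities is becoming a research hotspot and an important direction in web mining and social computing.
This thesis focuses on network analysis and social computing, concentrating on discovering latent influence in web communities and analyzing how influence spreads. Influence in a web community arises from the interactions among its users, and these interactions take place through many media; this thesis takes text interaction in web communities as its example. Interaction data in social networks is highly fragmented, noisy, and semantically incomplete. To address these problems, and unlike the traditional practice in web mining and knowledge engineering of building a domain knowledge base, the thesis advocates learning from data and fully recognizing the diversity of text data. Taking the structural characteristics of web communities into account, it studies strategies for fusing the structure and content of users' interaction texts, in an attempt to answer how, and in what way, users who interact through text influence one another. The thesis also proposes a simple and effective strategy for trust prediction in binary-relationship networks, the most basic kind of social network. The strategy is modeled on conclusions from sociology. Trust is the cornerstone of influence propagation, so this work likewise addresses the basic question of how, and in what way, users in a web community influence one another. Finally, the thesis proposes an influence maximization model that allows human intervention and analyzes its effect on influence propagation, in an attempt to answer how widely influence can spread. Specifically, the main work of the thesis is as follows:
1) A unified fusion framework based on time-related ranking is proposed and applied to ranking interaction-text posts in web communities by influence. The framework aims to fuse the semantic information of interaction texts into the structure of the web community. Its basic idea is to build representations of the interaction text at different semantic scales; to learn the semantic quality at each scale, relying on the consistency of domain knowledge within a web community; to reconstruct a semantic tree at each scale from semantic similarity and extract the corresponding rank values under the time-related model; and then to fuse the time-related rank values across scales using the semantic quality. If the network structure is given explicitly, semantics and structure can also be fused under the time-related ranking model.
2) A multi-task learning method based on clustering of social metadata is proposed and applied to predicting the influence of interaction-text posts in web communities. The method aims to use the structural information of the web community as context for the interaction texts. Modeling and partitioning the social metadata naturally yields clustered tasks. Through a divide-then-learn strategy combined with a clustered multi-task learning algorithm, the method addresses two major problems facing learning methods for web community data: either the data lies in a space of such high dimension that unified learning is too costly and prone to over-fitting, or there are so many sub-tasks that each has too few training samples and, lacking connections to the others, is under-learned. The root cause is still that web interaction text is fragmented and semantically incomplete. The thesis tries to find a compromise between the two.
3) A method for trust prediction in binary social network relationships is proposed. The method introduces some basic logic of human interaction from sociology and aims to solve trust prediction, a basic problem in influence propagation. It applies the matrix factorization approach of social recommendation to trust prediction and considers three factors in modeling influence: a user's own social preferences, the social preferences of adjacent users, and the preferences of those adjacent users' own neighbors. Incorporating these three factors into social-network-based matrix factorization gives fairly satisfactory results.
4) The influence maximization problem for networks that allow human intervention is posed, and basic approximate algorithms for it are given. Traditional influence maximization does not consider changes in network structure; it approximately solves the problem by selecting candidate nodes and relying on a given propagation model. Real networks, however, are constantly changing. Considering real situations in which changes to the network structure can be controlled through intervention, the thesis extends the traditional influence maximization problem and gives some basic approximate algorithms.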
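The fusion step in the first contribution, combining per-scale rank values according to learned semantic quality, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the normalized weighted sum below is an assumption, and the function name `fuse_rank_values`, the scale names, and all scores are hypothetical.

```python
def fuse_rank_values(rank_by_scale, quality_by_scale):
    """Fuse per-scale rank scores into one score per post.

    ASSUMPTION: a quality-weighted average; the thesis abstract only says
    that rank values are fused "based on" semantic quality.
    rank_by_scale:    {scale: {post_id: rank score at that scale}}
    quality_by_scale: {scale: non-negative quality weight}
    """
    total = sum(quality_by_scale[scale] for scale in rank_by_scale)
    if total <= 0:
        raise ValueError("quality weights must sum to a positive value")
    fused = {}
    for scale, ranks in rank_by_scale.items():
        weight = quality_by_scale[scale] / total  # normalize weights to sum to 1
        for post_id, score in ranks.items():
            fused[post_id] = fused.get(post_id, 0.0) + weight * score
    return fused


# Hypothetical scores for two posts at two semantic scales.
ranks = {
    "word": {"p1": 0.5, "p2": 0.5},
    "topic": {"p1": 1.0, "p2": 0.0},
}
quality = {"word": 1.0, "topic": 3.0}  # the "topic" scale is trusted more
fused = fuse_rank_values(ranks, quality)
ordering = sorted(fused, key=fused.get, reverse=True)  # p1 ranks above p2
```

A weighted average keeps the fused score on the same range as the inputs, which is convenient if the per-scale rank values are already normalized; other fusion rules would fit the abstract's description equally well.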
English abstract	With the rapid growth of the Internet, distance no longer hinders people's communication and information sharing. As a result, web communities built around shared regions, interests, and ideals are flourishing and play an increasingly significant role in information diffusion. These "virtual" web communities greatly extend the concept of offline communities, making information flow faster and farther than before. As people in a community communicate and interact, a force called social influence arises naturally; it can affect, control, or manipulate people or events. Those subject to it may change their minds or make decisions that lead them to behave like others in the same community, and eventually the community as a whole converges in behavioral tendency or emotional experience. Social influence analysis is an important research area in social computing; it studies how such consistent behavior forms and develops among people in a community. Traditional methods of social influence analysis, oriented to offline communities, come largely from social sciences such as social psychology, social cognition, and marketing. They use questionnaires to collect data and basic statistical techniques (e.g., regression or factor analysis) to analyze it, and they have yielded some basic conclusions. These conclusions have shown great value in marketing, advertising, public policy, and many other areas over the past decades. With the rapid development of virtual web communities in recent years, there are increasingly many successful business cases of social marketing and word-of-mouth advertising on large online social networks (e.g., Facebook, Twitter, Weibo).
Thus, social influence analysis for online web communities has increasingly become a significant research direction in web mining and social computing. This thesis focuses on large-scale social data processing and social computing, especially on discovering the principles of social influence and its diffusion in web communities. Influence arises as web users interact with each other in a web community, and there are many kinds of interaction media. The thesis concentrates on text-based interaction, which is still the most convenient and effective interaction medium on the web. Interaction data in social networks is almost always highly fragmented, noisy, or lacking in background knowledge. Unlike the traditional approach in knowledge engineering, which builds a background domain knowledge base at considerable cost, we advocate learning from data and fully recognizing the diversity of social-interaction text data. Given the special structure of web communities, we emphasize strategies for fusing users' text-based interactions with the structure of the community, in order to answer how users are influenced through text-based interaction and who is influenced by whom. In addition, the thesis introduces a simple yet effective strategy for trust prediction in signed social networks with friend and foe relationships. The strategy is motivated by earlier conclusions from social psychology. Trust prediction is the cornerstone of influence diffusion and, like the fusion work above, addresses how users are influenced and who is influenced.
Finally, the thesis introduces an influence maximization problem in human-intervened social networks and analyzes influence diffusion under constraints, in order to answer how widely influence can diffuse given a limited budget and time. The main contributions of this thesis are summarized as follows: (1) A unified fusion framework based on a time-related rank model is proposed for ranking posts by their influence in a web community. The aim of the framework is to fuse the semantic information of text-based interactions into the structure of the community. The basic idea is as follows. First, the interaction text is represented at different semantic scales. Under the assumption that domain knowledge is consistent within a web community, a semantic quality measure is learned for each scale. Accordingly, a semantic tree is reconstructed at each scale from semantic similarity, and the time-related rank values of the nodes in each semantic tree are then calculated. Second, the rank values are fused according to the semantic quality measures learned earlier. If the community's reply-to structure is given explicitly, semantics and structure can also be fused within the same framework. (2) A clustered multi-task learning method based on social metadata is proposed for influence prediction in web communities with text-based interactions. The purpose of the method is to use the reply-to structure of web communities as context for the text-based interactions. Partitioning the social metadata into clusters naturally creates multiple learning tasks. With this divide-then-learn strategy, the clustered multi-task learning method addresses two common problems in handling web data.
The first problem is that the dimensionality of web data is too high, so learning a single unified model easily over-fits. The second is that, if the tasks are treated as independent, there are too many of them, which causes under-fitting because each task has too few samples. In short, the underlying cause is that interaction data in web communities is highly fragmented and semantically incomplete. As a compromise, a "divide and learn" strategy based on social metadata is adopted, fusing the web context into multi-task learning for the interaction texts. (3) A trust prediction method for signed social networks with friend and foe relationships, drawing on social psychology, is introduced to handle this basic problem in influence diffusion. Combining basic conclusions from social psychology, the method builds on matrix factorization as used in social recommendation. In modeling influence, three factors are considered: a user's own social preferences, the social preferences of the user's neighbors, and the social preferences of the neighbors' neighbors. Incorporating these three kinds of social preferences into matrix factorization yields satisfactory trust prediction results. (4) An influence maximization problem in human-intervened social networks is proposed, together with two basic approximate algorithms. Traditional influence maximization considers only static social networks: given a static network, influence diffuses under a given budget and time according to some diffusion model. In practice, social networks change dynamically. Many are operated by service providers according to business interests, which is the setting we consider.
For such human-intervened social networks, some basic approximation algorithms are proposed for the influence maximization problem.
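For readers unfamiliar with the static baseline that contribution (4) extends, the following is a minimal sketch of classical greedy influence maximization under the independent cascade model, with spread estimated by Monte Carlo simulation. This is the standard textbook technique, not the thesis's algorithm for human-intervened networks, which the abstract does not specify. The function names, the uniform activation probability `p`, and the toy graph are all assumptions made for illustration.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """Run one independent-cascade simulation; return how many nodes activate."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick up to k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same result
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for candidate in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [candidate], p, runs, rng)
            if spread > best_spread:
                best, best_spread = candidate, spread
        if best is None:  # fewer than k nodes available
            break
        chosen.append(best)
    return chosen


# Toy directed graph (hypothetical): a hub that points at three leaves.
graph = {"hub": ["a", "b", "c"], "a": [], "b": [], "c": []}
seeds = greedy_seeds(graph, k=1, p=1.0, runs=5)  # the hub reaches every node
```

In this sketch, a structural intervention could be modelled by adding or removing entries in `graph` before calling `greedy_seeds`; how the thesis formalizes intervention, budget, and time constraints is not stated in the abstract.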
Keywords	social influence; text-based interaction; fusion algorithm; trust (relationship) prediction; influence maximization
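Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/11782
Collection	Graduates_Doctoral Theses
Author affiliation	National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
First author affiliation	National Laboratory of Pattern Recognition
Recommended citation (GB/T 7714)	游强. 基于交互关系的网络社区影响力分析方法[D]. 北京. 中国科学院大学,2016.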

Files in this item
File name/size	Document type	Version type	Access type	License
基于交互关系的网络社区影响力分析方法.p (2533 KB)	Thesis		Restricted access	CC BY-NC-SA