基于贝叶斯网的不确定知识学习与推理方法研究

CASIA OpenIR > 毕业生 > 博士学位论文

	基于贝叶斯网的不确定知识学习与推理方法研究
	张英华
	2013-05-28
学位类型	工学博士
中文摘要	在不确定环境下进行知识学习与推理是智能行为的基础。互联网的快速发展使得信息的采集、传播速度和规模达到空前的水平，数据量呈现爆炸式增长，人类已经进入大数据时代。有效地分析和挖掘这些量大、快变、多样、结构复杂的数据，获取新的知识并加以推理利用，已成为当今科技界高度关注的研究方向，同时也是支撑数据产业界的关键技术。因此，研究大数据中所蕴含的不确定性知识的有效学习和推理的理论与方法，具有重要的理论意义和实用价值。贝叶斯网络提供了一种表示多变量联系概率分布的方法，同时是一种将概率统计应用于复杂系统领域、进行不确定性知识学习、推理和数据分析的工具。尽管有着坚实的理论基础，然而，由于模型本身的特性，在给定不确定信息下，贝叶斯网的结构学习、精确推理及近似推理问题都被证明是NP难的。本文分别从本质图构建、平凡图三角化、近似推理采样算法等三个方面进行了理论分析，并提出了一系列有效算法。本文主要工作与贡献如下：第一，针对不确定环境下的知识学习问题，提出了一种贝叶斯网本质图结构学习算法（MICGES）。首先采用一种新的不受关系类型影响的统计量 MIC 确定变量间关系，其次多次利用条件独立测试删除多余的边以及对无向边添加方向，构建网络结构作为 GES 的初始结构，最后利用两阶段贪婪搜索算法确定具有最高数据匹配度的最优贝叶斯网络结构。理论上证明了 MICGES 具有多项式时间复杂性。数值实验表明， MICGES 能较快地确定出与数据匹配程度最高的本质图结构，从而能更高效地学习贝叶斯网络结构，与传统GES 算法相较，该算法收敛速度和最优本质图评分都有明显提高。第二，针对联结树推理问题，提出了一种新的基于 BOA 的贝叶斯网三角化算法（BOA_Tri）。首先将贝叶斯网络三角化问题转化为求解Moral 图的最佳结点删除序列问题，其次采用改进的 K2 作为 BOA 算法中贝叶斯网络构建的方法，最后迭代进化得到平凡图的最佳结点序列。改进后的算法的种群进化策略避免了传统遗传算法中交叉算子和变异算子的适应度评价带来的随机性和盲目性，可有效的提高进化搜索的效率。在国际通用的数据集上测试表明，与ECGA 、FDA 以及原始BOA 相比较，BOA_Tri 算法可以准确地获得贝叶斯网Moral 图的三角化网络，显著提高了不确定环境下联结树推理的效率。第三，针对近似推理采样收敛性问题，提出了一种具有通用接受函数形式的MPM 算法（GAF-MPM）。在具有归一化权重函数形式的MPM 算法框架下，提出接受函数分解的约束条件，将接受函数分解为当前信息函数和历史信息函数，给出了9类典型的接受函数，讨论了不同的接收函数选择对样本点接受率和相关性的影响。本文证明了GAF-MPM 算法的收敛性，即满足细致平衡条件（Detailed Balance Condition），由GAF-MPM 算法构造的马氏链收敛于目标分布。数值实验验证了该算法的收敛性和有效性。
英文摘要	Knowledge learning and reasoning with uncertainty is the basis of intelligent behavior. The explosive growth of the amount of data caused by the Internet revolution make us enter the era of big data. The data has become an important resource for various application areas，showing enormous volume, fast changing, diversity, structure complexity. The purpose of analyzing the data is to integrate and effectively present the information associated, is to predict the future. Effective analysis and mining on these large, fast changing and diverse data with complex structures, has become the hottest direction of research in today's technology sector, but also the support of the key technologies of the data industry. Research on the key technologies of uncertainty of knowledge representation and reasoning from the vast amounts of data, has important theoretical significance and practical value. Bayesian network is a tool for uncertain knowledge learning, reasoning and data analysis， which brings probability and statistics into the fields of complex system. Despite its solid theoretical foundation, in the fields of knowledge synthesis and reasoning mechanism with uncertain for Bayesian network ,there are still many problems to be further explored. Due to the characteristics of the model itself, the problems of structure learning and inference are proved to be NP-hard. In this paper, we do theoretical analysis on three aspects: building essential graph, triangulation of the moral graph, sampling algorithm for approximate reasoning, and put forward a series of effective algorithms. The mainly work and contribution of this thesis are as follows: Firstly, a new BN structure-learning algorithm based on Maximal Information Coefficient and conditional independence tests, is proposed. Starting from an graph structure without any edges, our proposed method constructs an undirected graph structure based on measure the dependency between two variables using Maximal Information Coefficient. Next, these edges are oriented direction based on conditional independence test, which generates a draft DAG. Then, we greedily explore the optimal structure in the space of equivalence classes using the essential graph of draft DAG as a seed structure. The algorithm was tested on four standard network structures. The experimental results explain that our method can efficiently and accurately identify BN structures from data. Compared with the original algorithm, our proposed alg...
关键词	贝叶斯网 Mic 本质图三角化 Mcmc Beyesian Network Mic Essential Graph Triangulation Mcmc
语种	中文
文献类型	学位论文
条目标识符	http://ir.ia.ac.cn/handle/173211/6526
专题	毕业生_博士学位论文
推荐引用方式 GB/T 7714	张英华. 基于贝叶斯网的不确定知识学习与推理方法研究[D]. 中国科学院自动化研究所. 中国科学院大学,2013.

条目包含的文件
文件名称/大小	文献类型	版本类型	开放类型	使用许可
CASIA_20111801462800（3171KB）			暂不开放	CC BY-NC-SA