基于多特征概率图模型的视觉人体行为分析 (Visual Human Action Analysis Based on Multi-Feature Probabilistic Graphical Models)
杨双 (Yang Shuang)
2016-05
Degree type: Doctor of Engineering
Chinese Abstract
Visual human action analysis is an important research direction in computer vision. It uses computers to automatically analyze the human actions in previously unseen video sequences, with goals such as predicting the action category and locating where the action takes place. The problem not only has great application prospects in areas such as intelligent surveillance, human-computer interaction, and the medical and sports fields, but also carries significant theoretical value, since it helps advance related research areas such as pattern recognition and machine learning.
In general, visual human action analysis methods consist of two steps: (1) building a representation module to describe the human actions in videos, and (2) building a recognition module that uses the representations obtained in the previous step to analyze the video content and classify the human actions. Because videos contain rich background information and different people perform actions in different ways, methods that jointly exploit multiple types of features to construct the video representation and then perform recognition have gradually shown their advantages. Although different feature types can effectively characterize different aspects of human actions and thus yield a relatively comprehensive and robust video description, low-level video features in most cases only record pixel-level changes and cannot express mid- and high-level semantic information, so they are not sufficient to distinguish the target actions from other distracting information. To address this, this thesis, on the one hand, employs multiple complementary features to characterize the rich information in videos, and on the other hand, exploits the statistical properties of probability distributions to reduce the influence of distracting factors and thus improve the robustness of the methods. The main work and contributions of this thesis are as follows:
(1) We propose a fusion algorithm for appearance and motion information that embeds multiple forms of randomness. Exploiting the complementarity and redundancy between different features, we construct randomized feature subspaces to form complementary representations of the video. Moreover, unlike traditional methods that use only the similarity between the features themselves for recognition, we propose a recognition method that jointly exploits the spatio-temporal structural information of the features and their probability distributions. In this process, the similarity between features within each subspace is used to build randomized weak linear classifiers, while the spatio-temporal relations among features and the statistical distribution of samples are used to make the final class decision. Finally, we implement the method within a random forest framework and evaluate it on several public datasets of varying complexity, which verifies its effectiveness and robustness.
(2) We propose a hierarchical Bayesian model that fuses multiple features and their context information, and successfully apply it to human action recognition. We build a hierarchical probabilistic graphical model with multiple groups of bi-layer topic structures to mine and extract latent action patterns of different aspects and scales from videos, and represent the actions in a video by the probability distribution over the high-level action patterns. Specifically, we use the structural information of the features to construct region-level local action patterns and video-level global action patterns, and further constrain the different action patterns with the similarity between the features themselves. This yields a refinement from low-level video features, to local region-level action patterns, and finally to video-level action patterns, thereby achieving semantic mining and description of the video content. Finally, exploiting the conjugacy between different distributions, we integrate out part of the variables and derive an efficient collapsed Gibbs sampling (CGS) algorithm for learning and inference, which completes the recognition of human actions in videos.
(3) We propose a multi-feature hierarchical Bayesian model with an embedded max-margin mechanism and apply it to action recognition. The method fuses the representation module, based on a multi-feature hierarchical probabilistic graphical model, with the recognition module, based on the max-margin criterion, in a unified Bayesian framework via maximum entropy discrimination, enabling joint learning and inference of the two modules. In addition, we incorporate a multi-task learning mechanism to further achieve multi-feature, multi-class visual human action recognition. Compared with traditional methods that learn the representation and recognition parts independently, the two modules in our model are learned jointly in a unified framework, so they can promote and correct each other during learning, leading to stronger descriptive and discriminative power. Finally, extensive comparative experiments on several popular public datasets verify the effectiveness of each module and the effectiveness and stability of the overall model.
(4) We propose a multi-feature hierarchical Bayesian model based on Gaussian processes and multiple kernel learning, and apply it to action recognition. Unlike traditional methods, which assume the data are separable in some specific form and therefore assume the discriminant function has some fixed parametric form, we introduce a non-parametric approach based on Gaussian processes, which allows our model to represent discriminant functions of arbitrary form. The model is therefore not restricted to the linearly separable case and has much broader applicability. Meanwhile, we introduce the max-margin mechanism to minimize the expected classification loss and further improve the discriminative power of the model. We then unify Gaussian process classification and the max-margin criterion into a single Bayesian posterior inference problem for learning and inference. Compared with traditional methods that handle each type of feature separately, we fuse the features from multiple perspectives, and the combination of the Gaussian-process-based non-parametric approach with discriminative criteria further improves the model's discriminative power and robustness. Finally, we verify the effectiveness of our method on public video action datasets.
English Abstract
Human action recognition is an important branch of computer vision. It aims to use computers to automatically analyze and recognize human actions in videos, and to carry out tasks such as classification and localization. Human action recognition plays an important role in many application areas, such as smart surveillance, human-computer interaction, and the medical and sports fields. It is also significant for boosting the development of related research fields such as pattern recognition and machine learning.
The common pipeline for human action recognition includes two steps: (1) designing proper representation methods to describe the content of videos and (2) building a classification module based on the obtained representations. Due to the various types of information in videos and the wide diversity of human action styles, it has become more and more prevalent to jointly employ multiple features to generate the representations for action recognition. By using different feature modalities to simultaneously characterize different aspects of human actions, including appearance and dynamic properties, we can obtain a comprehensive and robust description of the human actions in videos. However, the scattered low-level features usually focus on describing movements at the pixel level, and they are not robust or semantic enough to distinguish the target actions from other noisy movements in cluttered backgrounds. To tackle this problem, this thesis, on the one hand, studies how to represent videos by fusing multiple complementary features to describe the rich information they contain; on the other hand, it takes advantage of the statistical properties of probability distributions to reduce the interference of noise and thus improve the models' robustness. The main contributions of this thesis are summarized as follows:
(1) We propose to fuse two complementary types of features, appearance and dynamic features, by embedding multiple forms of randomness for human action recognition. In this method, we take advantage of the complementary and redundant properties of different features to construct multiple randomized subspaces for action representation. In the meantime, we propose to jointly use the spatio-temporal structures of the features together with the probabilistic distributions of the samples in each class for recognition. In the recognition process, the similarity between the features themselves is used to construct randomized weak linear classifiers, while the spatio-temporal structures of the features and the probabilistic distributions of the samples are used to make the final prediction. In the end, we implement our idea within the random forest framework and evaluate it on multiple public datasets, which proves the effectiveness and robustness of our method.
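As a concrete illustration of the random-subspace idea in (1), the following is a minimal sketch, not the thesis implementation: appearance and motion descriptors are concatenated, random feature subspaces are drawn, a weak linear classifier is trained in each, and the ensemble votes in a random-forest-like fashion. The function names, the subspace size, and the choice of hinge-loss SGD classifiers are illustrative assumptions; the spatio-temporal structure and class-distribution cues used in the thesis are omitted here.

# Illustrative sketch only; names and hyperparameters are assumptions, not the thesis code.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def train_random_subspace_ensemble(X_app, X_mot, y, n_clfs=50, subspace_dim=64):
    """X_app / X_mot: per-clip appearance and motion descriptors; y: non-negative integer labels."""
    X = np.hstack([X_app, X_mot])                # joint appearance + motion representation
    ensemble = []
    for _ in range(n_clfs):
        dims = rng.choice(X.shape[1], size=subspace_dim, replace=False)  # random feature subspace
        clf = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3)       # weak linear classifier
        clf.fit(X[:, dims], y)
        ensemble.append((dims, clf))
    return ensemble

def predict_by_vote(ensemble, X_app, X_mot):
    X = np.hstack([X_app, X_mot])
    votes = np.stack([clf.predict(X[:, dims]) for dims, clf in ensemble])  # (n_clfs, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])        # majority vote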
(2) A hierarchical Bayesian model that fuses multiple features together with their context information is proposed for human action recognition. We build a hierarchical Bayesian model with multiple bi-layer topic structures in multiple groups to capture the latent action patterns in videos. Actions in videos are represented by the discriminative distributions over the high-level action patterns. Specifically, we take advantage of the structures of the features to model region-level local patterns and employ these local patterns to further model video-level global patterns. From this hierarchical structure, we obtain high-level semantic action patterns at different scales and from different aspects, yielding robust representations for action recognition. Finally, we make use of the conjugacy between different probability distributions and derive an efficient collapsed Gibbs sampling (CGS) algorithm to implement learning and inference for action recognition.
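The collapsed Gibbs sampler in (2) relies on Dirichlet-multinomial conjugacy to integrate out the topic and word distributions. The sketch below shows that machinery for a plain single-layer topic model over quantized local features (visual words); the bi-layer, multi-group structure and the multiple feature channels of the thesis model are not reproduced here, and all names and hyperparameters are illustrative assumptions.

# Simplified single-layer collapsed Gibbs sampler; an illustration, not the thesis model.
import numpy as np

def collapsed_gibbs(docs, V, K=10, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """docs: list of visual-word index lists; V: vocabulary size; K: number of topics."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))                                  # topic counts per video
    nkv = np.zeros((K, V))                                  # visual-word counts per topic
    nk = np.zeros(K)                                        # total counts per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]    # random initial assignments
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1; nkv[z[d][i], w] += 1; nk[z[d][i]] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1  # remove current assignment
                # conditional p(z = k | rest); conjugacy lets the topic mixtures be integrated out
                p = (ndk[d] + alpha) * (nkv[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1
    # per-video topic mixtures, used as the action representation downstream
    return (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)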
(3) A multi-feature max-margin hierarchical Bayesian model is proposed for human action recognition. The model combines two modules, a representation module based on the multi-feature hierarchical Bayesian model and a recognition module based on the max-margin principle, in a unified framework for action recognition. Different from many traditional methods, which perform representation and classification in two separate steps, we make use of maximum entropy discrimination to fuse the two modules in a unified Bayesian framework and perform joint learning and inference over both. Therefore, the two modules are able to adjust and improve each other during learning, making the whole model more descriptive and discriminative. In addition, we introduce multi-task learning to handle multi-feature, multi-class recognition. In the end, we test on multiple popular action datasets and compare with many related methods from multiple aspects, which proves the effectiveness and stability of our method.
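For orientation only, the fragment below shows the decoupled baseline that (3) improves upon: a max-margin (hinge-loss, one-vs-rest) classifier trained on fixed multi-feature topic mixtures, for example outputs of samplers like the one sketched above. In the thesis the two modules are instead coupled through maximum entropy discrimination and learned jointly; that coupling is not reproduced here, and the function and variable names are assumptions.

# Decoupled max-margin baseline on topic-mixture features; not the joint MED formulation.
import numpy as np
from sklearn.svm import LinearSVC

def max_margin_on_topics(theta_app, theta_mot, y):
    """theta_app / theta_mot: per-video topic mixtures from two feature channels; y: integer labels."""
    X = np.hstack([theta_app, theta_mot])    # simple multi-feature concatenation
    clf = LinearSVC(C=1.0)                   # hinge loss = max-margin criterion;
    clf.fit(X, y)                            # one-vs-rest handles the multi-class case
    return clf                               # clf.decision_function(X_new) gives per-class margins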
(4) A multi-feature hierarchical Bayesian model based on Gaussian processes and multiple kernel learning is proposed for human action recognition. We make use of a non-parametric method based on Gaussian processes so that our model can represent discriminant functions of arbitrary form, not limited to, for example, linear functions, which makes the method applicable to a wider range of problems. In the meantime, we introduce the max-margin principle and minimize the expected loss to improve the discriminative power of the model. We then combine Gaussian process classification and the max-margin principle in a unified Bayesian framework and transform the problem into a Bayesian posterior inference problem. Moreover, we introduce multiple kernel learning to fuse multiple features from different aspects. Compared with traditional methods that process each type of feature separately, our method can fuse different features in a more compact way, and the combination of the Gaussian-process-based non-parametric method with discriminative principles further improves the model's discriminative power and robustness. Finally, we test on public benchmark datasets and show the effectiveness of our method for action recognition.
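A much-simplified sketch of the kernel-combination side of (4): per-feature-channel RBF kernels are mixed with fixed weights (the weights and bandwidths are given here, whereas the thesis learns the combination), and classification uses the Gaussian-process-regression predictive mean on +/-1 one-vs-rest targets. The max-margin coupling and the full Gaussian process classification treatment of the thesis are not reproduced; all names and parameters are illustrative assumptions.

# Fixed-weight multi-kernel combination with a GP-regression decision rule; an illustration only.
import numpy as np

def rbf_kernel(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean distances
    return np.exp(-gamma * d2)

def combined_kernel(feats_a, feats_b, gammas, weights):
    """feats_a / feats_b: lists of per-channel feature matrices (same channel order)."""
    return sum(w * rbf_kernel(Fa, Fb, g)
               for Fa, Fb, g, w in zip(feats_a, feats_b, gammas, weights))

def gp_decision_values(train_feats, y_pm1, test_feats, gammas, weights, noise=1e-2):
    """y_pm1: +/-1 one-vs-rest targets; returns GP-regression predictive means for the test clips."""
    K = combined_kernel(train_feats, train_feats, gammas, weights)
    K_star = combined_kernel(test_feats, train_feats, gammas, weights)
    alpha = np.linalg.solve(K + noise * np.eye(len(y_pm1)), y_pm1)   # (K + sigma^2 I)^{-1} y
    return K_star @ alpha                    # the sign gives the one-vs-rest decision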
Keywords: multi-feature; probabilistic graphical model; max-margin; topic model; maximum entropy discrimination; Gaussian process; multi-task learning; multiple kernel learning; regularized Bayesian inference; action recognition
Document type: Doctoral dissertation
Identifier: http://ir.ia.ac.cn/handle/173211/11835
Collection: Graduates, Doctoral dissertations
Recommended citation (GB/T 7714):
杨双. 基于多特征概率图模型的视觉人体行为分析[D]. 北京: 中国科学院研究生院, 2016.
Files in this item:
ys_thesis_最终提交版.pdf (4149 KB); document type: dissertation; access: restricted; license: CC BY-NC-SA