Action Recognition Based on Bayesian Multiple Kernel Learning (基于贝叶斯多核学习的行为识别)


	孙雯
	2017-05-24
Degree type	Master of Engineering
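Abstract (translated from Chinese)	With the explosive growth of video data in recent years, video analysis and understanding has become increasingly important and has attracted considerable research interest. Within this field, recognizing human actions in video is an active research direction, motivated by promising applications in many domains such as intelligent video surveillance, human-computer interaction, video retrieval, and medical diagnosis. The task remains highly challenging, however, because of variations in how actions are performed, illumination changes, camera motion, and occlusion. The main task of video-based human action recognition is to have a computer automatically recognize the human actions in a video sequence. The basic workflow of a simple human action recognition system can usually be described as follows: features with strong descriptive power are first extracted from the input video, the extracted features are then converted into an action representation, and finally the action representation is learned and classified with machine learning methods, which yields the recognized action. Earlier research on action recognition concentrated on designing descriptive features for the problem, which produced a large number of features. A single type of feature, however, has limited expressive power and cannot fully capture visual characteristics. Fusing multiple features so as to integrate several informative ones is an intuitively reasonable approach. Yet most current action recognition methods fuse features in simple ways that can neither measure the contribution of each feature nor guarantee an improvement over using a single feature. This thesis therefore sets out to propose an effective fusion method that measures the relative importance of each feature, learns the optimal combination of features, and makes full use of the strengths of each feature, and it applies the method to human action recognition to obtain better classification performance in experiments. The main work and contributions of the thesis are as follows. (1) A new feature fusion method based on multiple kernel learning and built within a hierarchical Bayesian framework is proposed: hierarchical Bayesian multiple kernel learning. Specifically, taking multiple kernel learning as the theoretical guide, we adopt a linear weighted sum as the functional form of the kernel combination, place priors on the parameters of this combination function in a Bayesian manner, build the corresponding three-layer probabilistic graphical model, and finally infer the parameters with variational Bayes. The fusion method automatically learns the optimal weights with which the features are combined, and thus exploits the complementary properties of the features to the fullest in an informed way. Several groups of experiments show that the method combines multiple features effectively and outperforms both single features and existing feature fusion methods. (2) A human action recognition algorithm based on feature fusion is proposed. The fusion methods used by current action recognition algorithms cannot measure the contribution of each feature and therefore cannot fuse features effectively; to address this, the thesis applies the proposed fusion method based on hierarchical Bayesian multiple kernel learning to human action recognition. We extract multiple features and compute the corresponding feature kernels, which serve as the base kernels of the multiple kernel learning method. To describe the actions in a video adequately, we extract both traditional hand-crafted features and deep learning features. These features are complementary in describing video: they cover static appearance information and dynamic motion information, as well as local and global information. We ran a series of experiments on several public human action recognition datasets of differing complexity; the feature-fusion-based algorithm achieved competitive results on multiple datasets, and the experimental results confirm the effectiveness of the proposed method. (3) Using the hierarchical Bayesian multiple kernel learning method, we analyze how much different features contribute to classification for video actions in datasets with different characteristics. Although human action recognition has been studied for many years, few analyses have addressed which types of features contribute more during recognition, or the relationship between deep learning features and traditional hand-crafted features. Besides combining multiple features effectively, our algorithm offers insight into the contribution of each feature. Because the weights of the base kernels of different features in the composite kernel are entirely data-driven, analyzing the kernel weights obtained in the experiments led us to several conclusions that we believe can inform research on human action recognition.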
Abstract (English)	The analysis and understanding of videos is an area of increasing significance and has attracted much research attention due to the explosion of Web videos over the past few years. The recognition of human actions in videos is an active research area within this field. Action recognition is motivated by promising applications in broad domains such as intelligent surveillance, human-computer interaction, and video retrieval. However, the task is still challenging due to variations in how actions are performed, background clutter, illumination changes, camera movements, and occlusions. The main task of human action recognition is to let the computer recognize the human actions in a video sequence automatically. In most circumstances, the basic workflow of a simple human action recognition system can be described as follows: first extract features from the input video to obtain an informative description, then encode the extracted features, and finally classify them using machine learning methods. Previous research has paid more attention to designing descriptive features specific to action recognition, and a large number of features are now available for this task. However, a single type of feature cannot capture the visual characteristics sufficiently. Integrating diverse types of informative features instead of a single one is an intuitive way to improve recognition performance. Existing action recognition algorithms, however, usually combine different features in simple ways, most of which fail to measure the contributions of the different features and cannot guarantee an improvement over the individual features.
Therefore, we aim to propose an effective fusion method in this thesis that can evaluate the relative contributions of different feature representations, learn the optimum combination of multiple features, and make sufficient use of all the features. We then apply the fusion method to human action recognition to gain enhanced classification performance. The main contributions of this thesis can be summarized as follows. (1) We propose a new feature fusion method based on the theory of multiple kernel learning in a hierarchical Bayesian framework, called Hierarchical Bayesian Multiple Kernel Learning (HB-MKL). Specifically, we consider a linear combination of feature kernels, put priors on the combination parameters in a Bayesian manner, and perform inference to learn them using a variational approximation. The method adaptively estimates the optimal weights of the base kernels constructed from the different features to form a composite kernel, and therefore makes informed and sufficient use of the multiple features. Experimental results demonstrate that integrating diverse features using HB-MKL improves performance compared with single-feature approaches and other combination methods. (2) We propose a new approach to action recognition based on feature fusion. Most existing fusion methods used for action recognition fail to measure the contributions of different features. To solve this problem, we apply our HB-MKL model to fuse diverse types of features effectively for action recognition. We extract multiple features and calculate the corresponding feature kernels, which are used as the base kernels in the multiple kernel learning method. To form a sufficient description of the actions in videos, we extract traditional hand-crafted features and deep-learned features. These features are complementary when describing videos: they cover static appearance information and motion information, as well as local and global information. We conduct a set of experiments on several public human action recognition datasets for illustration and comparison, and the experimental results demonstrate the effectiveness of our method. (3) Based on our HB-MKL method, we analyze the contributions of different features to classification on datasets with different characteristics. Although human action recognition has been studied for many years, few analyses have been presented of which kinds of features contribute more to recognition, or of the relationship between deep-learned features and hand-crafted features. In addition to fusing multiple features effectively, our HB-MKL method can provide some insight into the contributions of different features. Since the learned kernel weights are completely data-driven, analyzing the kernel weights obtained in the experiments leads to some conclusions that may inform the study of action recognition.
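The abstract describes the composite kernel as a linear weighted sum of base kernels, one per feature type. The sketch below illustrates only that combination step, under stated assumptions: the RBF base kernels, the random stand-in features, the `gamma` values, and the hand-set weights are all hypothetical and not taken from the thesis. In HB-MKL the weights would be inferred from data by variational Bayes, which this sketch does not implement.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (an assumed base kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))     # clip tiny negative round-off

def combine_kernels(kernels, weights):
    """Composite kernel as a linear weighted sum: K = sum_m w_m * K_m."""
    kernels = np.asarray(kernels, dtype=float)  # shape (M, N, N)
    weights = np.asarray(weights, dtype=float)  # shape (M,)
    if kernels.shape[0] != weights.shape[0]:
        raise ValueError("need one weight per base kernel")
    return np.tensordot(weights, kernels, axes=1)

rng = np.random.default_rng(0)
# Two hypothetical feature types for the same 5 videos (random stand-ins).
appearance = rng.normal(size=(5, 8))
motion = rng.normal(size=(5, 12))
base = [rbf_kernel(appearance, 0.1), rbf_kernel(motion, 0.05)]

# Fixed by hand for illustration; HB-MKL would learn these weights from data.
weights = [0.3, 0.7]
K = combine_kernels(base, weights)
```

The resulting matrix `K` could be passed to any kernel classifier that accepts a precomputed kernel. With non-negative weights, a sum of positive semidefinite base kernels is itself positive semidefinite, so `K` remains a valid kernel.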
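Keywords	human action recognition; multiple kernel learning; feature fusion algorithm
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/14704
Collection	Graduates_Master's theses
Recommended citation (GB/T 7714)	孙雯. 基于贝叶斯多核学习的行为识别[D]. 北京. 中国科学院研究生院,2017.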

Files in this item
File name/size	Document type	Version type	Access type	License
基于贝叶斯多核学习的行为识别.pdf (9925 KB)	Thesis		Restricted access	CC BY-NC-SA