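Visual Human Action Analysis Based on Multi-Feature Probabilistic Graphical Models (基于多特征概率图模型的视觉人体行为分析)
Author: 杨双 (Yang Shuang)
Degree type: Doctor of Engineering
Supervisor: 胡卫明 (Hu Weiming)
Date: 2016-05
Degree-granting institution: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Keywords: multi-feature; probabilistic graphical model; max-margin; topic model; maximum entropy discriminant analysis; Gaussian process; multi-task learning; multiple kernel learning; regularized Bayesian; action recognition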
Abstract
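Visual human action analysis is an important research direction in computer vision. It uses computers to automatically analyze human actions in previously unseen video sequences, with goals such as predicting the action category and determining where the action occurs. The problem has great application potential in intelligent surveillance, human-computer interaction, medical care and sports, and it also has important theoretical value: it helps advance related research fields such as pattern recognition and machine learning.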
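Visual human action analysis methods usually consist of two steps: (1) building a representation module that describes the human actions in a video; and (2) building a recognition module that uses the video representation obtained in the first step to analyze the video content and determine the human action. Because videos contain rich background information and different people perform actions in different ways, methods that use several different features jointly to build the video representation and then perform recognition have gradually shown their advantages. Using different feature types can effectively characterize different aspects of the human actions in a video and thus yield a fairly comprehensive and robust video description. However, in most cases the low-level features of a video merely record pixel-level changes and cannot express its mid- and high-level semantic information, so they are not sufficient to separate the target action from other interfering information. To address this, this thesis uses multiple complementary features to characterize the rich information in videos, and at the same time studies how to exploit the statistical properties of probability distributions to reduce the influence of interfering factors and thereby improve robustness. The main work and contributions of the thesis are as follows: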
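(1) We propose a fusion algorithm for appearance information and motion information that embeds multiple sources of randomness. We exploit the complementarity and redundancy among different features to construct random feature subspaces, which form complementary representations of a video. In addition, unlike traditional methods that use only the similarity of the features themselves for recognition, we propose a recognition method that jointly uses the spatio-temporal structure of the features and their probability distributions. In this process, the similarity of the features within each subspace is used to build randomized weak linear classifiers, while the spatio-temporal structural relations among features and the statistical distribution of the samples are used to make the final decision on the sample class. Finally, we implement the method within the random forest framework and run experiments on several public datasets of different complexity, which verify its effectiveness and robustness.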
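(2) We propose a hierarchical Bayesian model that fuses multiple features and their context information, and apply it successfully to human action recognition. We build a hierarchical probabilistic graphical model containing multiple groups of two-layer topic structures to mine and extract topic-like action patterns at different aspects and scales of a video, and represent the actions in a video by the probability distribution over the high-level action patterns. Specifically, we use the structural information of the features to build region-level local action patterns and video-level global action patterns, respectively; we also use the similarity of the features themselves to further constrain the different action patterns. This yields a refinement from low-level video features, to local region-level action patterns, to video-level action patterns, and thus a semantic mining and description of the video content. Finally, we exploit the conjugacy between distributions to integrate out some of the variables and derive an efficient CGS (Collapsed Gibbs Sampling) algorithm for learning and inference, which is then used to recognize the human actions in videos.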
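(3) We propose a multi-feature hierarchical Bayesian model with an embedded max-margin mechanism and apply it to action recognition. The method uses maximum entropy discriminant analysis to fuse the representation module, based on the multi-feature hierarchical probabilistic graphical model, and the recognition module, based on the max-margin criterion, into a unified Bayesian framework, so that the two modules are learned and inferred jointly. We further incorporate a multi-task learning mechanism to achieve multi-feature, multi-class visual human action recognition. Compared with traditional methods, which usually learn the video representation and the recognition part independently, our representation and recognition modules are learned jointly within one framework, so they can reinforce and correct each other during learning, giving the model stronger descriptive and discriminative power. Finally, comparative experiments from multiple aspects and perspectives on several popular public datasets verify the effectiveness of the individual modules and of the whole model, as well as the stability of its performance.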
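(4) We propose a multi-feature hierarchical Bayesian model based on Gaussian processes and multiple kernel learning and apply it to action recognition. Unlike traditional methods, which assume that the data are separable in some form and that the discriminant function has a fixed parametric form, we introduce a non-parametric method based on Gaussian processes, so that the model can represent discriminant functions of arbitrary form and is not limited to the linearly separable case, which makes it more widely applicable. We also introduce the max-margin mechanism to minimize the expected classification loss and further improve the discriminative power of the model. We then unify Gaussian process classification and the max-margin criterion into a single Bayesian posterior inference problem for learning and inference. Compared with traditional methods that process each feature type separately, we fuse the features from multiple perspectives, and the combination of the Gaussian-process-based non-parametric method with the discriminative criterion further improves the discriminative power and robustness of the model. Finally, we verify the effectiveness of our method on public video action datasets.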
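As an illustration of the collapsed Gibbs sampling idea named in contribution (2), the following is a minimal Python sketch for plain latent Dirichlet allocation. It is not the thesis's multi-feature hierarchical model: the function name `collapsed_gibbs_lda`, the hyperparameters `alpha` and `beta`, and the single-layer topic structure are all assumptions made for this example. It only shows how conjugacy lets the topic and word distributions be integrated out, so that sampling needs nothing but count tables.

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(n_topics)                 # total token count per topic

    def update(d, w, k, delta):
        # Add (delta=+1) or remove (delta=-1) one token from all count tables.
        n_dk[d, k] += delta
        n_kw[k, w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)
                # p(z = k | all other assignments), with the Dirichlet-distributed
                # topic and word proportions integrated out analytically.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                z[d][i] = rng.choice(n_topics, p=p / p.sum())
                update(d, w, z[d][i], +1)

    # Posterior-mean estimate of each document's distribution over topics.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    return theta, n_kw
```

In a setting like the one described above, each row of `theta` would play the role of a video's distribution over latent patterns, with quantized local features taking the place of words; that mapping is an interpretation for this sketch, not a detail given in the abstract.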
Other abstract
Human action recognition is an important branch of computer vision. It aims to use computers to automatically analyze and recognize human actions in videos, and to carry out tasks such as classification and localization. Human action recognition plays an important role in many application areas, such as smart surveillance, human-computer interaction, and the medical and sports fields. It is also significant for advancing related research fields such as pattern recognition and machine learning.
The common pipeline for human action recognition includes two steps: (1) designing a representation method to describe the content of videos and (2) building a classification module based on the obtained representations. Because videos contain many types of information and human action styles are highly diverse, it has become increasingly common to employ multiple features jointly to generate the representations for action recognition. By using different feature modalities to characterize different aspects of human actions simultaneously, including appearance and dynamics, we can obtain a comprehensive and robust description of the human actions in videos. However, low-level features usually describe only pixel-level movements, and are neither robust nor semantic enough to distinguish the target actions from other noisy movements in cluttered backgrounds. To tackle this problem, this thesis represents videos by fusing multiple complementary features that describe their rich information, and exploits the statistical properties of probability distributions to reduce the interference of noise and thereby improve the model's robustness. The main contributions of this thesis are summarized as follows:
(1) We propose to fuse two complementary types of features, appearance and dynamic features, by embedding multiple sources of randomness for human action recognition. In this method, we exploit the complementarity and redundancy between different features to construct multiple randomized subspaces for action representation. We also propose to use the spatio-temporal structures of the features jointly with the probability distributions of the samples in each class for recognition. In the recognition process, we use the similarity between the features themselves to construct randomized weak linear classifiers, and combine them with the spatio-temporal structures of the features and the probability distributions of the samples to make the final prediction. Finally, we implement the idea within the random forest framework and test it on multiple public datasets; the results demonstrate the effectiveness and robustness of our method.
(2) We propose a hierarchical Bayesian model that fuses multiple features together with their context information for human action recognition. We build a hierarchical Bayesian model with multiple groups of bi-layer topic structures to capture the latent action patterns in videos. Actions in videos are represented by the discriminative distributions over the high-level action patterns. Specifically, we use the structures of the features to model region-level local patterns and then employ these local patterns to model video-level global patterns. From this hierarchical structure, we obtain high-level semantic action patterns at different scales and in different aspects, which give robust representations for action recognition. Finally, we exploit the conjugacy between different probability distributions and derive an efficient CGS (Collapsed Gibbs Sampling) algorithm to perform learning and inference for action recognition.
(3) We propose a multi-feature max-margin hierarchical Bayesian model for human action recognition. The model combines two modules in a unified framework: a representation module based on the multi-feature hierarchical Bayesian model and a recognition module based on the max-margin principle. Unlike many traditional methods, which perform representation and classification in two separate steps, we use maximum entropy discriminant analysis to fuse the two modules in a unified Bayesian framework and perform joint learning and inference for both. The two modules can therefore adjust and improve each other during learning, which makes the whole model more descriptive and discriminative. In addition, we introduce multi-task learning to handle multi-feature, multi-class recognition. Finally, we test on multiple popular action datasets and compare with many related methods from multiple aspects; the results demonstrate the effectiveness and stability of our method.
(4) We propose a multi-feature hierarchical Bayesian model based on Gaussian processes and multiple kernel learning for human action recognition. We use a non-parametric method based on Gaussian processes so that our method can model discriminant functions of any form, not only linear ones, which makes it applicable in a wider range of settings. We also introduce the max-margin principle and minimize the expected loss function to improve the discriminative power of the model. We then combine the Gaussian process and the max-margin principle in a unified Bayesian framework and turn the problem into one of Bayesian posterior inference. Moreover, we introduce multiple kernel learning to fuse multiple features in different aspects. Compared with traditional methods, which process each type of feature separately, our method fuses the different features in a more compact way, and the combination of the Gaussian-process-based non-parametric method with discriminative principles further improves the model's discriminative power and robustness. Finally, we test on public benchmark datasets and show the effectiveness of our method for action recognition.
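The combination of multiple kernels with a Gaussian process described in contribution (4) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis's model: the kernel weights are fixed by hand instead of being learned, the max-margin term is omitted, and Gaussian process regression on +1/-1 labels stands in for Gaussian process classification. The names `rbf_kernel`, `combined_kernel`, and `gp_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def combined_kernel(feats_a, feats_b, weights):
    """Weighted sum of one RBF kernel per feature type.

    feats_a and feats_b are lists with one array per feature type
    (for example appearance and motion); the weights are fixed here,
    whereas multiple kernel learning would estimate them from data.
    """
    return sum(w * rbf_kernel(Xa, Xb)
               for w, Xa, Xb in zip(weights, feats_a, feats_b))

def gp_predict(train_feats, y, test_feats, weights, noise=0.1):
    """Sign of the GP posterior mean, using regression on +1/-1 labels."""
    K = combined_kernel(train_feats, train_feats, weights)
    K_star = combined_kernel(test_feats, train_feats, weights)
    alpha = np.linalg.solve(K + noise * np.eye(len(y)), np.asarray(y, dtype=float))
    return np.sign(K_star @ alpha)
```

Because the discriminant function is defined through the kernel instead of a fixed parametric form, changing the kernels or their weights changes the family of functions the predictor can represent, which is the flexibility the paragraph above attributes to the non-parametric approach.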
Document type: Thesis
Item identifier: http://ir.ia.ac.cn/handle/173211/11835
Collection: Graduates_Doctoral Theses
Recommended citation
GB/T 7714
杨双. 基于多特征概率图模型的视觉人体行为分析[D]. 北京. 中国科学院研究生院,2016.
Files in this item
File name / size: ys_thesis_最终提交版.pdf (4149KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA