With the rapid development of the Internet and mobile terminals, the amount of video data is increasing dramatically, which poses new challenges for video processing. Human action recognition is an essential issue and a key step in video processing, directly affecting other fields such as video content understanding and scene understanding. Meanwhile, human action recognition is a critical technology for computer vision and pattern recognition applications, for example video surveillance, video indexing, and human-computer interaction. Recently, researchers have conducted a great deal of work on human action recognition; however, the semantic information has not been fully explored. To overcome this drawback, we study semantic methods that bridge the gap between computers and humans. The main contributions of this thesis are summarized as follows:

1. To overcome the drawbacks of the traditional bag-of-words model, a novel coding strategy called context-constrained linear coding is proposed. This method introduces the concept of contextual distance, which explicitly considers the spatio-temporal relationships among feature points. In addition, the proposed method uses several nearest codewords to linearly reconstruct each local feature, which alleviates the quantization error.

2. To address the problem that traditional maximum margin clustering (MMC) treats each frame independently and neglects the temporal relationship between contiguous frames in the same action video, contextual maximum margin clustering (CMMC) is proposed. A temporal regularization term is added to the objective of traditional MMC, so that CMMC not only finds maximum-margin hyperplanes but also explicitly considers the temporal information between contiguous frames.

3. We propose a novel method for cross-view action recognition via a continuous virtual path.
All the virtual views on the continuous virtual path are concatenated into an infinite-dimensional feature, which acts as the final feature vector for classification. A virtual view kernel is proposed to compute the similarity between two infinite-dimensional features. We derive the virtual view kernel under an information-theoretic framework that maximizes discrimination. Furthermore, we present a constraint strategy to exploit the visual information contained in the unlabeled samples.

4. The traditional multi-task learning methods neglect the constraints t...
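The local coding step of contribution 1 can be sketched as follows. This is a minimal illustration only: plain Euclidean nearest-codeword search stands in for the thesis's contextual distance, and the function and variable names are hypothetical, not taken from the thesis.

```python
import numpy as np

def context_linear_code(x, codebook, k=5):
    """Encode a local feature by linearly reconstructing it from its
    k nearest codewords (simplification: Euclidean distance replaces
    the contextual distance used in the thesis)."""
    # Distance from the feature to every codeword in the codebook.
    d = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(d)[:k]          # indices of the k nearest codewords
    B = codebook[idx]                # (k, dim) local basis
    # Least-squares weights that reconstruct x from the local basis,
    # normalized so the code is shift-invariant (weights sum to one).
    w, *_ = np.linalg.lstsq(B.T, x, rcond=None)
    w /= w.sum()
    code = np.zeros(len(codebook))
    code[idx] = w                    # sparse code over the full codebook
    return code
```

Because each feature is reconstructed from several nearby codewords rather than assigned to a single one, the quantization error of hard assignment is reduced.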
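The effect of the temporal regularization in contribution 2 can be illustrated with a toy objective. The hinge-loss form, the label-disagreement penalty, and all names below are simplifying assumptions for illustration; the actual CMMC formulation optimizes cluster labels jointly within the MMC framework.

```python
import numpy as np

def cmmc_objective(margins, labels, lam=1.0):
    """Toy objective in the spirit of CMMC: a per-frame hinge loss on
    the margins plus a penalty on label changes between contiguous
    frames of the same action video (illustrative only)."""
    # Standard large-margin term: hinge loss over the frames.
    hinge = np.maximum(0.0, 1.0 - labels * margins).sum()
    # Temporal regularization: count contiguous frame pairs whose
    # cluster labels disagree.
    temporal = np.sum(labels[:-1] != labels[1:])
    return hinge + lam * temporal
```

A frame sequence whose labels flip back and forth pays the temporal penalty even when every frame is on the correct side of the margin, which is exactly the smoothness that frame-independent MMC cannot express.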