Mid-level Features and Spatial-Temporal Context Based Activity Recognition (基于中层特征和时空上下文的行为识别研究)

	基于中层特征和时空上下文的行为识别研究
Alternative Title	Mid-level Features and Spatial-Temporal Context Based Activity Recognition
	袁飞
	2012-12-10
Degree Type	Doctor of Engineering
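Abstract (translated from Chinese)	Activity analysis and recognition in video sequences is an important frontier research direction in pattern recognition and computer vision. Progress in this area helps build intelligent systems and networks, such as intelligent robots, intelligent video surveillance systems, and Internet-of-Things networks carrying massive visual data. Activity recognition means having a computer automatically recognize activity events of interest from video data recorded by cameras. It involves two fundamental problems in pattern recognition and computer vision: (i) the visual description of activity data, and (ii) the spatio-temporal modeling and learning of activity patterns. The former is an essential problem in pattern recognition: what exactly is the pattern of an activity, and how can effective activity patterns be extracted from video data? The latter relates to the structural and dynamic properties of activity data; its key problem is how to learn discriminative activity class models from complex activity data. In recent years, many researchers have done a great deal of work on activity analysis and recognition. Representative work includes local spatio-temporal interest point features (e.g. STIPs, Cuboids features) and activity description based on the bag-of-features model (Bag-of-Features). Local spatio-temporal features avoid some preprocessing operations at the feature extraction stage, such as background extraction, body modeling, and motion estimation, and are robust to some degree to camera motion and illumination changes. They can also form sparse descriptions of activities (e.g. via the bag-of-features model) that embed effectively into high-level machine learning frameworks such as SVM. They have therefore been widely applied to activity recognition and have achieved good recognition results in some artificial and real-world scenes. However, these methods have two serious problems: (i) local spatio-temporal features describe only local information in a limited region, leaving a large semantic gap between them and complex activity categories that span different semantic levels; (ii) descriptions based on local spatio-temporal features, such as the bag-of-features model, usually discard the spatial and temporal dependencies among features. These spatio-temporal context dependencies provide very important cues for activity recognition and cannot be ignored. This thesis studies these problems in depth and makes the following contributions. First, for the visual description of activity data, this thesis proposes 2 kinds of mid-level spatio-temporal features: 1. A behavior feature based on mid-level activity components. The activity component feature is a mid-level feature designed to overcome the insufficient descriptive power of local spatio-temporal features. This thesis defines an activity component as a spatio-temporal part that is consistent in appearance in the spatial domain and consistent in motion in the temporal domain; it can describe mid-level sub-activity events with certain semantic attributes, such as "kicking" and "waving". We adopt a bottom-up strategy, clustering layer by layer from low-level features to extract higher-level features: first, keypoint features are extracted from each video frame; then, keypoint features are tracked across adjacent frames to obtain a set of motion trajectory features; finally, these trajectory features are grouped into different cluster centers according to their similarity in appearance and motion. We treat each trajectory cluster as a mid-level activity component feature that describes a spatio-temporal part with structural consistency and temporal consistency. In addition, we propose an appearance descriptor, a shape descriptor, and a motion descriptor to describe the appearance, shape, and motion information of activity component features. Compared with other methods, activity component features have the following differences and advantages: (i) compared with local spatio-temporal features (e.g. STIPs, Cuboids features), activity component features are more discriminative; they can describe not only body parts' ...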
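The last step of the bottom-up strategy described in the abstract (grouping motion trajectories by similarity in appearance and motion) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm, distance, or descriptors the thesis uses, so everything here is an assumption made for illustration: the greedy threshold grouping, the mean-velocity motion cue, the two-dimensional `appearance` vector, and the names `mean_velocity`, `track_distance`, and `group_trajectories` are all hypothetical.

```python
import math


def mean_velocity(points):
    """Average per-frame displacement of a trajectory with at least two points."""
    steps = len(points) - 1
    return ((points[-1][0] - points[0][0]) / steps,
            (points[-1][1] - points[0][1]) / steps)


def track_distance(a, b, w_motion=1.0, w_appearance=1.0):
    """Weighted sum of motion and appearance dissimilarity (illustrative choice)."""
    motion = math.dist(mean_velocity(a["points"]), mean_velocity(b["points"]))
    appearance = math.dist(a["appearance"], b["appearance"])
    return w_motion * motion + w_appearance * appearance


def group_trajectories(tracks, threshold):
    """Greedy grouping: a track joins the first group holding a close enough member."""
    groups = []
    for track in tracks:
        for group in groups:
            if any(track_distance(track, other) < threshold for other in group):
                group.append(track)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([track])
    return groups


# Toy trajectories: two move right (say, a kicking leg), two move up (a waving arm).
tracks = [
    {"points": [(0, 0), (2, 0), (4, 0)], "appearance": (0.2, 0.2)},
    {"points": [(1, 1), (3, 1), (5, 1.2)], "appearance": (0.25, 0.2)},
    {"points": [(10, 10), (10, 7), (10, 4)], "appearance": (0.8, 0.7)},
    {"points": [(11, 10), (11, 7.1), (11, 4)], "appearance": (0.8, 0.75)},
]
groups = group_trajectories(tracks, threshold=1.0)
group_sizes = [len(g) for g in groups]
```

In this toy setting each resulting group plays the role of one mid-level activity component; a real pipeline would obtain the trajectories from a keypoint tracker and would likely use richer descriptors than the two cues assumed here.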
English Abstract	Activity analysis and recognition is an important and active area of Pattern Recognition and Computer Vision research. Advances in this field contribute to the development of intelligent systems and networks such as, but not limited to, autonomous robots, intelligent video surveillance systems, and the Internet of Things with massive visual data. The goal of activity recognition is to automatically analyze and recognize ongoing activities of interest in an unknown video. It involves two fundamental issues in Pattern Recognition and Computer Vision research: (i) the visual representation of activity data, and (ii) the spatio-temporal modeling and learning of activity patterns. The former is one of the essential questions in Pattern Recognition: what is the pattern of an activity, and how can effective activity patterns be extracted from a video? The latter relates to the structural and dynamic properties of activity data, and targets the key problem of learning discriminative activity models from complicated activity data. Over the last decade, a large body of work has been dedicated to activity analysis and recognition. Representative work includes local spatio-temporal interest points (e.g. STIPs, Cuboids features) and Bag-of-Features based activity representation. They form sparse and effective action representations, usually coupled with machine learning techniques such as SVM. Their success is also due to their avoidance of pre-processing (such as background subtraction, body modeling and motion estimation) and their robustness to camera motion and illumination changes. Impressive results have indeed been reported in both synthetic and realistic scenarios. However, these methods have two serious problems: (i) local spatio-temporal features describe only the local information in a spatio-temporal volume, so there is a big semantic gap between these local features and complicated activity classes with different levels of semantics; (ii) representations based on local spatio-temporal features and their variants (e.g. bag-of-features) usually discard the geometric and temporal relationships. These contextual relationships afford an important cue for activity recognition and cannot be ignored. In this thesis, we deal with the above issues through the following works and contributions. First, for the visual representation of activity data, we prop...
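The second limitation named in the abstract, that a bag-of-features representation discards geometric and temporal relationships, can be made concrete with a small sketch. This is an illustration under stated assumptions rather than the thesis's implementation: the two-word codebook, the toy descriptors, and the names `nearest_word` and `bag_of_features` are hypothetical.

```python
from collections import Counter


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bag_of_features(features, codebook):
    """Normalized visual-word histogram; the (x, y, t) positions are never used."""
    counts = Counter(nearest_word(desc, codebook) for _position, desc in features)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(codebook)
    return [counts[k] / total for k in range(len(codebook))]


codebook = [(0.0, 0.0), (1.0, 1.0)]

# Each feature is ((x, y, t), descriptor).
video_a = [
    ((10, 20, 0), (0.1, 0.0)),
    ((12, 22, 5), (0.9, 1.1)),
    ((14, 25, 9), (1.0, 0.8)),
]
# Same descriptors, but attached to different positions and times.
video_b = [
    ((14, 25, 9), (0.1, 0.0)),
    ((10, 20, 0), (0.9, 1.1)),
    ((12, 22, 5), (1.0, 0.8)),
]

hist_a = bag_of_features(video_a, codebook)
hist_b = bag_of_features(video_b, codebook)
same_histogram = hist_a == hist_b
```

Both videos map to the same histogram even though their features occur in a different spatio-temporal arrangement, which is the kind of lost context the abstract argues should be modeled.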
Keywords	Mid-level Activity Components; Spatio-temporal String Features; Spatio-temporal Relationships; Spatio-temporal Context Kernel
Language	Chinese
Document Type	Thesis
Item Identifier	http://ir.ia.ac.cn/handle/173211/6496
Collection	Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714)	袁飞. 基于中层特征和时空上下文的行为识别研究[D]. 中国科学院自动化研究所. 中国科学院大学,2012.

Files in This Item
File Name/Size	Document Type	Version Type	Access Type	License
CASIA_20081801462807 (3212KB)			Not yet open	CC BY-NC-SA