Visual Behavior Understanding of Individuals and Groups


	Visual Behavior Understanding of Individuals and Groups
Alternative title	Solo and Group Action Recognition
	魏青帝 (Wei Qingdi)
	2010-12-03
Degree type	Doctor of Engineering
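Chinese abstract	Vision-based motion analysis aims to give computers intelligent perception, so that they can recognize objects in video and understand the behavior of targets. As video data of all kinds grows rapidly, computer vision is being applied ever more widely, for example in visual surveillance, human-computer interaction, and sports video commentary. This requires computers to extract useful information quickly from massive video data and to analyze the events in a scene automatically. Many researchers have turned to this field in recent years and have done a great deal of work, but many difficult problems remain unsolved. The essence of behavior understanding is to establish the link between the low-level features of image sequences and high-level behavior patterns. This thesis carries out a series of studies spanning the feature level, recognition algorithms, and the overall framework, and investigates the problems that arise along the way. The main contributions are: 1. A detailed review of the state of research in behavior understanding, including related research, the main methods currently in use, current research difficulties, and possible future directions. 2. A behavior understanding framework based on dominant-set clustering. People are the main moving targets in video sequences; the form of the human body in each frame is called a pose, and a sequence of poses linked over time appears as different behaviors. In this chapter, Shape Context is used as the feature describing the binary human silhouette of a single-frame pose. Shape Context has good descriptive power and scale invariance. Dominant-set clustering is used to generate the visual codebook; its advantage is that the resulting clusters are more compact than those produced by K-means. Experimental analysis shows that the algorithm can distinguish highly similar, hard-to-separate behaviors such as "walking" and "running", and that it still maintains 90% accuracy on noisy videos from which complete silhouettes cannot be obtained. 3. A group behavior analysis algorithm based on global feature statistics. Silhouette-based motion features are easily affected by external factors such as clothing, and in group behaviors involving several people the targets frequently occlude one another, so silhouette features fail completely. In such cases, motion features based on global statistics, such as space-time interest points and optical flow histograms, describe these behaviors better. Based on these considerations, this chapter proposes an algorithm that analyzes multi-person group behavior using global statistical features. Experiments show that the algorithm can not only recognize the category of a group behavior among different group behaviors, but also recognize the overall behavior of the group within a group behavior. 4. A method that uses sparse representation to compress the visual codebook. The training data are first sparsely represented using the old visual codebook; a weight for each word of the old codebook is then learned on the sparse representation; finally, the weights are used to compress the codebook. The method was tested on the Weizmann dataset, and the results show that sparse representation can indeed compress a visual codebook built by K-means clustering while keeping performance stable.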
English abstract	Action recognition is one of the most active research fields in computer vision. Despite the increasing amount of work done in this field in recent years, action recognition remains a challenging task for several reasons, such as the choice of action features, the high degrees of freedom of the human body, and nuisance factors. The main contributions of our work are summarized as follows: 1. We present a comprehensive survey of work from the past couple of decades addressing the representation, recognition, and learning of human activities from video, and related applications. 2. In this chapter, we propose a novel method for classifying human actions in image sequences. Human action in an image sequence can be recognized from the time-varying contour of the human body. We first extract the shape context of each contour to form the feature space. The dominant sets approach is then used for feature clustering and classification to obtain labeled sequences. Finally, a smoothing algorithm is applied to the labeled sequences to recognize human actions. The proposed dominant sets-based approach has been compared with three classical methods: K-means, mean shift, and fuzzy C-means. Experimental results demonstrate that the dominant sets-based approach achieves the best recognition performance. Moreover, our method is robust to non-rigid deformations, significant scale changes, high action irregularities, and low-quality video. 3. Group action recognition is a challenging task in computer vision because of the complexity induced by multiple motion patterns. This paper aims at analyzing group actions in video clips containing several activities. We combine the probability summation framework with space-time (ST) interest points for this task. First, ST interest points are extracted from video clips to form the feature space. Then we use k-means for feature clustering and build a compact representation, which is then used for group action classification. The proposed approach has been applied to a classification task with four classes: badminton, tennis, basketball, and soccer videos. The experimental results demonstrate the advantages of the proposed approach. 4. In this chapter, we propose a method that uses sparse representation to compress the visual codebook. We first represent the training data using a sparse representation over the old visual codebook, and then learn the weight of every word that is used ...
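Keywords	Motion Feature Extraction; Behavior Understanding; Group Behavior Understanding; Semantic Understanding of Video Sequences; Visual Codebook; Movement Feature Extraction And Representation; Visual Surveillance; Group Action Recognition; Invariant Feature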
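Language	Chinese
Document type	Thesis
Item identifier	http://ir.ia.ac.cn/handle/173211/6312
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Wei Qingdi (魏青帝). Visual Behavior Understanding of Individuals and Groups [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2010.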
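
The dominant-set clustering step named in item 2 of the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses the standard replicator-dynamics formulation of dominant sets, and toy 2-D points stand in for the Shape Context descriptors of silhouettes. The Gaussian affinity, the `sigma` bandwidth, the `cutoff` threshold, and all function names are assumptions made for illustration.

```python
import math


def affinity_matrix(points, sigma=1.0):
    """Gaussian similarity between points, with a zero diagonal (assumed kernel)."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            A[i][j] = A[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def replicator_dynamics(A, iters=1000, tol=1e-9):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x), starting from the barycenter."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * Ax[i] for i in range(n))
        if total <= 0.0:  # no affinity left between the remaining points
            break
        new_x = [x[i] * Ax[i] / total for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x


def dominant_sets(points, sigma=1.0, cutoff=1e-4):
    """Peel off one dominant set at a time until every point is assigned."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        if len(remaining) == 1:
            clusters.append(remaining)
            break
        A = affinity_matrix([points[i] for i in remaining], sigma)
        x = replicator_dynamics(A)
        # The support of the converged vector is taken as the cluster.
        members = [remaining[k] for k, w in enumerate(x) if w > cutoff]
        if not members:  # degenerate case: keep the rest together
            members = list(remaining)
        clusters.append(members)
        chosen = set(members)
        remaining = [i for i in remaining if i not in chosen]
    return clusters


if __name__ == "__main__":
    # Two well-separated toy groups of different sizes.
    toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(dominant_sets(toy))
```

In a codebook setting, each extracted cluster would supply one visual word (for example, its highest-weight member). Unlike K-means, the number of clusters is not fixed in advance; it follows from the affinity structure and the chosen `sigma`.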

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20061801462805 (13558 KB)			Not yet open	CC BY-NC-SA