With the development of the Internet, both the amount and the diversity of video data are increasing dramatically, bringing new challenges for processing and understanding such data. Human tracking and action analysis have been active research topics in computer vision for over three decades. Tracking is a middle-level vision problem that provides motion trajectories and pose parameters for subsequent high-level tasks. Human action recognition is one such high-level task, built upon low-level features and middle-level tracking modules. The major contributions of this thesis are summarized as follows:

1. We propose a new tracking algorithm that combines object and background information. Object and background appearance models are built simultaneously by non-parametric kernel density estimation. Our major contribution is a novel bidirectional learning framework that discriminates the object from the background. It provides a mechanism to detect occlusion and distraction, and performs feature selection that makes the tracker more robust to outliers. Through this learning framework, we are able to embed discriminative information into the generative appearance models.

2. Tracking is formulated as a problem of learning to discriminate the object from its nearby background. We propose a novel semi-supervised tracking algorithm that combines Semi-supervised Discriminant Analysis (SDA) with an online boosting framework. By exploiting the local geometric structure of the samples, the SDA-based weak classifier is made more robust to outliers. Meanwhile, we design an incremental updating mechanism for SDA so that it can adapt to appearance changes. We further propose an Extended SDA (ESDA) algorithm, which gives better discrimination ability.

3. Local space-time features have recently become a popular video representation for action recognition.
We evaluate and compare four different feature detectors and six local feature descriptors using a standard bag-of-features SVM approach. Among other interesting observations, we demonstrate that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings. We also demonstrate a consistent ranking for the majority of methods across different datasets and discuss their advantages and limitations.

4. We propose to describe videos by dense trajectories. Specifically, we sample dense points from each frame and track them based ...
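The appearance models in contribution 1 rest on non-parametric kernel density estimation: the object and the background each get a density estimate built from their own pixel samples, and a new pixel can be compared against both. A minimal sketch of that idea, assuming grayscale intensities, a Gaussian kernel, and a fixed hand-picked bandwidth (all simplifications; the bidirectional learning framework and occlusion handling sit on top of this and are not shown):

```python
import math

def kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate built from 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(
            math.exp(-((x - s) / bandwidth) ** 2 / 2.0) for s in samples
        )
    return density

# Toy intensities: a dark object against a bright background (illustrative data).
object_pixels = [30, 35, 32, 28, 31]
background_pixels = [200, 210, 205, 198, 215]

p_obj = kde(object_pixels, bandwidth=5.0)
p_bg = kde(background_pixels, bandwidth=5.0)

def label(intensity):
    """Classify a pixel by comparing the two density estimates."""
    return "object" if p_obj(intensity) > p_bg(intensity) else "background"

print(label(33))   # near the object mode -> "object"
print(label(207))  # near the background mode -> "background"
```

In practice the densities would be built over color or feature vectors and updated online as the tracker runs; the comparison of the two likelihoods is what allows discriminative information to be embedded into the generative models.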
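The bag-of-features pipeline in contribution 3 quantizes local space-time descriptors against a visual codebook and feeds the resulting histograms to an SVM. A minimal sketch of the quantization and histogram step, using toy 2-D descriptors and a hand-picked codebook (in the actual pipeline the codebook is learned, typically with k-means, and the SVM classification stage is omitted here):

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))

def bag_of_features(descriptors, codebook):
    """L1-normalized histogram of codeword assignments for one video."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

# Hypothetical codebook of three visual words and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.2), (0.9, 1.1), (5.2, 4.8), (1.2, 0.8)]

print(bag_of_features(descriptors, codebook))  # -> [0.25, 0.5, 0.25]
```

The same fixed-length histogram representation works regardless of which detector or descriptor produced the local features, which is what makes the detector/descriptor comparison in this contribution possible under one common classifier.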