Title: 视频中行为分析关键技术研究
Alternative Title: Research on Key Technologies of Activity Analysis in Video
Author: 张天柱
Subtype: Doctor of Engineering (工学博士)
Thesis Advisor: 卢汉清
Date: 2011-05-23
Degree Grantor: 中国科学院研究生院 (Graduate University of Chinese Academy of Sciences)
Place of Conferral: 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Degree Discipline: Pattern Recognition and Intelligent Systems
Keywords: Action Recognition; Multi-view Learning; Semi-supervised Learning; Graph Model; AdaBoost; Co-EM; Co-training
Abstract: With the development of the Internet and the spread of video devices, video content in movies, sports, news, video surveillance, and other domains has grown explosively. How to search such massive video data quickly and conveniently for content of interest has become a pressing problem. Activity analysis in video has therefore attracted great attention from researchers: it can recognize and localize events occurring in video, and it has broad application prospects and potential economic value in intelligent surveillance, human-computer interaction, virtual reality, content-based video retrieval, and medical diagnosis, making it a hot research topic. Although researchers have done a great deal of work in recent years, several key technical problems remain:

- Feature extraction and fusion. Many different features exist for describing activities, e.g. static visual features such as shape and appearance, and dynamic features such as spatio-temporal trajectories and optical flow. Each feature has its own descriptive power, and the features can complement and reinforce one another. Simple fusion methods can hardly exploit the effectiveness of such heterogeneous features, so how to fuse different features to describe activities is an important and basic problem in activity analysis.
- How to model different activities and measure their similarity. Different activities have different temporal durations, and different activities may contain similar components; moreover, even instances of the same activity may differ from one another. Modeling and measuring activities is therefore a difficult problem.
- How to train an effective activity model from limited labeled samples. Because of large intra-class variation, occlusion, and complex backgrounds, activity recognition is a highly challenging task. Much existing work trains activity models with statistical learning methods to recognize the various activities. To achieve high recognition rates, a large number of labeled training samples is needed to train a good activity model, but manually annotating large numbers of training samples is extremely time-consuming and laborious, whereas collecting large numbers of unlabeled samples (e.g. from the Web) is very easy. How to train an activity model from a small amount of labeled data and a large amount of unlabeled data is therefore a critical problem.
- How to analyze the behavior of moving objects in traffic scenes. With urban development and the spread of cameras, intelligent traffic management systems based on video analysis have received increasing attention and become a hot research area. Such systems can process and analyze video data to automatically obtain motion patterns such as the trajectories and directions of pedestrians or vehicles, and thus automatically raise alarms for abnormal events that violate traffic rules, avoiding a great deal of manual work. However, because traffic scenes contain many kinds of moving objects with complex motion patterns, automatically recognizing the behaviors of the various objects remains very challenging.

Addressing the above problems, this thesis carries out the following research in pattern recognition, computer vision, multimedia, and machine learning:

1) A multi-view learning method for fusing multiple features. Considering that the two views (features) of static appearance and dynamic motion each have strengths and weaknesses in describing activities, we adopt a Co-EM-based multi-view learning framework in place of the traditional single-view EM-based method, so that the descriptive power of each view is complemented and reinforced and the multi-view description of video surpasses every single view. To our knowledge, we are the first to propose a Co-EM multi-view learning algorithm for activity recognition, and it achieves very good results.

2) A boosted exemplar learning method for modeling and measuring activities. First, key frames (candidate exemplars) are selected, and for each exemplar a classifier is learned by multiple-instance learning to serve as an exemplar-based similarity measure; the AdaBoost algorithm then selects the most representative exemplars to model the activity.

3) Studied …
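The Co-EM-based multi-view fusion described in contribution 1) can be illustrated with a small sketch. Everything below is a stand-in: the record does not specify the thesis's actual features or models, so this uses two synthetic feature views and simple diagonal-Gaussian class models, with each view's posteriors supervising the other view's next fit.

```python
import numpy as np

def fit_gaussian(X, w):
    """Weighted per-feature Gaussian (diagonal covariance)."""
    mu = np.average(X, axis=0, weights=w)
    var = np.average((X - mu) ** 2, axis=0, weights=w) + 1e-6
    return mu, var

def posteriors(X, params):
    """Class posteriors under a uniform class prior."""
    lp = np.stack(
        [-0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
         for mu, var in params], axis=1)
    lp -= lp.max(axis=1, keepdims=True)
    p = np.exp(lp)
    return p / p.sum(axis=1, keepdims=True)

def co_em(Xa, Xb, y_labeled, n_iter=10):
    """Two-view Co-EM sketch: in each half-iteration one view is refit on
    the current soft labels, and its posteriors become the supervision
    for the other view; labeled samples stay clamped to their labels."""
    n, n_lab = len(Xa), len(y_labeled)
    hard = np.eye(2)[y_labeled]
    P = np.full((n, 2), 0.5)          # uniform soft labels for unlabeled data
    P[:n_lab] = hard
    for _ in range(n_iter):
        for X in (Xa, Xb):            # alternate between the two views
            params = [fit_gaussian(X, P[:, c] + 1e-9) for c in (0, 1)]
            P = posteriors(X, params)  # this view's beliefs feed the other
            P[:n_lab] = hard           # clamp the labeled samples
    return P
```

With two well-separated synthetic views and only a handful of labels, the argmax of the returned posteriors on the unlabeled samples recovers the true classes, which is the intuition behind replacing single-view EM with Co-EM.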
Other Abstract: With the development of the Internet and the popularity of video equipment, video data in movies, sports, news, video surveillance, and other areas is growing explosively. It is therefore urgently required to make this unstructured video data accessible and searchable with great ease and flexibility. Activity analysis can help recognize and localize events in video data, and it is currently one of the most active research topics in computer vision. This strong interest is driven by a wide spectrum of promising applications in areas such as smart surveillance, perceptual interfaces, virtual reality, and content-based video retrieval. Although researchers have done a great deal of work in recent years, several key technical issues remain:
- Feature extraction and fusion. Various features are derived from significantly different modalities, such as static visual cues (e.g. shape and appearance) and dynamic cues (e.g. spatio-temporal trajectories and motion fields). These diverse features compensate for one another to varying degrees, but because they are heterogeneous, it is difficult to exploit their effectiveness with simple fusion. How to fuse different features is therefore an important and basic problem for activity analysis.
- How to model different activities and measure their similarities for recognition. Different actions have different temporal resolutions and may share similar elements; moreover, even instances of the same action may contain different elements. It is therefore difficult to model and measure activities for recognition.
- How to train an activity model from only a few labeled samples. Human action recognition is a challenging task due to significant intra-class variation, occlusion, and background clutter. Most existing work trains action models with statistical learning algorithms for classification. To achieve good recognition performance, a large number of labeled samples is required to train such sophisticated action models. However, collecting labeled samples is labor-intensive and time-consuming, whereas unlabeled examples are much less expensive and easier to obtain; for example, large-scale unlabeled samples can be gathered from the Internet. It is therefore critical to develop algorithms that are able to learn from a small number of labeled examples a…
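The few-labels/many-unlabeled setting above can be illustrated by a minimal single-view self-training loop, a simpler relative of the co-training and Co-EM methods the thesis studies. The base learner (nearest centroid), the softmax-over-distances confidence, and the threshold are all hypothetical choices for illustration, not the thesis's method.

```python
import numpy as np

def self_train(X, y_init, labeled_mask, n_rounds=5, thresh=0.9):
    """Self-training sketch: refit a nearest-centroid classifier each
    round and promote unlabeled samples whose confidence exceeds
    `thresh` into the labeled pool. `y_init` holds true labels where
    `labeled_mask` is True and placeholder values elsewhere."""
    y, mask = y_init.copy(), labeled_mask.copy()
    pred = y.copy()
    for _ in range(n_rounds):
        # class centroids from the current labeled pool
        centroids = np.stack([X[mask & (y == c)].mean(axis=0) for c in (0, 1)])
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        p = np.exp(-d)
        p /= p.sum(axis=1, keepdims=True)   # crude softmax confidence
        conf, pred = p.max(axis=1), p.argmax(axis=1)
        grow = (~mask) & (conf > thresh)    # confident unlabeled samples
        y[grow], mask = pred[grow], mask | grow
    return pred
```

On well-separated data, a handful of seed labels is enough for the labeled pool to grow and the final predictions to match the true classes; co-training methods improve on this by letting two feature views supply each other's confident labels instead of one view trusting itself.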
Shelf Number: XWLW1584
Other Identifier: 200818014628077
Language: Chinese (中文)
Document Type: Dissertation (学位论文)
Identifier: http://ir.ia.ac.cn/handle/173211/6329
Collection: 毕业生_博士学位论文 (Graduates / Doctoral Dissertations)
Recommended Citation (GB/T 7714):
张天柱. 视频中行为分析关键技术研究[D]. 中国科学院自动化研究所. 中国科学院研究生院, 2011.
Files in This Item:
File Name/Size: CASIA_20081801462807 (7984 KB)
DocType: Full Text
Access: restricted (暂不开放); available on application
License: CC BY-NC-SA
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.