Research on Visual Scene Activity Understanding and Summarization for Intelligent Surveillance

	面向智能监控的视觉场景行为理解与摘要技术研究
Alternative Title	Research on Visual Scene Activity Understanding and Summarization for Intelligent Surveillance
	付伟
	2014-05-27
Degree Type	Doctor of Engineering
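Abstract (translated from Chinese)	With the construction of smart cities and the development of Internet of Things technology, intelligent visual surveillance systems are receiving growing attention and use. At the same time, the trend of surveillance systems toward network interconnection, high definition, and ultra-high definition has strengthened further, so the volume of video data is rising exponentially while its effective utilization rate is falling. How to store, browse, and analyze the content of massive video surveillance data quickly and effectively has become a pressing problem. Visual scene activity understanding and summarization techniques can automatically detect, track, classify, and recognize moving objects in video, and support fast browsing and precise localization of the activities or events a user is interested in, which makes them an important means of managing massive video collections intelligently. This thesis examines these two key problems in depth and carries out work in the following areas: 1. A collective activity classification method based on context learning is proposed. A collective activity is shared by its members (individuals) and may interact with other groups. The method introduces the contextual information of activities, using potential functions to describe, respectively, the relationship between an individual's action and appearance features, the relationship among the actions of individuals within the same group, and the relationship between the activities of different groups. These potentials are integrated into a unified structural support vector machine framework for learning and inference in order to classify the collective activities in a scene. By mining the rich contextual information of the participants, the method significantly improves the accuracy of collective activity classification. 2. Using a bi-layer sparse topic model as the theoretical framework, an unsupervised visual scene understanding method is proposed. The method models video with a topic model: temporally non-overlapping video clips correspond to documents; low-level visual features are quantized into the codewords of a document; and activity patterns correspond to latent topics. Mining activity patterns is thereby converted into the topic-learning problem of a topic model. The topic model is sparse at two levels: the document level and the topic level. Together, the two levels of sparsity ensure that the mined activity patterns are semantically meaningful, laying the foundation for subsequent semantic modeling of the scene. In addition, considering the imbalanced distribution of surveillance video data, namely that normal events usually occur with higher probability than abnormal ones, a one-class learning problem is introduced and an abnormal event detection method based on a discriminative sparse topic model is proposed. Experimental results demonstrate the effectiveness of the method for activity pattern mining and abnormal event detection. 3. A video reorganization technique based on narrative images is proposed. A narrative image presents the activity of a specific object of interest in a single static image. The technique first segments and extracts all foregrounds of the object of interest from the video to form a spatio-temporal sequence. It then samples this object sequence according to three criteria: the sampled objects should be distributed as evenly as possible along the trajectory; their distribution should reflect changes in the object's appearance as far as possible; and their distribution should reflect changes in the object's motion as far as possible. By defining an energy loss function for each criterion, the sampling of the object sequence is turned into an energy minimization problem. Finally, a heuristic search algorithm quickly selects the most expressive object instances from the spatio-temporal sequence and blends them into a single background image. With this technique a video is decomposed into a series of narrative images, which facilitates fast browsing of specific events. 4. A motion-structure-preserving video summarization method is proposed. To keep the interaction information between different objects in the original video from being lost when the moving-object sequences are rearranged during summarization, a measure of the motion interaction between different objects is introduced and added as an energy constraint term to the overall optimization objective. In addition, to address the large memory footprint and long optimization time of conventional offline techniques, an online hierarchical optimization strategy is adopted that finds one locally optimal solution at a time, so as to meet real-time requirements. Experimental and comparative results demonstrate the effectiveness and practicality of the method.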
English Abstract	With the construction of smart cities and the development of Internet of Things (IoT) technology, intelligent video surveillance systems have received increasing attention and seen wider application. Meanwhile, as surveillance systems move toward internetworking, high definition, and ultra-high definition, the volume of video data is increasing exponentially while its effective utilization rate is low. Video storage, browsing, and content analysis have become urgent issues in academia and industry. Visual scene activity analysis and video summarization are two important means of managing massive video collections intelligently, allowing users to quickly browse and precisely locate events occurring in videos. Addressing these two issues, we have studied the following topics: 1. We propose a context-learning-based method for collective activity classification. A collective activity is shared by all individuals present in the group and may also interact with other groups. Potential functions are introduced to describe the compatibility between an individual's action and appearance features, the relationship between individuals within the same group, and the interaction among different groups. A discriminative structure SVM model is proposed to jointly learn this contextual information in a unified framework. Experimental results demonstrate the superiority of our approach in collective activity classification. 2. Based on the theory of the bi-layer sparse topic model (BiSTM), we propose an unsupervised approach for dynamic scene understanding. The input surveillance video is represented by a topic model in which non-overlapping clips are treated as documents and low-level visual features are quantized into discrete words. Motion pattern mining is thereby converted into a problem of learning latent topics. In the BiSTM, topic-level sparsity and document-level sparsity together guarantee that the mined motion patterns are semantically meaningful. In addition, considering the extreme imbalance between the many typical normal activities and the few rare abnormalities in surveillance video data, a one-class learning problem is introduced and a discriminative BiSTM is proposed for abnormality detection. Experimental results and comparisons demonstrate the promising performance of the proposed approach. 3. We propose a novel presentation approach to vividly depict the moving process of a specific object in a surveillance video, which aims at effectively summarizing...
Keywords	Video Surveillance; Activity Understanding; Structure SVM; Collective Activity Classification; Video Summarization; Sparse Topic Model
Language	Chinese
Document Type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/6611
Collection	Graduates_Doctoral Dissertations
Recommended Citation (GB/T 7714)	付伟. 面向智能监控的视觉场景行为理解与摘要技术研究[D]. 中国科学院自动化研究所. 中国科学院大学,2014.

Files in This Item
File Name/Size	Document Type	Version Type	Access Type	License
CASIA_20111801462803 (6203KB)			Not yet open	CC BY-NC-SA