Automatic recognition of semantic events from video is an important problem in computer vision and a key issue in many practical applications. To tackle this problem, we propose an approach to visual event recognition based on an extended stochastic grammar. The major contributions of this thesis are as follows:

1. By analyzing motion trajectories, we propose a semantic-point-guided approach to learning the primitives of the grammar system. First, a number of semantic points in the video scene are acquired, either by semantic scene modeling or manually. Trajectory clustering is then performed to obtain basic motion patterns that serve as the primitives.

2. We propose a rule-induction algorithm based on the Minimum Description Length (MDL) principle to obtain a set of event rules. First, we extend the original grammar with Allen's temporal logic to represent parallel relations between sub-events. Then, guided by the heuristic that events similar to each other should preferentially be combined into a new concept, we propose a multi-level induction strategy for the rule-induction process. We also present a coding scheme for encoding the rules and the input event stream when applying the MDL principle. Experimental results on gymnastic exercises and traffic events demonstrate that our method learns event structures effectively.

3. Based on the extended grammar representation, we propose a multi-thread parsing algorithm to recognize complex events in a given primitive stream. By relaxing the ID-set constraint, the parsing algorithm can handle parallel temporal relations between sub-events. Additionally, a Viterbi-like error-recovery strategy is embedded in the parsing process to correct large-scale errors, such as insertion and deletion errors. Extensive experiments on gymnastic exercises and traffic events demonstrate the effectiveness of our method.
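To illustrate the primitive-learning step in contribution 1, the following sketch resamples variable-length motion trajectories to fixed-size vectors and clusters them so that each cluster center becomes one motion-pattern primitive. This is an illustrative reconstruction, not the thesis's exact algorithm: the resampling length, the use of k-means with farthest-point initialisation, and all function names are our own assumptions.

```python
import numpy as np

def resample(traj, n=16):
    """Resample a 2-D trajectory (sequence of (x, y) points) to n points
    spaced evenly by arc length, so trajectories of different lengths
    become comparable fixed-size feature vectors."""
    traj = np.asarray(traj, dtype=float)
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    t = np.linspace(0.0, s[-1], n)
    x = np.interp(t, s, traj[:, 0])
    y = np.interp(t, s, traj[:, 1])
    return np.stack([x, y], axis=1).ravel()          # flatten to a 2n-vector

def cluster_primitives(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation;
    each resulting centroid is one motion-pattern primitive
    (a prototype trajectory)."""
    C = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d)])                    # farthest point so far
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                      else C[j] for j in range(k)])
    return C, labels
```

In use, every observed trajectory is mapped to its nearest centroid, and the resulting centroid index sequence is the primitive stream fed to the grammar.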
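Contribution 2 extends the grammar with Allen's temporal logic. As a minimal sketch of that ingredient (not the thesis's implementation), the function below classifies the relation between two sub-event intervals using the seven basic Allen relations; the six inverse relations are reported collectively and can be named by swapping the arguments.

```python
def allen_relation(a, b):
    """Classify interval a = (start, end) against interval b using the
    seven basic Allen relations."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1 and a2 < b2:
        return "starts"
    if b1 < a1 and a2 == b2:
        return "finishes"
    if b1 < a1 and a2 < b2:
        return "during"
    return "inverse"   # one of the six inverse relations (swap a and b)
```

With such a predicate, a grammar rule can require, for example, that two sub-events stand in an "overlaps" or "during" relation rather than strict sequential order, which is what allows parallel sub-events to be represented.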
4. To further validate the capability of the proposed method, we apply the extended stochastic grammar system to the recognition of multi-agent interactions. Unlike primitive learning in a static scene, we first attach a local relative coordinate frame to each moving object. The motion trajectories of the other moving objects are then projected into this local frame, and a set of interaction primitives (IPs) is acquired by clustering the motion points in the local frame. Finally, a multi-agent interaction is represented as a primitive stream composed of several IP strings in different local frames, so the extended grammar system can be used to model and recognize the interaction. Satisfactory experimental results in recognizing five types of interactions validate the proposed method.
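The projection into a local relative frame described in contribution 4 can be sketched as follows. This is an illustrative assumption about the frame convention (origin at the reference object, local x-axis along its heading); the function name and parameters are hypothetical.

```python
import numpy as np

def to_local_frame(ref_pos, ref_heading, points):
    """Project world-coordinate points of another moving object into the
    local frame of a reference object: origin at ref_pos, local x-axis
    aligned with the reference object's heading (in radians)."""
    c, s = np.cos(ref_heading), np.sin(ref_heading)
    R = np.array([[c, s], [-s, c]])                  # world -> local rotation
    return (np.asarray(points, dtype=float) - np.asarray(ref_pos)) @ R.T
```

Clustering the projected points then yields interaction primitives that are invariant to where and in which direction the reference object moves, which is what makes the same grammar machinery reusable for multi-agent interactions.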