Multimodality-based Video Content Analysis and Its Personalized Customization

Title	基于多模态融合的视频内容分析及其个性化定制
Alternative title	Multimodality-based Video Content Analysis and Its Personalized Customization
Author	Liang Chao (梁超)
	2012-05-31
Degree type	Doctor of Engineering
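Abstract (translated from Chinese)	Video content analysis is one of the most active research problems in multimedia and a key technology for video retrieval, browsing, and management. It annotates video content with keywords for different semantic concepts, providing an efficient route to subsequent data management, retrieval, and browsing. Traditional video analysis methods mostly work from a single modality of the video, relying on low-level features extracted from the video data to describe its content. Because of the "semantic gap", however, analyzing and understanding video content from low-level features is very difficult. In particular, for content-rich domain-specific video such as sports and movies, viewers are usually interested not in simple, generic semantic concepts (e.g., a goal or a quarrel) but in descriptions of specific people and events (e.g., Messi's header from midfield, or Ross and Rach arguing about the rent in the living room). In this thesis, we adopt cross-modal analysis to address these problems. Specifically, by mining the temporal correspondence between text and video, we associate semantic text descriptions accurately with the corresponding video segments, yielding detailed semantic annotation of the video content. On this basis, we propose two new applications targeting users' personalized needs: personalized customization of sports video and personalized synthesis of TV scenes. The former lets viewers retrieve and summarize video content according to the players and events they care about; the latter lets ordinary people automatically generate film and TV video by writing story scripts. The main work and contributions of this thesis are as follows: 1. For sports video analysis, we propose a video-text matching method that does not rely on timestamp information to annotate video content. We first use a Bayesian network and keyword matching to convert the video content and the text descriptions into semantic tag sequences, in which each tag corresponds to one attack in the match (coarse matching) or one individual event (fine matching), and the encoding of the tag reflects the combination of semantic events contained in the corresponding segment. We then match the video and text tag sequences using sequence matching, so that each text description is associated with its video segment, producing the final semantic video annotation. 2. Using the results of sports video analysis, we design and implement a personalized sports video customization system for mobile devices. Taking into account users' personal preferences for different players and events, the impact of each event on the match as a whole, and the various constraints of the user's viewing environment, we propose a constrained optimization model for personalized video customization under environmental constraints. We also propose a social-network-based method for learning user preferences, which captures the user's viewing preferences as completely and accurately as possible without adding to the user's interaction burden. 3. For TV series video analysis, we propose a generative graphical model of the shooting process of a TV series. By learning the model parameters, we compute the name-face correspondence without supervision; by inferring the hidden state sequence, we determine the globally optimal scene structure of the video. We also derive fast solutions that accelerate parameter learning and hidden-state inference. 4. Using the results of TV series video analysis, we propose an application for personalized TV scene synthesis. Functionally, the scheme consists of two parts, offline annotation and online synthesis. The former uses the shooting model above to automatically associate video content with script descriptions, yielding a large, rich collection of semantically annotated video material; the latter selects and organizes suitable video segments according to the script submitted by the user to compose the final work. Our method jointly considers semantic content and visual effect, and can render the user's script as an accurate, vivid, and artistic visual presentation.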
English abstract	As an active research topic in the multimedia community, video analysis facilitates efficient video retrieval, browsing, and management. It assigns semantic tags that represent the relevant video content, which facilitates subsequent data management, retrieval, and browsing. Previous methods usually analyze video content from a single modality, from which various low-level features are extracted to infer high-level semantics. Because of the semantic gap, such content-based methods can hardly extract detailed semantic descriptions from video content. Especially for domain-specific videos such as sports videos and movies, the audience focuses not on simple concepts (e.g., goal, quarrel) but on detailed descriptions (e.g., Messi’s headed goal, or Ross quarreling with Rach about the rent in the living room). We propose cross-modality analysis to overcome this difficulty. Specifically, we inspect the temporal correspondence between textual descriptions and video content, from which detailed semantics can be attached to the relevant video segments, thereby generating semantic video annotations. Based on the video analysis results, we propose two novel applications, personalized sports video customization and personalized movie scene synthesis, to meet audiences' personalized needs. The former enables people to retrieve and summarize the video segments that interest them about specific players or events, and the latter enables film producers to create the story movies they envision by writing story scripts. The main contributions of our work are as follows: 1. We propose a timestamp-independent method to annotate sports video content with external web text descriptions. With the help of a Bayesian network and keyword matching, video content and textual descriptions are first converted into semantic tag sequences. Each tag corresponds to a complete attack or an individual event. Then, a sequence matching algorithm is used to align the two semantic tag sequences and generate the final video annotations. 2. We realize a personalized sports video customization system for mobile users. Considering subjective content preferences and objective environment constraints, we propose a constrained optimization model to formulate the condition-limited video customization problem. Moreover, we design a concept social network to learn hidden user preferences without requiring additional user interaction. 3. We propose a novel ...
Keywords	Sports Video; Movie; Video Analysis; Personalized Customization
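Language	Chinese
Document type	Thesis
Identifier	http://ir.ia.ac.cn/handle/173211/6466
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Liang Chao. Multimodality-based Video Content Analysis and Its Personalized Customization [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2012.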
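
Illustrative sketch (not taken from the thesis): the abstract describes aligning a video tag sequence with a text tag sequence by sequence matching, without timestamps. The Python below shows one way such an alignment could look, using a standard Needleman-Wunsch global alignment. The tag encoding (a set of event names per segment), the `similarity` score, the `gap` penalty, and the function names are all assumptions made for this example; the record does not say which matching algorithm or scoring the author actually used.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical tag encoding: the set of semantic events found in one segment.
Tag = FrozenSet[str]


def similarity(a: Tag, b: Tag) -> float:
    """Jaccard overlap rescaled to [-1, 1], so disjoint tags are penalised."""
    union = a | b
    if not union:
        return 1.0
    return 2.0 * len(a & b) / len(union) - 1.0


def align(video: Sequence[Tag], text: Sequence[Tag],
          gap: float = -0.5) -> List[Tuple[int, int]]:
    """Globally align two tag sequences; return (video_idx, text_idx) pairs."""
    n, m = len(video), len(text)
    # score[i][j] = best score aligning video[:i] with text[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(video[i - 1], text[j - 1]),
                score[i - 1][j] + gap,   # video segment has no text entry
                score[i][j - 1] + gap,   # text entry has no video segment
            )

    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        sim = similarity(video[i - 1], text[j - 1])
        if score[i][j] == score[i - 1][j - 1] + sim:
            if sim > 0:  # keep only pairs whose tags actually overlap
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy data: four detected video segments, three text descriptions.
video = [frozenset({"shot"}), frozenset({"foul"}),
         frozenset({"shot", "goal"}), frozenset({"corner"})]
text = [frozenset({"shot"}), frozenset({"shot", "goal"}), frozenset({"corner"})]
print(align(video, text))  # -> [(0, 0), (2, 1), (3, 2)]
```

In this toy run the "foul" segment has no counterpart in the text and is skipped, while the remaining segments are matched in order, which is the behaviour a timestamp-free alignment needs.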

Files in This Item
File name/size	Document type	Version type	Access	License
CASIA_20081801462804 (5514KB)			Not yet open	CC BY-NC-SA