Structure Analysis and Semantic Retrieval for Broadcast Videos (广播视频的结构分析和语义检索)

Title	广播视频的结构分析和语义检索
Alternative title	Structure Analysis and Semantic Retrieval for Broadcast Videos
Author	王金桥 (Wang Jinqiao)
	2008-06-04
Degree type	Doctor of Engineering
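Abstract (translated from Chinese)	This dissertation studies the structural and semantic understanding of broadcast video in depth, touching on many basic problems in video processing and content retrieval, including shot boundary detection, program segmentation, program summarization, program classification, and program retrieval. The main work and contributions are as follows. (1) Starting from the structure and production characteristics of broadcast video, it analyzes the problems in current video segmentation, classification, and retrieval, proposes a program segmentation and representation framework based on multimodal fusion, and proposes three mid-level features to bridge the "gap" between low-level features and high-level semantics. Programs are represented multimodally with visual and textual features, so that users can browse and search video programs more conveniently. (2) It studies in depth the relationship between logo distribution and program boundaries in broadcast video, extends the gradient operation for multi-valued images to video processing, and proposes a generalized-gradient-based framework for logo processing in video that can detect, track, and remove static, animated, and semi-transparent logos. (3) Combining POIM image retrieval with video key frame sequence matching, it proposes a coarse-to-fine fast video program retrieval algorithm. Compared with traditional video retrieval algorithms, it overcomes the effects of color distortion, bitstream changes, and resolution changes, which makes program retrieval more robust. (4) It analyzes the advertisement videos in broadcast video and implements an ad video summarization system covering ad segmentation, classification, and retrieval. Ad boundaries are detected using FMPI images together with changes in the visual scene and the audio scene, plus ad-domain features such as black frames and silence. Latent semantic analysis is used to automatically mine visual and textual concepts related to products and services, and video ads are classified by product and service. Ad retrieval based on FMPI images and key frame sequence matching meets the needs of ad monitoring and search.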
English abstract	This dissertation studies the structure and semantic understanding of broadcast video, which involves many basic issues in video processing and content retrieval, including shot boundary detection, program segmentation, program summarization, program classification, and program indexing. The main work and contributions of this thesis are as follows. (1) Starting from the structure and production characteristics of broadcast video, we analyze the existing problems in video segmentation, classification, and retrieval, and propose a multimodal fusion framework for broadcast video analysis. Three middle-level features are proposed to bridge the semantic gap between low-level features and high-level semantics. We further propose a program segmentation and expression framework based on visual and textual features, which makes it easy for users to browse and index video programs. (2) Through an in-depth study of the relationship between program boundaries and logo presence in video programs, we extend the gradient of multi-valued images to video processing and propose a logo processing framework based on the generalized gradient, which handles detection, tracking, and removal of static, animated, and semi-transparent logos. (3) We propose a coarse-to-fine rapid video program retrieval algorithm based on POIM images and key frame sequence matching. Compared with key frame sequence matching and clip-based retrieval approaches, our approach overcomes the influence of color distortion, encoding changes, and resolution changes, and increases the robustness of program retrieval. (4) We develop a video ads digesting system that includes ads segmentation, categorization, and recognition. We study the characteristics of the boundaries between individual ads and propose to use FMPI images, video scene changes, and audio scene changes, as well as ad-domain features such as black frames and silence, to detect ad boundaries. To classify video ads by product and service, we employ latent semantic analysis to mine the latent visual and textual concepts related to products and services. Video ads retrieval with FMPI images and key frame sequence matching can meet the requirements of ads monitoring and recognition.
Keywords	Video Retrieval; Structure Analysis; Video Segmentation; Multimodal Fusion; Scene Detection; Ads Classification
Language	Chinese
Document type	Dissertation
Item identifier	http://ir.ia.ac.cn/handle/173211/6119
Collection	Graduates_Doctoral Dissertations
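Recommended citation (GB/T 7714)	Wang Jinqiao (王金桥). Structure Analysis and Semantic Retrieval for Broadcast Videos (广播视频的结构分析和语义检索) [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences, 2008.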

Files in this item
File name/size	Document type	Version type	Access type	License
CASIA_20041801462803 (3635KB)			Not yet open	CC BY-NC-SA