English abstract | This dissertation studies the structure and semantic understanding of broadcast video, which involves several basic problems in video processing and content retrieval: shot boundary detection, program segmentation, program summarization, program classification, and program indexing. The main work and contributions of this thesis are as follows. (1) Starting from the structure and production characteristics of broadcast video, we analyze the existing problems in video segmentation, classification, and retrieval, and propose a multimodal fusion framework for broadcast video analysis. Three middle-level features are proposed to bridge the semantic gap between low-level features and high-level semantics. We further propose a program segmentation and representation framework based on visual and textual features, which makes it easy for users to browse and index video programs. (2) Through an in-depth study of the relationship between program boundaries and the presence of logos in video programs, we extend the gradient of multi-valued images to video processing and propose a logo processing framework based on the generalized gradient, which handles the detection, tracking, and removal of static, animated, and semi-transparent logos. (3) We propose a coarse-to-fine algorithm for fast video program retrieval, based on POIM image and key frame sequence matching. Compared with key frame sequence matching and clip-based retrieval approaches, our approach overcomes the influence of color distortion, encoding changes, and resolution changes, and therefore makes program retrieval more robust. (4) We develop a video ads digesting system that covers ads segmentation, categorization, and recognition. We study the characteristics of the boundaries between individual ads and propose to detect these boundaries using the FMPI image, video scene changes, audio scene changes, and black-frame and silence features specific to the ads domain. 
To classify video ads by product and service, we employ latent semantic analysis to mine the latent visual and textual concepts related to products and services. Video ads retrieval with FMPI image and key frame sequence matching meets the requirements of ads monitoring and recognition. |
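
The abstract lists black frames and silence among the cues used to locate boundaries between individual ads. The following is a minimal sketch of that single cue only, not the thesis's method, which also combines the FMPI image and video and audio scene changes. The function names, the frame representation (rows of 0-255 luminance values), the per-frame audio energy input, and both thresholds are assumptions made for illustration.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # assumed: rows of 0-255 luminance values


def mean_luma(frame: Frame) -> float:
    """Average luminance of one frame; 0.0 for an empty frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count if count else 0.0


def candidate_ad_boundaries(
    frames: Sequence[Frame],
    audio_energy: Sequence[float],
    luma_threshold: float = 16.0,    # hypothetical "black frame" cutoff
    energy_threshold: float = 0.01,  # hypothetical "silence" cutoff
) -> List[int]:
    """Return the first index of each run of frames that are both dark and silent."""
    if len(frames) != len(audio_energy):
        raise ValueError("frames and audio_energy must have the same length")

    boundaries: List[int] = []
    previous_hit = None  # index of the last frame that was dark and silent
    for i, (frame, energy) in enumerate(zip(frames, audio_energy)):
        is_hit = mean_luma(frame) < luma_threshold and energy < energy_threshold
        if is_hit:
            # Start a new boundary only when this hit does not continue a run.
            if previous_hit is None or i != previous_hit + 1:
                boundaries.append(i)
            previous_hit = i
    return boundaries
```

Requiring both cues at once keeps dark but noisy scenes from being reported as boundaries. In practice the candidates would be checked against the other features named in the abstract before an ad boundary is accepted.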