Semantic video annotation is an important research direction in information retrieval. It aims to detect semantic concepts in videos from their content, and it is a prerequisite for video indexing and retrieval. Traditional video annotation methods rely mainly on the video itself: they extract low-level features to describe the video content and build a mapping between these features and semantics. Because of the semantic gap, such methods struggle with high-level semantic analysis and understanding. This is especially true for domain-specific videos such as sports videos and movies, where the audience's attention centers on high-level semantics: who, what, when, and how.

To bridge the gap between low-level features and high-level semantics, we propose to exploit external knowledge. In this dissertation, we study multimodality-based semantic annotation methods for sports video and movies. Our method incorporates external knowledge extracted from texts that accompany the video, such as the web-cast text of a sports game and the film script of a movie. We show that with this external knowledge we can automatically generate domain-specific semantic concepts and obtain annotation results comparable to manually labeled ground truth. Building on the video annotation, we also present novel methods for video retrieval and browsing.

The main contributions of the dissertation are as follows:

1. We present an approach for sports video event detection and semantic annotation based on the analysis and alignment of web-cast text and broadcast video. We first analyze the web-cast text to cluster and detect text events in an unsupervised way. Based on the detected text events and video structure analysis, we employ a conditional random field model to align text events with video events by detecting the event moment and event boundary in the video.
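The text-event detection and alignment pipeline above can be illustrated with a minimal sketch. All details here are illustrative assumptions: the web-cast line format, the keyword list (standing in for the unsupervised clustering step), and the fixed offset/window (standing in for the conditional random field alignment).

```python
import re

# Hypothetical web-cast text lines in a "minute' description" format.
WEBCAST = [
    "12' Smith scores a goal from the left wing",
    "34' Jones is shown a yellow card after a foul",
    "67' Brown scores a goal with a header",
]

# Hand-picked keywords standing in for unsupervised text-event clustering.
EVENT_KEYWORDS = {"goal": "goal", "foul": "foul", "yellow card": "card"}

def detect_text_events(lines):
    """Detect (minute, event_type) pairs from web-cast text lines."""
    events = []
    for line in lines:
        m = re.match(r"(\d+)'", line)
        if not m:
            continue
        minute = int(m.group(1))
        for kw, label in EVENT_KEYWORDS.items():
            if kw in line.lower():
                events.append((minute, label))
                break
    return events

def align_to_video(events, offset_sec=15, window_sec=(20, 40)):
    """Map each text event to a coarse video segment.

    offset_sec approximates the lag between the reported game time and
    the event moment in the broadcast; window_sec gives the (before,
    after) span around that moment taken as the event boundary. In the
    actual method, moment and boundary are inferred jointly by a
    conditional random field over the video structure.
    """
    segments = []
    for minute, label in events:
        moment = minute * 60 + offset_sec
        segments.append({"event": label,
                         "start": moment - window_sec[0],
                         "end": moment + window_sec[1]})
    return segments
```

The fixed offset and window are, of course, a stand-in: the point of the CRF in our approach is precisely to learn this alignment instead of hard-coding it.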
Incorporating web-cast text into sports video analysis significantly facilitates sports video event detection and semantic annotation.

2. Based on the sports video annotation, we present a personalized sports video retrieval method. Video clips are initially retrieved by text query against different semantic attributes. To acquire user preferences, we use clickthrough data as feedback from the user. Relevance feedback is applied to both the text annotations and the visual features to ...
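The clickthrough-based relevance feedback above can be sketched with a Rocchio-style update, a standard formulation we use here purely for illustration; the feature vectors, weights, and function names are assumptions, not the dissertation's exact method.

```python
def rocchio_update(query, clicked, skipped, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query feature vector toward clicked results and away
    from skipped ones (clicks serve as implicit relevance feedback).

    query   : list of floats (text-annotation or visual features)
    clicked : list of vectors the user clicked (treated as relevant)
    skipped : list of vectors shown but not clicked (non-relevant)
    """
    def mean(vecs):
        if not vecs:
            return [0.0] * len(query)
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(query))]

    pos, neg = mean(clicked), mean(skipped)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]
```

In practice such an update would be run separately in the text-annotation space and the visual-feature space, then the two retrieval scores combined.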