With the rapid development of multimedia and network technology, and especially with the widespread deployment of International Mobile Telecommunications-2000 (IMT-2000), better known as 3G or 3rd Generation, the amount of multimedia data is increasing greatly. Efficient ways to locate, exploit, and manage the useful information in video are urgently needed. The essence of this problem is how to analyze and represent video events efficiently, and how to construct context and related domain knowledge so that various cues can be used for inference and features can be related to semantics.

Static and dynamic features are the two main attributes of video. The former can basically be obtained from static images, e.g. persons, objects, and buildings, while motion features are the important attribute that distinguishes video from static images, e.g. the motion of objects and the interaction among different people. How to describe these two attributes efficiently and how to fuse them is the subject of this thesis. Following this direction, the main work includes: (1) content-based video structure analysis; (2) sports video content analysis; (3) event recognition.

The main contributions of this thesis are as follows:

1. We propose an optical-flow-based shot detection algorithm. Because the computation of the optical flow field relies on the brightness constancy assumption, the violation of this assumption across a shot change motivates our method. Motion discontinuities are taken as candidate boundaries, and color features are then used to remove false alarms. Experimental results demonstrate that the method is robust to camera and object motion and can also handle complicated situations.

2. We propose a key frame extraction method based on nonparametric motion features and information entropy. A compact representation of the dominant motion information of each frame is obtained through a mean shift analysis procedure. Key frames are selected by maximum entropy, and mutual information is used to measure the similarity between frames. Experimental results demonstrate that the extracted key frames are more concise and informative.

3. We present a set of novel features for classifying basketball video clips into semantic events, together with a simple way of using prior temporal context information to improve classification accuracy. Specifically, the feature set consists of a motion descriptor, motion hi...
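The entropy and mutual-information criteria named in contribution 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each frame has already been reduced to a flat sequence of discrete labels (for example quantized motion or gray-level values), whereas the thesis derives its representation from a mean shift analysis. The function names `entropy`, `mutual_information`, and `select_key_frame` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a, b):
    """Mutual information (in bits) between two equally long label sequences.

    Higher values mean the two frames are more similar in the sense that
    knowing one frame's labels tells more about the other's.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # I(A;B) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum(
        (c / n) * math.log2(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )


def select_key_frame(frames):
    """Index of the frame with maximum entropy (assumed selection rule)."""
    return max(range(len(frames)), key=lambda i: entropy(frames[i]))
```

Under these assumptions, a frame compared with itself has mutual information equal to its own entropy, and two statistically independent frames have mutual information close to zero, which is what makes the measure usable as a similarity score between frames.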