English Abstract | With the development of the Internet and the spread of video equipment, video data in movies, sports, news, video surveillance, and other areas is growing explosively. There is therefore an urgent need to make unstructured video data easy and flexible to access and search. Activity analysis helps recognize and localize events in video data, and it is currently one of the most active research topics in computer vision. This strong interest is driven by a wide spectrum of promising applications, such as smart surveillance, perceptual interfaces, virtual reality, and content-based video retrieval. Although researchers have done a great deal of work in recent years, several key technical issues remain. (1) Feature extraction and fusion. Features are derived from significantly different modalities: static visual cues, e.g. shape and appearance, and dynamic cues, e.g. spatio-temporal trajectories and motion fields. Such diversity also raises the question of to what degree they compensate for each other. Because these kinds of features are heterogeneous, simple fusion makes it difficult to exploit them effectively. How to fuse different features is therefore an important and basic problem for activity analysis. (2) How to model different activities and measure their similarities for recognition. Different actions have different time resolutions and may share some similar elements. In addition, instances of the same action may contain different elements. It is therefore difficult to model activities and measure their similarity for recognition. (3) How to train an activity model from only a few labeled samples. Human action recognition is a challenging task due to significant intra-class variation, occlusion, and background clutter. Most existing work uses action models based on statistical learning algorithms for classification.
To achieve good recognition performance, a large number of labeled samples is therefore required to train these sophisticated action models. However, collecting labeled samples is labor-intensive and time-consuming. In contrast, unlabeled examples are much cheaper and easier to obtain than labeled ones. For example, large-scale unlabeled samples can be obtained from the Internet. It is therefore critical to develop algorithms that are able to learn from a small number of labeled examples a... |