English abstract | Human action recognition (HAR) is an active topic in computer vision. HAR aims to recognize and analyze individual actions, interactions, and group actions in images or video using computer vision techniques. It plays an important role in the visual analysis and understanding of human motion, where it forms the high-level vision stage and the final objective. It also has many potential applications, such as smart surveillance, automatic analysis of sports events, human-computer interfaces, and virtual reality, and research in HAR will bring new modes of interaction into people's lives. HAR has developed rapidly in recent years. One trend in action recognition is the bag of visual words (BOVW) appearance-based approach, which exploits local spatio-temporal features. BOVW-based approaches avoid several difficult preprocessing steps, such as foreground segmentation, object detection, and object tracking. Moreover, they are more robust to noise, occlusion, and action variation, including geometric variation, than approaches based on large-scale features. In this thesis, we study human action recognition, which involves many important and difficult problems, e.g., feature extraction, action representation, and action recognition. The main contributions of our work are summarized as follows: (1) We propose a new local spatio-temporal feature to describe the cuboids detected in video sequences. Specifically, the descriptor uses the covariance matrix to characterize the low-level features within each cuboid. Covariance matrices do not lie in a Euclidean space; therefore, the Log-Euclidean Riemannian metric is employed to measure the distances between covariance matrices. Moreover, the Earth Mover's Distance (EMD) is employed to match pairs of video sequences for the first time. Compared with the widely used Euclidean distance, EMD is more robust in matching histograms/distributions with different sizes.
Experimental results on three datasets demonstrate the effectiveness of the proposed framework. (2) We propose a pyramid vocabulary tree to model local spatio-temporal features. For BOVW-based methods, it is crucial to determine the size of the vocabulary. Usually, a large vocabulary is more discriminative for inter-class action classification, while a small one is more robust to noise and thus more tolerant of intra-class variance. The proposed pyramid vocabulary tree can both characterize the inter-class difference and allow intra-c... |
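
The covariance descriptor and Log-Euclidean metric named in contribution (1) can be illustrated with a minimal sketch. The abstract does not say which low-level features are used or how the covariance is regularized, so the function names, the `eps` regularizer, and the random feature vectors below are assumptions for illustration only, not the thesis implementation. The sketch uses the standard Log-Euclidean distance, the Frobenius norm of the difference of the matrix logarithms.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance matrix of the low-level feature vectors in one cuboid.

    features: array of shape (n_points, d), one feature vector per row.
    eps: small ridge term (an assumption) that keeps the matrix
         strictly positive definite so its logarithm exists.
    """
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    return cov + eps * np.eye(cov.shape[0])


def spd_log(mat):
    """Matrix logarithm of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Scale each eigenvector column by the log of its eigenvalue.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Log-Euclidean distance: Frobenius norm of log(a) - log(b)."""
    return np.linalg.norm(spd_log(a) - spd_log(b), ord="fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical cuboids: 50 points, 4 low-level features each.
    c1 = covariance_descriptor(rng.normal(size=(50, 4)))
    c2 = covariance_descriptor(rng.normal(size=(50, 4)))
    print(log_euclidean_distance(c1, c2))
```

Taking the logarithm maps each covariance matrix into the vector space of symmetric matrices, where an ordinary Euclidean (Frobenius) distance is meaningful; this is why the metric suits descriptors that do not lie in a Euclidean space.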