Journal
MULTIMEDIA TOOLS AND APPLICATIONS
Volume 76, Issue 11, Pages 13367-13382
Publisher
SPRINGER
DOI: 10.1007/s11042-016-3768-5
Keywords
Human action recognition; Convolutional neural networks (CNN); Stratified pooling (SP); Support vector machines (SVM)
Funding
- Nature Science Foundation of China [61202143, 61572409, 61571188]
- Natural Science Foundation of Fujian Province [2013J05100]
- Research Foundation of Education Bureau of Hunan Province [15C0726]
Video-based human action recognition is an active and challenging topic in computer vision. Over the last few years, deep convolutional neural networks (CNNs) have become the most popular approach and have achieved state-of-the-art performance on several datasets, such as HMDB-51 and UCF-101. Since each video yields a varying number of frame-level features, combining these features into a good video-level feature is a challenging task. This paper therefore proposes a novel action recognition method, stratified pooling based on deep convolutional neural networks (SP-CNN). The method consists of five main steps: (i) fine-tuning a pre-trained CNN on the target dataset; (ii) extracting frame-level features; (iii) reducing feature dimensionality with principal component analysis (PCA); (iv) stratified pooling of the frame-level features to obtain a video-level feature; and (v) multiclass classification with an SVM. Experimental results on the HMDB-51 and UCF-101 datasets show that the proposed method outperforms the state-of-the-art.
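Step (iv) is the part of the pipeline that turns a variable number of frame-level features into one fixed-length video-level feature. The abstract does not say how the strata are formed or which pooling operator is applied, so the following is only a minimal sketch under stated assumptions: strata are taken to be contiguous temporal segments, each stratum is mean-pooled, and the pooled vectors are concatenated. The function name `stratified_pool` and the parameter `num_strata` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def stratified_pool(frames: Sequence[Sequence[float]], num_strata: int = 3) -> List[float]:
    """Pool a variable number of frame-level vectors into one fixed-length vector.

    ASSUMPTION: strata are contiguous temporal segments and each one is
    mean-pooled. The abstract defines neither choice, so this is illustrative.
    Plain lists are used to keep the sketch dependency-free.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame-level feature")
    if num_strata < 1:
        raise ValueError("num_strata must be at least 1")

    dim = len(frames[0])
    pooled: List[float] = []
    for s in range(num_strata):
        # Integer boundaries of stratum s over the n frames.
        start = s * n // num_strata
        end = (s + 1) * n // num_strata
        # If the video has fewer frames than strata, reuse the nearest frame
        # so that every stratum still contributes `dim` values.
        if end <= start:
            start = min(start, n - 1)
            end = start + 1
        segment = frames[start:end]
        # Mean-pool each feature dimension within the stratum.
        for d in range(dim):
            pooled.append(sum(f[d] for f in segment) / len(segment))
    return pooled


# Four frames with 2-dimensional features, pooled into two strata.
video = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
video_feature = stratified_pool(video, num_strata=2)
```

Under these assumptions the output length is always `num_strata * dim`, regardless of how many frames the video has, which is what allows a standard classifier such as the SVM in step (v) to consume it.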