Journal
MULTIMEDIA TOOLS AND APPLICATIONS
Volume 74, Issue 6, Pages 2127-2142
Publisher
SPRINGER
DOI: 10.1007/s11042-013-1746-8
Keywords
Action recognition; Dense sampling; Local spatio-temporal feature; Gaussian mixture model; Lie algebrized Gaussians
Funding
- National Natural Science Foundation of China [61073094, U1233119]
This paper presents a novel framework for human action recognition based on a newly proposed mid-level feature representation method named Lie Algebrized Gaussians (LAG). Because an action sequence can be treated as a 3D object in space-time, we cast action recognition as 3D object recognition and characterize each 3D object by the probability distribution of its local spatio-temporal features. First, for each video, we densely sample local spatio-temporal features (e.g. HOG3D) at multiple scales within the bounding boxes of the human body. Normalized spatial coordinates are appended to each local descriptor to capture spatial position information. The distribution of local features in each video is then modeled by a Gaussian Mixture Model (GMM). To estimate the parameters of the video-specific GMMs, a global GMM is trained on all training data, and each video-specific GMM is adapted from the global GMM. LAG is then used to vectorize the video-specific GMMs. Finally, a linear SVM is employed for classification. Experimental results on the KTH and UCF Sports datasets show that our method achieves state-of-the-art performance.
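Two steps of the pipeline described in the abstract can be illustrated with a minimal, standard-library Python sketch: appending normalized spatial coordinates to a local descriptor, and adapting a global GMM to the features of one video. Everything specific here is an assumption rather than the paper's method: the function names, the diagonal covariances, the means-only MAP update, and the `relevance` factor are hypothetical choices made for illustration. The LAG vectorization itself is not implemented, since the abstract does not specify it.

```python
import math

def append_normalized_xy(descriptor, x, y, box):
    """Append (x, y), normalized to the bounding box (x0, y0, w, h), to a descriptor."""
    x0, y0, w, h = box
    return list(descriptor) + [(x - x0) / w, (y - y0) / h]

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given feature x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                       # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """Adapt the global GMM means to one video's features.

    Assumption: relevance-factor MAP adaptation of the means only;
    weights and variances are left at their global values.
    """
    k, d = len(means), len(means[0])
    counts = [0.0] * k                    # soft counts per component
    sums = [[0.0] * d for _ in range(k)]  # posterior-weighted feature sums
    for x in features:
        for j, p in enumerate(posteriors(x, weights, means, variances)):
            counts[j] += p
            for i in range(d):
                sums[j][i] += p * x[i]
    adapted = []
    for j in range(k):
        alpha = counts[j] / (counts[j] + relevance)
        data_mean = [s / counts[j] for s in sums[j]] if counts[j] > 0 else means[j]
        adapted.append([alpha * dm + (1 - alpha) * m
                        for dm, m in zip(data_mean, means[j])])
    return adapted

# Toy global GMM with two 1-D components (illustrative values only).
weights, means, variances = [0.5, 0.5], [[0.0], [10.0]], [[1.0], [1.0]]
video_features = [[1.0]] * 16
adapted_means = map_adapt_means(video_features, weights, means, variances)
```

With these toy values the first component's mean moves part of the way toward the video's features, while the second stays close to its global value. In the paper's framework the adapted GMM would then be vectorized by LAG and passed to a linear SVM.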