Recognizing Actions Through Action-Specific Person Detection

IEEE TRANSACTIONS ON IMAGE PROCESSING (2015)

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING

Volume 24, Issue 11, Pages 4422-4432

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TIP.2015.2465147

Keywords

Action recognition; transfer learning; deep features

Funding

Svalbard Science Forum through the Collaborative Unmanned Aircraft Systems Project
VR through the ETT Project
Strategic Area for ICT Research ELLIIT
CADICS
Academy of Finland [255745, 251170]
Data to Intelligence DIGILE SHOK Project [TIN2013-41751, TIN2014-52072-P]
Spanish Morocco Economic Competitiveness Project [TRA2014-57088-C2-1-R]
Spanish Ministry of Science through the Spanish DGT Project [SPIP2014-01352]
Generalitat de Catalunya Project [2014-SGR-1506, 2014-SGR-221]
MICINN through Ramon y Cajal Fellowship
Chinese Scholarship Council [2011611023]

Abstract

Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, direct training of action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, transfer learning can adapt an existing detector to propose higher-quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach for the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, without using any ground-truth person localization at test time, outperforms, on both data sets, state-of-the-art methods that do use person locations.
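
The abstract reports its results as mean average precision (MAP) over action classes. As a rough illustration only, the sketch below computes a non-interpolated average precision per class from ranked detections and averages it across classes. The function names, the toy class labels, and the choice of non-interpolated AP are assumptions made for this example; the paper's exact evaluation protocol (for instance, how a detection is matched to a ground-truth box) is not given here and may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical helper names; this is not the paper's evaluation code.
Detection = Tuple[float, bool]  # (confidence score, is it a true positive?)


def average_precision(detections: List[Detection], num_positives: int) -> float:
    """Non-interpolated AP for one action class.

    Detections are ranked by confidence; precision is sampled at each
    true positive and divided by the number of ground-truth instances,
    so missed instances lower the score.
    """
    if num_positives == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        if is_hit:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_positives


def mean_average_precision(
    per_class: Dict[str, List[Detection]], positives: Dict[str, int]
) -> float:
    """Unweighted mean of the per-class AP values."""
    aps = [average_precision(per_class[c], positives[c]) for c in per_class]
    return sum(aps) / len(aps)


# Toy data (made up): two action classes with a handful of detections each.
toy_detections = {
    "class_a": [(0.9, True), (0.8, False), (0.7, True)],
    "class_b": [(0.6, False), (0.5, True)],
}
toy_positives = {"class_a": 2, "class_b": 1}
toy_map = mean_average_precision(toy_detections, toy_positives)
```

In the action detection setting described above, whether a detection counts as a true positive would depend on both the predicted action label and the overlap of its box with a ground-truth person, which is why better candidate boxes can raise MAP.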

Knowledge memorization and generation for action recognition in still images

Jian Dong, Wankou Yang, Yazhou Yao, Fatih Porikli

Summary: Human action recognition in visual data is a fundamental challenge in computer vision, with existing approaches mainly based on video data. This paper introduces a novel method that transfers knowledge from action videos to images for recognizing actions in still images. Results show that transferred knowledge from color and motion flow sequences can significantly improve the performance of still image based human action recognition.

PATTERN RECOGNITION (2021)