4.7 Article

Recognizing Actions Through Action-Specific Person Detection

Journal

IEEE TRANSACTIONS ON IMAGE PROCESSING
Volume 24, Issue 11, Pages 4422-4432

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIP.2015.2465147

Keywords

Action recognition; transfer learning; deep features

Funding

  1. Swedish Foundation for Strategic Research (SSF) through the Collaborative Unmanned Aircraft Systems (CUAS) Project
  2. VR through the ETT Project
  3. Strategic Area for ICT Research ELLIIT
  4. CADICS
  5. Academy of Finland [255745, 251170]
  6. Data to Intelligence DIGILE SHOK Project [TIN2013-41751, TIN2014-52072-P]
  7. Spanish Ministry of Economy and Competitiveness Project [TRA2014-57088-C2-1-R]
  8. Spanish Ministry of Science through the Spanish DGT Project [SPIP2014-01352]
  9. Generalitat de Catalunya Project [2014-SGR-1506, 2014-SGR-221]
  10. MICINN through Ramon y Cajal Fellowship
  11. Chinese Scholarship Council [2011611023]

Abstract

Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, directly training action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, transfer learning can adapt an existing detector to propose higher-quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: 1) Stanford-40 and 2) PASCAL VOC 2012. For the action detection task (i.e., both person localization and classification of the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art, with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach on the task of action classification (i.e., recognizing actions without localizing them). For this task, without using any ground-truth person localization at test time, our approach outperforms state-of-the-art methods that do use person locations on both data sets.
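
The core idea of the abstract — adapting a generic, pre-trained person detector into action-specific detectors using only a few labeled action examples — can be pictured with current detection tooling. The sketch below is a minimal illustration of that transfer-learning idea, not the authors' original 2015 pipeline: torchvision's Faster R-CNN merely stands in for the generic person detector, and the number of action classes and the dummy training example are placeholders.

    # Hedged sketch: adapt a generic, pre-trained detector so that each output
    # class is an action ("riding bike", "reading", ...) instead of plain "person".
    # Illustrates the transfer-learning idea only; NOT the paper's original method.
    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    NUM_ACTIONS = 40  # placeholder, e.g. the 40 classes of Stanford-40

    # Start from a detector pre-trained for generic object/person detection.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box classifier: one output per action class, plus background.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_ACTIONS + 1)

    # One illustrative fine-tuning step on a dummy action-labeled person box,
    # standing in for the "small number of labeled action examples".
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    images = [torch.rand(3, 480, 640)]
    targets = [{"boxes": torch.tensor([[50.0, 60.0, 200.0, 400.0]]),
                "labels": torch.tensor([3])}]  # hypothetical action index
    model.train()
    loss_dict = model(images, targets)  # dict of detection losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

At test time the adapted detector is run on the full image, and its per-class boxes and scores serve directly as action-detection outputs, which is the setting evaluated with mean average precision above.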

Authors

Recommended

Article Chemistry, Analytical

Co-Training for Unsupervised Domain Adaptation of Semantic Segmentation Models

Jose L. Gomez, Gabriel Villalonga, Antonio M. Lopez

Summary: This paper proposes a new co-training procedure for the unsupervised domain adaptation of semantic segmentation models from synthetic to real images. The procedure trains intermediate deep models on both synthetic and real images and iteratively labels the real-world training images. The collaboration between the models is achieved through a self-training stage and a model collaboration loop. Experimental results demonstrate significant improvements over baselines on standard synthetic and real-world datasets.

SENSORS (2023)
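
The co-training procedure summarized in the entry above can be pictured generically: two segmentation models, initially trained on labeled synthetic data, take turns pseudo-labeling the unlabeled real images for each other, keeping only confident pixels. The sketch below is a minimal, generic co-training/pseudo-labeling loop under assumed names (pseudo_label, train_step, confidence threshold), not the authors' exact procedure.

    # Generic co-training sketch (not the paper's exact algorithm): two models
    # label unlabeled real images for each other; only confident pixels are kept.
    import torch

    CONF_THRESH = 0.9    # assumed confidence threshold
    IGNORE_INDEX = 255   # low-confidence pixels are ignored by the loss

    def pseudo_label(model, image):
        """Per-pixel labels from `model`, with low-confidence pixels masked out."""
        model.eval()
        with torch.no_grad():
            probs = torch.softmax(model(image), dim=1)   # (N, C, H, W)
            conf, labels = probs.max(dim=1)              # both (N, H, W)
            labels[conf < CONF_THRESH] = IGNORE_INDEX
        return labels

    def cotraining_round(model_a, model_b, real_images, train_step):
        """One collaboration round: each model is updated on the other's labels."""
        for image in real_images:
            train_step(model_a, image, pseudo_label(model_b, image))
            train_step(model_b, image, pseudo_label(model_a, image))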

Article Computer Science, Artificial Intelligence

CyTran: A cycle-consistent transformer with multi-level consistency for non-contrast to contrast CT translation

Nicolae-Catalin Ristea, Andreea-Iuliana Miron, Olivian Savencu, Mariana-Iuliana Georgescu, Nicolae Verga, Fahad Shahbaz Khan, Radu Tudor Ionescu

Summary: We propose a novel approach to translate unpaired contrast CT scans to non-contrast CT scans and vice versa. Our method is based on cycle-consistent generative adversarial convolutional transformers, which can be trained on unpaired images and achieve superior results through a multi-level cycle-consistency loss. We also introduce a novel dataset and show that our approach outperforms state-of-the-art methods for image style transfer in the medical domain.

NEUROCOMPUTING (2023)

Review Environmental Sciences

Transformers in Remote Sensing: A Survey

Abdulaziz Amer Aleissaee, Amandeep Kumar, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal, Gui-Song Xia, Fahad Shahbaz Khan

Summary: Deep learning algorithms have gained popularity in remote sensing image analysis, and transformer-based architectures have been widely used in computer vision, with the self-attention mechanism replacing the convolution operator. Inspired by this, the remote sensing community has explored vision transformers for various tasks. This survey presents a systematic review of recent transformer-based methods in remote sensing, covering sub-areas such as very high-resolution (VHR), hyperspectral (HSI), and synthetic aperture radar (SAR) imagery. The survey concludes by discussing challenges and open issues of transformers in remote sensing.

REMOTE SENSING (2023)

Article Computer Science, Artificial Intelligence

Transformers in medical imaging: A survey

Fahad Shamshad, Salman Khan, Syed Waqas Zamir, Muhammad Haris Khan, Munawar Hayat, Fahad Shahbaz Khan, Huazhu Fu

Summary: This survey reviews the applications of Transformers in medical imaging, covering tasks such as medical image segmentation, detection, classification, restoration, synthesis, registration, and clinical report generation. The challenges and solutions for each application are discussed, and future research directions are highlighted. The survey aims to spark interest in the academic community and provide researchers with an up-to-date reference regarding the applications of Transformer models in medical imaging.

MEDICAL IMAGE ANALYSIS (2023)

Article Computer Science, Artificial Intelligence

Self-Training for Class-Incremental Semantic Segmentation

Lu Yu, Xialei Liu, Joost van de Weijer

Summary: This paper addresses the problem of catastrophic forgetting in deep neural networks during incremental learning in class-incremental semantic segmentation. A self-training approach is proposed, leveraging unlabeled data for rehearsal of previous knowledge. Experimental results show that maximizing self-entropy and using diverse auxiliary data can significantly improve performance. State-of-the-art results are achieved on Pascal-VOC 2012 and ADE20K datasets.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

SAT: Scale-Augmented Transformer for Person Search

Mustansar Fiaz, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan

Summary: This paper proposes a three-stage cascaded Scale-Augmented Transformer (SAT) framework for person search, which combines the benefits of convolutional neural networks and transformers. Experimental results demonstrate the favorable performance of our method compared to state-of-the-art methods on challenging datasets.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution

Mariana-Iuliana Georgescu, Radu Tudor Ionescu, Andreea-Iuliana Miron, Olivian Savencu, Nicolae-Catalin Ristea, Nicolae Verga, Fahad Shahbaz Khan

Summary: We propose a novel multimodal multi-head convolutional attention module for super-resolving CT and MRI scans, which outperforms state-of-the-art attention mechanisms in super-resolution. By jointly processing the CT and MRI scans in a multimodal fashion, our attention module improves the quality of super-resolution results.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

PS-ARM: An End-to-End Attention-Aware Relation Mixer Network for Person Search

Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Fahad Shahbaz Khan

Summary: This paper proposes a novel attention-aware relation mixer (ARM) module for person search, which exploits the global relation between different local regions within the region of interest (RoI) of a person, making it robust against appearance deformations and background distractors.

COMPUTER VISION - ACCV 2022, PT V (2023)

Article Computer Science, Artificial Intelligence

Class-Incremental Learning: Survey and Performance Evaluation on Image Classification

Marc Masana, Xialei Liu, Bartlomiej Twardowski, Mikel Menta, Andrew D. Bagdanov, Joost van de Weijer

Summary: For future learning systems, incremental learning is desirable due to its efficient resource usage, reduced memory usage, and resemblance to human learning. The main challenge for incremental learning is catastrophic forgetting. This paper provides a comprehensive survey of existing class-incremental learning methods for image classification and performs extensive experimental evaluations on thirteen methods.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Visual Object Tracking With Discriminative Filters and Siamese Networks: A Survey and Outlook

Sajid Javed, Martin Danelljan, Fahad Shahbaz Khan, Muhammad Haris Khan, Michael Felsberg, Jiri Matas

Summary: Accurate and robust visual object tracking is a challenging problem in computer vision. This survey reviews more than 90 Discriminative Correlation Filters (DCFs) and Siamese trackers, based on results in nine tracking benchmarks. It presents the background theory, research challenges, and performance analysis of both DCFs and Siamese trackers, and provides recommendations for future research.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Stylized Adversarial Defense

Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli

Summary: To address the vulnerability of CNNs to imperceptible changes in input images, an adversarial training approach is proposed. This approach creates adversarial perturbations by utilizing the style, content, and class-boundary information of target samples. A deeply supervised multi-task objective is used to extract multi-scale feature knowledge, and a max-margin adversarial training approach is applied to minimize the distance between the source image and its adversary while maximizing the distance between the adversary and the target image. This adversarial training approach demonstrates strong robustness, generalization to corruptions and data distribution shifts, and accuracy on clean examples.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Information Systems

Construction and Optimization of Dynamic S-Boxes Based on Gaussian Distribution

Adel R. Alharbi, Sajjad Shaukat Jamal, Muhammad Fahad Khan, Mohammad Asif Gondal, Aaqif Afzaal Abbasi

Summary: This paper proposes an innovative approach for constructing dynamic S-boxes using Gaussian distribution-based pseudo-random sequences. The proposed technique overcomes the weaknesses of existing chaos-based S-box techniques by leveraging the strength of pseudo-random sequences. The technique achieves a maximum nonlinearity of 112, which is comparable to the AES algorithm.

IEEE ACCESS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

3D-Aware Multi-Class Image-to-Image Translation with NeRFs

Senmao Li, Joost van de Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang

Summary: Recent advances in 3D-aware generative models combined with Neural Radiance Fields have achieved impressive results in 3D consistent multi-class image-to-image translation. To address the unrealistic shape/identity change in 2D-I2I translation, the learning process is divided into a multi-class 3D-aware GAN step and a 3D-aware I2I translation step, with novel techniques proposed to reduce view-consistency problems.

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

MVMO: A Multi-Object Dataset for Wide Baseline Multi-View Semantic Segmentation

Aitor Alvarez-Gila, Joost van de Weijer, Yaxing Wang, Estibaliz Garrote

Summary: MVMO is a synthetic dataset with high object density and wide camera baselines, enabling research in multi-view semantic segmentation and cross-view semantic transfer. New research is needed to utilize the information from multi-view setups effectively.

2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Visual Transformers with Primal Object Queries for Multi-Label Image Classification

Vacit Oguz Yazici, Joost Van De Weijer, Longlong Yu

Summary: This paper investigates the problem of multi-label image classification and proposes an enhanced transformer model that utilizes primal object queries to improve model performance and convergence speed.

2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) (2022)
