Article

Rank Pooling for Action Recognition

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2016.2558148

Keywords

Action recognition; temporal encoding; temporal pooling; rank pooling; video dynamics

Funding

  1. FP7 ERC Starting Grant [240530 COGNIMUND]
  2. KU Leuven DBOF PhD fellowship
  3. FWO project Monitoring of abnormal activity with camera systems
  4. iMinds High-Tech Visualization project
  5. Australian Research Council Centre of Excellence for Robotic Vision [CE140100016]


We propose a function-based temporal pooling method that captures the latent structure of video sequence data, e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the temporal dynamics of the whole video, suitable for action recognition. Beyond ranking functions, we explore different parametric models that could also explain the temporal changes in videos. The proposed functional pooling methods, and rank pooling in particular, are easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition. Results show that rank pooling brings an absolute improvement of 7-10% over the average pooling baseline. At the same time, rank pooling is compatible with and complementary to several appearance and local motion based methods and features, such as improved trajectories and deep learning features.
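The core idea, fitting a function to frame-level features and keeping its parameters as the video descriptor, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`time_varying_mean`, `rank_pool`, `average_pool`) are hypothetical, the running-mean smoothing of frame features is an assumed preprocessing step, and closed-form ridge regression of the frame index onto the smoothed features stands in for the ranking machine.

```python
import numpy as np


def time_varying_mean(frames: np.ndarray) -> np.ndarray:
    """Smooth per-frame features (T x D) with a running mean: v_t = mean(x_1, ..., x_t).

    Assumed preprocessing step for this sketch.
    """
    counts = np.arange(1, frames.shape[0] + 1, dtype=float)[:, None]
    return np.cumsum(frames, axis=0) / counts


def rank_pool(frames: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Return a D-dim vector w whose scores w @ v_t should grow with the frame index t.

    Assumption: ridge regression of t onto the smoothed features stands in for
    the ranking machine; the unpenalized bias is handled by centering.
    """
    v = time_varying_mean(frames)
    t = np.arange(1, v.shape[0] + 1, dtype=float)
    vc = v - v.mean(axis=0)  # center features so the bias term is not penalized
    tc = t - t.mean()        # center the chronological targets
    d = v.shape[1]
    # Closed-form ridge solution of (Vc^T Vc + lam * I) w = Vc^T tc
    return np.linalg.solve(vc.T @ vc + lam * np.eye(d), vc.T @ tc)


def average_pool(frames: np.ndarray) -> np.ndarray:
    """Baseline: order-agnostic mean of the frame features."""
    return frames.mean(axis=0)


if __name__ == "__main__":
    # Toy "video": 10 frames whose first feature grows linearly over time.
    frames = np.stack([np.array([t, 1.0, 0.0]) for t in range(1, 11)])
    reversed_frames = frames[::-1]

    print(average_pool(frames), average_pool(reversed_frames))  # identical
    print(rank_pool(frames), rank_pool(reversed_frames))        # first component flips sign
```

Because the mean ignores frame order, average pooling returns the same vector for a video and its time-reversed copy, while the rank-pooled parameters change. The regression stand-in was chosen for brevity; a pairwise ranking objective such as RankSVM would enforce the chronological ordering more directly.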



Recommended

Article Computer Science, Artificial Intelligence

Residual Tuning: Toward Novel Category Discovery Without Labels

Yu Liu, Tinne Tuytelaars

Summary: Discovering novel visual categories from unlabeled images is crucial for intelligent vision systems. We propose a residual-tuning approach to overcome the trade-off between preserving features learned on labeled data and adapting features to unlabeled data. Our method achieves consistent and considerable gains on benchmark tests, reducing the performance gap to the fully supervised learning setup.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Agriculture, Multidisciplinary

Inline nondestructive internal disorder detection in pear fruit using explainable deep anomaly detection on X-ray images

Tim Van De Looverbosch, Jiaqi He, Astrid Tempelaere, Klaas Kelchtermans, Pieter Verboven, Tinne Tuytelaars, Jan Sijbers, Bart Nicolai

Summary: X-ray radiography has been investigated as a technique for internal quality inspection of pears in storage, with multiple deep anomaly detection methods showing effectiveness in detecting pears with internal cavity and browning disorders. The best performing methods were found to be on par with a state-of-the-art multisensor disorder detection method.

COMPUTERS AND ELECTRONICS IN AGRICULTURE (2022)

Article Computer Science, Artificial Intelligence

CLAD: A realistic Continual Learning benchmark for Autonomous Driving

Eli Verwimp, Kuo Yang, Sarah Parisot, Lanqing Hong, Steven McDonagh, Eduardo Perez-Pellitero, Matthias De Lange, Tinne Tuytelaars

Summary: In this paper, a new Continual Learning benchmark for Autonomous Driving (CLAD) is introduced, focusing on object classification and object detection problems. The benchmark utilizes SODA10M, a large-scale dataset related to autonomous driving. Existing continual learning benchmarks are reviewed and discussed, showing that most of them are extreme cases. Two benchmarks are introduced: CLAD-C, an online classification benchmark, and CLAD-D, a domain-incremental continual object detection benchmark. The inherent difficulties and challenges are examined through a survey of the top-3 participants in a CLAD-challenge workshop at ICCV 2021. Possible pathways to improve the current state of continual learning and promising directions for future research are discussed.

NEURAL NETWORKS (2023)

Article Agronomy

Synthetic data for X-ray CT of healthy and disordered pear fruit using deep learning

Astrid Tempelaere, Tim Van De Looverbosch, Klaas Kelchtermans, Pieter Verboven, Tinne Tuytelaars, Bart Nicolai

Summary: This study proposes a method to generate synthetic CT images using a conditional GAN (cGAN) to overcome the challenge of obtaining large annotated datasets. The performance of the predictor was evaluated quantitatively and visually, showing that the cGAN effectively generated CT images of healthy and defective fruit based on annotations.

POSTHARVEST BIOLOGY AND TECHNOLOGY (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations

Thomas Verelst, Paul K. Rubenstein, Marcin Eichner, Tinne Tuytelaars, Maxim Berman

Summary: Multi-label image classification is more practical for real-world scenarios than single-label classification due to the presence of multiple objects in natural images. However, annotating every object of interest is time-consuming and expensive. In this study, we propose an Expected Negative loss to train multi-label classifiers using datasets where each image is annotated with a single positive label. To handle the uncertainty of other classes, we generate a set of expected negative labels based on prediction consistency. Additionally, we introduce a novel spatial consistency loss to improve supervision by maintaining consistent spatial feature maps for each training image. Our experiments on various datasets demonstrate the effectiveness of the Expected Negative loss in combination with consistency and spatial consistency losses, and we achieve improved multi-label classification mAP on ImageNet-1K using the ReaL multi-label validation set.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

SimGlim: Simplifying glimpse based active visual reconstruction

Abhishek Jha, Soroush Seifi, Tinne Tuytelaars

Summary: In active visual exploration, it is crucial to sample informative local observations for modeling global context. This paper proposes the use of vision transformers instead of CNNs for such agents and introduces a transformer-based active visual sampling model called SimGlim. The model utilizes the transformer's self-attention architecture to predict the best next location based on the current observable environment. Experimental results demonstrate the effectiveness of the proposed method in image reconstruction and comparisons against existing methods are provided. Ablation studies are also conducted to analyze the importance of design choices in the overall architecture.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Barlow constrained optimization for Visual Question Answering

Abhishek Jha, Badri Patro, Luc Van Gool, Tinne Tuytelaars

Summary: This paper proposes a novel regularization method called COB to improve the information content of the joint space in visual question answering models. It reduces redundancy by minimizing the correlation between learned feature components, disentangling semantic concepts. The model aligns the joint space with the answer embedding space and shows improved accuracy on VQA datasets.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Global-Local Self-Distillation for Visual Representation Learning

Tim Lebailly, Tinne Tuytelaars

Summary: The downstream accuracy of self-supervised methods depends on the proxy task and the quality of gradients extracted during training. Incorporating local cues in the proxy task can improve model accuracy on downstream tasks. We propose a geometric approach for matching local representations in self-distillation, which outperforms similarity-based methods, especially in low-data regimes. In fact, in low-data regimes similarity-based matchings are highly detrimental to model performance compared to the baseline without local self-distillation.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss

Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

Summary: This paper revisits the weakly supervised cross-modal face-name alignment task and proposes SECLA and SECLA-B models. These models use appropriate loss functions to learn the alignments between names and faces in a neural network setting. SECLA maximizes the similarity scores between faces and names in a weakly supervised fashion, while SECLA-B learns to align names and faces from easy to hard cases, further improving the performance.

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

Thomas Stegmuller, Tim Lebailly, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran

Summary: In this paper, we propose a method for learning dense visual representations without labels by discovering and segmenting the semantics of views through an online clustering mechanism. The resulting method is highly generalizable and does not require cumbersome pre-processing steps.

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Automated Virtual Reduction of Displaced Distal Radius Fractures

J. Osstyn, F. Danckaers, A. Van Haver, J. Oramas, M. Vanhees, J. Sijbers

Summary: This article presents a fully automated algorithm for the reduction of displaced fractures, which is robust and closely resembles the manual reductions by surgeons.

2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI (2023)

Proceedings Paper Computer Science, Information Systems

AIMLAI: Advances in Interpretable Machine Learning and Artificial Intelligence

Adrien Bibal, Tassadit Bouadi, Benoit Frenay, Luis Galarraga, Jose Oramas

Summary: Recent technological advances rely on accurate decision support systems, but the lack of transparency due to complexity can lead to various issues, sparking the emergence of interpretable and explainable AI to address the problem of trust and bias in decision-making processes.

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022 (2022)

Article Computer Science, Artificial Intelligence

A Continual Learning Survey: Defying Forgetting in Classification Tasks

Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Greg Slabaugh, Tinne Tuytelaars

Summary: This article introduces the application of artificial neural networks in continual learning, focusing on task incremental classification. It proposes a new framework for continually evaluating the stability-plasticity trade-off of the network and performs experimental comparisons of 11 state-of-the-art continual learning methods, evaluating their strengths and weaknesses by considering different benchmark datasets.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2022)

Article Computer Science, Artificial Intelligence

Effective Multimodal Encoding for Image Paragraph Captioning

Thanh-Son Nguyen, Basura Fernando

Summary: In this paper, a regularization-based image paragraph generation method is proposed. A novel multimodal encoding generator (MEG) is introduced to generate effective multimodal encoding that captures individual sentence, visual, and paragraph-sequential information. The generated encoding is utilized to regularize a paragraph generation model, leading to improved results in all evaluation metrics for the captioning model. The proposed MEG model, along with reinforcement learning optimization, achieves state-of-the-art results on the Stanford paragraph dataset. Extensive empirical analysis demonstrates the capabilities of MEG encoding, where qualitative visualization and multimodal sentence/image retrieval tasks show that MEG captures semantic and meaningful textual and visual information.

IEEE TRANSACTIONS ON IMAGE PROCESSING (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Deep Set Conditioned Latent Representations for Action Recognition

Akash Singh, Tom de Schepper, Kevin Mets, Peter Hellinckx, Jose Oramas, Steven Latre

Summary: In recent years, there has been increasing interest in multi-label, multi-class video action recognition. This paper proposes a method that learns to reason over the semantic concept of objects and actions using relational networks. The empirical results show that artificial neural networks benefit from pretraining, relational inductive biases, and unordered set-based latent representations in action recognition tasks.

PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5 (2022)
