Article
Computer Science, Artificial Intelligence
Ziliang Ren, Qieshi Zhang, Jun Cheng, Fusheng Hao, Xiangyang Gao
Summary: This paper proposes a novel approach to multimodal human action recognition that compresses RGB-D sequences into dynamic images and designs SC-ConvNets to learn complementary features from the different modalities. Experimental results demonstrate excellent recognition performance across multiple datasets.
Article
Computer Science, Artificial Intelligence
Jiaming Wang, Zhenfeng Shao, Xiao Huang, Tao Lu, Ruiqian Zhang, Xianwei Lv
Summary: The study introduces a novel parameter-free spatial-temporal pooling block (STP) for action recognition in videos. STP efficiently discards non-informative frames, learns spatial and temporal weights, and uses a new loss function that pushes the model to learn from sparse, discriminative frames, ultimately outperforming several state-of-the-art methods in action classification.
Article
Mathematics
Qingxia Li, Dali Gao, Qieshi Zhang, Wenhong Wei, Ziliang Ren
Summary: This paper proposes a method to improve action recognition performance by constructing dynamic images and designing an interactive learning dual-ConvNet (ILD-ConvNet). The visual dynamic images capture spatial-temporal information via the rank pooling method, and the construction is extended to depth sequences to obtain richer multi-modal spatial-temporal information. The proposed ILD-ConvNet achieves competitive recognition accuracy on the NTU RGB+D 120 and PKU-MMD datasets.
Article
Computer Science, Information Systems
Ziliang Ren, Qieshi Zhang, Xiangyang Gao, Pengyi Hao, Jun Cheng
Summary: The paper introduces a multi-modality learning approach for human action recognition that uses bidirectional rank pooling to obtain spatial-temporal information from RGB and depth images, and designs an effective ConvNet architecture based on a multi-modality hierarchical fusion strategy. The proposed method achieves state-of-the-art results on multiple datasets.
MULTIMEDIA TOOLS AND APPLICATIONS
(2021)
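Several of the entries above rely on rank pooling to compress a video into a single "dynamic image". As a minimal illustration (not any one paper's implementation), the approximate variant reduces to a fixed temporal weighting of the frames; the sketch below assumes numpy and a hypothetical `frames` array of shape (T, H, W, C), using the simplified coefficient alpha_t = 2t - T - 1:

```python
import numpy as np

def approximate_dynamic_image(frames):
    """Collapse a video into one dynamic image via approximate rank pooling.

    frames: array of shape (T, H, W, C).
    Uses the simplified weighting alpha_t = 2t - T - 1 (t = 1..T), which
    emphasises later frames and down-weights earlier ones.
    """
    T = frames.shape[0]
    # Temporal coefficients; they sum to zero, so a static video maps to
    # an all-zero dynamic image (only motion survives the pooling).
    alphas = 2.0 * np.arange(1, T + 1) - T - 1
    # Contract the time axis: weighted sum of frames -> (H, W, C).
    return np.tensordot(alphas, frames.astype(np.float64), axes=1)
```

Because the coefficients sum to zero, static content cancels and the resulting single image encodes the temporal evolution, which is why it can be fed to an ordinary 2D ConvNet.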
Article
Computer Science, Artificial Intelligence
Dengdi Sun, Zhixiang Su, Zhuanlian Ding, Bin Luo
Summary: The proposed action recognition model based on a multi-view temporal attention mechanism effectively captures and utilizes motion information present in image frames and optical flows. Experimental results demonstrate that the method outperforms existing techniques in action recognition, showcasing the effectiveness of introducing temporal attention and multi-view fusion approaches.
COGNITIVE COMPUTATION
(2022)
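Temporal attention, as used in the entry above, boils down to scoring each frame, normalising the scores over time, and taking a weighted average of the frame features. The sketch below is a generic single-view version under assumed shapes (per-frame features of shape (T, D) and a learned scoring vector `w`), not the paper's multi-view model:

```python
import numpy as np

def temporal_attention_pool(features, w):
    """Attention-weighted temporal pooling of per-frame features.

    features: array of shape (T, D), one feature vector per frame.
    w: array of shape (D,), a (here hypothetical) learned scoring vector.
    Returns a single (D,) video descriptor.
    """
    # Score each frame, then softmax over the time axis
    # (subtracting the max for numerical stability).
    scores = features @ w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted average of frame features: informative frames dominate.
    return weights @ features
```

In a full model, `w` would be trained end-to-end with the recognition loss so that frames carrying motion evidence receive higher weights.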
Article
Multidisciplinary Sciences
Guoan Yang, Yong Yang, Zhengzhi Lu, Junjie Yang, Deyang Liu, Chuanbo Zhou, Zien Fan
Summary: This study addresses the tendency of deep learning-based action recognition models to focus only on short-term motions, causing misjudgments of actions composed of multiple sub-processes. It proposes a Spatial-Temporal Attention Temporal Segment Networks (STA-TSN) model that incorporates a soft attention mechanism to adaptively focus on key spatial and temporal features. By combining a multi-scale spatial focus feature enhancement strategy with a deep learning-based key-frame exploration module, the model captures long-term information and key frames more effectively, achieving superior results over existing methods on public datasets.
Article
Computer Science, Artificial Intelligence
Fei Wang, Guorui Wang, Yuxuan Du, Zhenquan He, Yong Jiang
Summary: This paper introduces a two-stage temporal proposal algorithm for action detection in long untrimmed videos. The algorithm uses a novel prior-minor watershed and sliding-window approach in the first stage, and an extended context pooling (ECP) and temporal context regression network in the second stage to improve the precision of action localization. Results on three large-scale benchmarks demonstrate that the proposed method outperforms state-of-the-art approaches and runs efficiently on a GPU.
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS
(2021)
Article
Computer Science, Artificial Intelligence
Zhenwei Wang, Wei Dong, Bingbing Zhang, Jianxin Zhang, Xiangdong Liu, Bin Liu, Qiang Zhang
Summary: For video action recognition, the proposed GSoANet integrates a GSoAM at the end of the network to aggregate spatio-temporal features: GSoAM decomposes input features into low-dimensional vectors before aggregating the video's spatio-temporal features. The network also adopts ConvNeXt as its backbone to improve accuracy at a lower computational cost.
NEURAL PROCESSING LETTERS
(2023)
Article
Computer Science, Artificial Intelligence
Qilong Wang, Qiyao Hu, Zilin Gao, Peihua Li, Qinghua Hu
Summary: This article proposes an adaptive multi-granularity spatio-temporal network (AMS-Net) for effectively handling complex scale variations in videos. The network efficiently captures both subtle variations in visual tempos and larger-scale spatio-temporal dynamics, achieving state-of-the-art performance on action recognition tasks.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Huigang Zhang, Liuan Wang, Jun Sun
Summary: Action recognition is a popular area of computer vision research, focusing on identifying human actions in videos. Existing methods rely on visual features within the videos, but lack the ability to represent general knowledge of actions beyond the video. This study presents a novel spatio-temporal knowledge module (STKM) that combines external knowledge with visual features, leading to improved recognition results. Experimental results demonstrate the robustness and generalization ability of STKM.
IET COMPUTER VISION
(2023)
Article
Engineering, Electrical & Electronic
Kun Liu, Wu Liu, Huadong Ma, Mingkui Tan, Chuang Gan
Summary: A new real-time convolutional architecture, T-C3D, is proposed for action representation and combined with deep compression techniques to accelerate model deployment. The method achieves a 5.4% improvement in accuracy and roughly 2x faster inference than state-of-the-art real-time methods, with a model size of less than 5 MB.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2021)
Article
Computer Science, Information Systems
Lijun He, Miao Zhang, Sijin Zhang, Liejun Wang, Fan Li
Summary: With the wide deployment of Internet of Things monitoring terminals, a tremendous volume of video is accumulating. This paper introduces a method that works in the compressed domain to efficiently extract video information and recognize actions of different durations based on multiscale temporal features. Experimental results show that the proposed algorithm achieves a good balance between accuracy and computational complexity.
IEEE INTERNET OF THINGS JOURNAL
(2022)
Article
Computer Science, Artificial Intelligence
Jun Tang, Baodi Liu, Wenhui Guo, Yanjiang Wang
Summary: This paper introduces a skeleton-based action recognition method that effectively exploits feature distributions by incorporating Fisher vector encoding into graph convolutional networks (GCNs). A temporal enhanced Fisher vector (TEFV) encoding algorithm is proposed to capture both fine-grained spatial configurations and temporal dynamics. Performance is further improved by combining the TEFV model with the GCN model in a two-stream framework.
COMPLEX & INTELLIGENT SYSTEMS
(2023)
Article
Computer Science, Information Systems
Peng Dou, Ying Zeng, Zhuoqun Wang, Haifeng Hu
Summary: Recent action localization works learn in a weakly supervised manner to avoid the expensive cost of human labeling. To better separate weakly discriminative foreground action segments from background ones, and to model the relationships between different actions, we propose multiple temporal pooling (MTP) mechanisms that leverage more effective information and generate different class activation sequences (CASs). Our method achieves excellent results on the THUMOS14 and ActivityNet1.2 datasets.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Article
Computer Science, Information Systems
Md Moniruzzaman, Zhaozheng Yin, Zhihai He, Ruwen Qin, Ming C. Leu
Summary: This study introduces a simple yet effective network model for human action recognition from trimmed and untrimmed videos. By introducing an attentional pooling mechanism and a video-segment attention model, the network can emphasize action-critical features in videos and learn attention weights even without precise temporal annotations. Experimental results on multiple datasets demonstrate the superior performance of the network compared to state-of-the-art methods.
IEEE TRANSACTIONS ON MULTIMEDIA
(2022)
Article
Computer Science, Artificial Intelligence
Yu Liu, Tinne Tuytelaars
Summary: Discovering novel visual categories from unlabeled images is crucial for intelligent vision systems, and we propose a residual-tuning approach to overcome the trade-off between preserving features learned on labeled data and adapting features to unlabeled data. Our method achieves consistent and considerable gains on benchmark tests, reducing the performance gap to the fully supervised setting.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Agriculture, Multidisciplinary
Tim Van De Looverbosch, Jiaqi He, Astrid Tempelaere, Klaas Kelchtermans, Pieter Verboven, Tinne Tuytelaars, Jan Sijbers, Bart Nicolai
Summary: X-ray radiography has been investigated as a technique for internal quality inspection of pears in storage, with multiple deep anomaly detection methods showing effectiveness in detecting pears with internal cavity and browning disorders. The best performing methods were found to be on par with a state-of-the-art multisensor disorder detection method.
COMPUTERS AND ELECTRONICS IN AGRICULTURE
(2022)
Article
Computer Science, Artificial Intelligence
Eli Verwimp, Kuo Yang, Sarah Parisot, Lanqing Hong, Steven McDonagh, Eduardo Perez-Pellitero, Matthias De Lange, Tinne Tuytelaars
Summary: This paper introduces CLAD, a new Continual Learning benchmark for Autonomous Driving, focusing on object classification and object detection. The benchmark builds on SODA10M, a large-scale autonomous-driving dataset. Existing continual learning benchmarks are reviewed and discussed, showing that most represent extreme cases. An online classification benchmark, CLAD-C, and a domain-incremental continual object detection benchmark, CLAD-D, are introduced, and their inherent difficulties and challenges are examined through a survey of the top three participants in a CLAD-challenge workshop at ICCV 2021. Finally, possible pathways to improve the current state of continual learning and promising directions for future research are discussed.
Article
Agronomy
Astrid Tempelaere, Tim Van De Looverbosch, Klaas Kelchtermans, Pieter Verboven, Tinne Tuytelaars, Bart Nicolai
Summary: This study proposes a method to generate synthetic CT images using a conditional GAN (cGAN) to overcome the challenge of obtaining large annotated datasets. The performance of the predictor was evaluated quantitatively and visually, showing that the cGAN effectively generates CT images of healthy and defective fruit based on annotations.
POSTHARVEST BIOLOGY AND TECHNOLOGY
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Thomas Verelst, Paul K. Rubenstein, Marcin Eichner, Tinne Tuytelaars, Maxim Berman
Summary: Multi-label image classification is more practical for real-world scenarios than single-label classification due to the presence of multiple objects in natural images. However, annotating every object of interest is time-consuming and expensive. In this study, we propose an Expected Negative loss to train multi-label classifiers using datasets where each image is annotated with a single positive label. To handle the uncertainty of other classes, we generate a set of expected negative labels based on prediction consistency. Additionally, we introduce a novel spatial consistency loss to improve supervision by maintaining consistent spatial feature maps for each training image. Our experiments on various datasets demonstrate the effectiveness of the Expected Negative loss in combination with consistency and spatial consistency losses, and we achieve improved multi-label classification mAP on ImageNet-1K using the ReaL multi-label validation set.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Abhishek Jha, Soroush Seifi, Tinne Tuytelaars
Summary: In active visual exploration, it is crucial to sample informative local observations for modeling global context. This paper proposes the use of vision transformers instead of CNNs for such agents and introduces a transformer-based active visual sampling model called SimGlim. The model utilizes the transformer's self-attention architecture to predict the best next location based on the current observable environment. Experimental results demonstrate the effectiveness of the proposed method in image reconstruction and comparisons against existing methods are provided. Ablation studies are also conducted to analyze the importance of design choices in the overall architecture.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Abhishek Jha, Badri Patro, Luc Van Gool, Tinne Tuytelaars
Summary: This paper proposes a novel regularization method called COB to improve the information content of the joint space in visual question answering models. It reduces redundancy by minimizing the correlation between learned feature components, disentangling semantic concepts. The model aligns the joint space with the answer embedding space and shows improved accuracy on VQA datasets.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Tim Lebailly, Tinne Tuytelaars
Summary: The downstream accuracy of self-supervised methods depends on the proxy task and the quality of the gradients extracted during training, and incorporating local cues into the proxy task can improve accuracy on downstream tasks. We propose a geometric approach for matching local representations in self-distillation, which outperforms similarity-based matching, especially in low-data regimes, where similarity-based matching is even detrimental compared to a baseline without local self-distillation.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
Summary: This paper revisits the weakly supervised cross-modal face-name alignment task and proposes SECLA and SECLA-B models. These models use appropriate loss functions to learn the alignments between names and faces in a neural network setting. SECLA maximizes the similarity scores between faces and names in a weakly supervised fashion, while SECLA-B learns to align names and faces from easy to hard cases, further improving the performance.
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
Thomas Stegmuller, Tim Lebailly, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran
Summary: In this paper, we propose a method for learning dense visual representations without labels by discovering and segmenting the semantics of views through an online clustering mechanism. The resulting method is highly generalizable and does not require cumbersome pre-processing steps.
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR
(2023)
Proceedings Paper
Computer Science, Artificial Intelligence
J. Osstyn, F. Danckaers, A. Van Haver, J. Oramas, M. Vanhees, J. Sijbers
Summary: This article presents a fully automated algorithm for the reduction of displaced fractures, which is robust and closely resembles the manual reductions by surgeons.
2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI
(2023)
Proceedings Paper
Computer Science, Information Systems
Adrien Bibal, Tassadit Bouadi, Benoit Frenay, Luis Galarraga, Jose Oramas
Summary: Recent technological advances rely on accurate decision support systems, whose complexity causes a lack of transparency that can lead to issues of trust and bias in decision-making; this has sparked the emergence of interpretable and explainable AI to address the problem.
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022
(2022)
Article
Computer Science, Artificial Intelligence
Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Greg Slabaugh, Tinne Tuytelaars
Summary: This article surveys continual learning with artificial neural networks, focusing on task-incremental classification. It proposes a new framework for continually evaluating the stability-plasticity trade-off of the network and experimentally compares 11 state-of-the-art continual learning methods, evaluating their strengths and weaknesses across different benchmark datasets.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2022)
Article
Computer Science, Artificial Intelligence
Thanh-Son Nguyen, Basura Fernando
Summary: In this paper, a regularization-based image paragraph generation method is proposed. A novel multimodal encoding generator (MEG) is introduced to generate effective multimodal encoding that captures individual sentence, visual, and paragraph-sequential information. The generated encoding is utilized to regularize a paragraph generation model, leading to improved results in all evaluation metrics for the captioning model. The proposed MEG model, along with reinforcement learning optimization, achieves state-of-the-art results on the Stanford paragraph dataset. Extensive empirical analysis demonstrates the capabilities of MEG encoding, where qualitative visualization and multimodal sentence/image retrieval tasks show that MEG captures semantic and meaningful textual and visual information.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Akash Singh, Tom de Schepper, Kevin Mets, Peter Hellinckx, Jose Oramas, Steven Latre
Summary: In recent years, there has been increasing interest in multi-label, multi-class video action recognition. This paper proposes a method that learns to reason over the semantic concept of objects and actions using relational networks. The empirical results show that artificial neural networks benefit from pretraining, relational inductive biases, and unordered set-based latent representations in action recognition tasks.
PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5
(2022)