Article
Computer Science, Artificial Intelligence
Bruce X. B. Yu, Yan Liu, Xiang Zhang, Sheng-hua Zhong, Keith C. C. Chan
Summary: This article proposes a model-based multimodal network (MMNet) that fuses skeleton and RGB modalities to improve ensemble recognition accuracy by exploiting mutually complementary information across data modalities. Experimental results on five benchmark datasets show that MMNet outperforms state-of-the-art approaches, capturing complementary features across RGB-D video modalities and providing more discriminative features for human action recognition.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Engineering, Electrical & Electronic
Yanli Ji, Yang Yang, Fumin Shen, Heng Tao Shen, Wei-Shi Zheng
Summary: Arbitrary-view action recognition remains challenging because of view changes and visual occlusions. To address this, the authors collect a large-scale RGB-D action dataset with diverse data types, rich action performances, and varied viewpoints, providing valuable and challenging data for evaluating arbitrary-view recognition.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2021)
Review
Chemistry, Analytical
Muhammad Bilal Shaikh, Douglas Chai
Summary: This review surveys data fusion and recognition techniques for vision-based action recognition from an RGB-D perspective, highlighting the distinct characteristics of different action-data modalities. It also discusses research challenges, emerging trends, and possible future research directions.
Article
Computer Science, Artificial Intelligence
Zixi Liang, Ming Yin, Junli Gao, Yicheng He, Weitian Huang
Summary: This paper proposes a View Knowledge Transfer Network (VKTNet) for multi-view action recognition that works even when some views are incomplete. View knowledge is transferred using a conditional generative adversarial network (cGAN), which extracts high-level semantic features and bridges the semantic gap between views. A Siamese Scaling Network (SSN) is also proposed to efficiently fuse the decisions obtained from each view.
IMAGE AND VISION COMPUTING
(2022)
Article
Robotics
Kejie Li, Hamid Rezatofighi, Ian Reid
Summary: Semantic-aware reconstruction offers advantages over geometry-only reconstruction for future robotic and AR/VR applications, because it identifies the objects themselves as well as their locations. The proposed MOLTR localizes, tracks, and reconstructs multiple rigid objects from monocular image sequences and camera poses.
IEEE ROBOTICS AND AUTOMATION LETTERS
(2021)
Article
Computer Science, Information Systems
Razieh Rastgoo, Kourosh Kiani, Sergio Escalera
Summary: Gesture recognition is a challenging research area in computer vision. To address the annotation bottleneck, the authors formulate the zero-shot gesture recognition problem and propose a two-stream model. Using Vision Transformer models for human detection and visual feature representation, the model achieves state-of-the-art results.
MULTIMEDIA TOOLS AND APPLICATIONS
(2023)
Article
Multidisciplinary Sciences
Nouar AlDahoul, Hezerul Abdul Karim, Mhd Adel Momo
Summary: Recognition of space objects is crucial for space situational awareness, and this paper proposes a multi-modal deep learning solution for the complex recognition task in space imagery. Using various deep learning models, the proposed solution achieves improved accuracy, precision, recall, and F1 score.
SCIENTIFIC REPORTS
(2022)
Article
Computer Science, Information Systems
Hoang-Nhat Tran, Hong-Quan Nguyen, Huong-Giang Doan, Thanh-Hai Tran, Thi-Lan Le, Hai Vu
Summary: This paper presents a novel deep learning method for Human Action Recognition (HAR) under different camera viewpoints, achieving robust performance across various datasets, especially on harder classes. The proposed pc-MvDA approach constructs a common feature space that preserves view-invariant features across separate camera views, yielding consistent performance gains in experiments.
Article
Computer Science, Artificial Intelligence
Qun Li, Rui Yang, Fu Xiao, Bir Bhanu, Feng Zhang
Summary: This paper proposes anomaly detection methods based on a future-frame prediction framework and a Multiple Instance Learning framework, introducing a memory addressing module and a novel loss function. The authors also introduce a multi-view dataset containing various anomalies and normal activities; experiments on multiple datasets demonstrate the effectiveness of the methods.
KNOWLEDGE-BASED SYSTEMS
(2022)
Article
Computer Science, Information Systems
Yi Huang, Xiaoshan Yang, Junyun Gao, Changsheng Xu
Summary: This paper addresses multi-domain action recognition on egocentric and exocentric videos by transferring knowledge between the two domains to learn a single model. The method maps videos into a global feature space and combines view-invariant and view-specific visual knowledge.
IEEE TRANSACTIONS ON MULTIMEDIA
(2022)
Article
Engineering, Electrical & Electronic
Dinghao Fan, Hengjie Lu, Shugong Xu, Shan Cao
Summary: This study introduces an end-to-end multi-task learning framework that uses the depth modality to improve gesture recognition accuracy. Experimental results show that the proposed method outperforms existing gesture recognition frameworks on three public datasets and also substantially improves accuracy when applied to other 2D CNN-based frameworks.
IEEE SENSORS JOURNAL
(2021)
Article
Computer Science, Artificial Intelligence
Amin Ullah, Khan Muhammad, Tanveer Hussain, Sung Wook Baik
Summary: The paper introduces a conflux long short-term memory (LSTM) network for action recognition from multi-view cameras. The framework, organized into four major steps, extracts features from the different views for action recognition and improves performance in experiments.
Article
Engineering, Electrical & Electronic
Qiang Wang, Gan Sun, Jiahua Dong, Qianqian Wang, Zhengming Ding
Summary: This paper proposes a lifelong multi-view subspace learning framework for continuous human action recognition, which utilizes complementary information among different views and achieves superior performance on new action recognition tasks.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
(2022)
Article
Computer Science, Information Systems
Dakai Ren, Xiangmin Wen, Jiazhong Chen, Yu Han, Shiqi Zhang
Summary: Two novel neural network architectures, AUCaps and AUCaps++, are proposed for multi-view, multi-label facial action unit (AU) detection by optimizing the combination of CapsNets and dense blocks. The proposed method outperforms competing methods in F1 score in both within-dataset and cross-dataset evaluations.
MULTIMEDIA TOOLS AND APPLICATIONS
(2022)
Article
Computer Science, Artificial Intelligence
Yancheng Wang, Yang Xiao, Junyi Lu, Bo Tan, Zhiguo Cao, Zhenjun Zhang, Joey Tianyi Zhou
Summary: The article addresses drastic imaging viewpoint variation in depth-video action recognition, proposing a discriminative MVDI fusion method based on multi-instance learning to improve cross-view 3-D action recognition. The method enhances the view tolerance of visual features and uses Fisher vectors for greater discriminative power.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2022)
Article
Computer Science, Artificial Intelligence
Haoxin Li, Wei-Shi Zheng, Jianguo Zhang, Haifeng Hu, Jiwen Lu, Jian-Huang Lai
Summary: This study proposes a weakly supervised model for egocentric action recognition, which automatically localizes interactors and establishes explicit relation models for recognition without using annotations or prior knowledge. Extensive experiments on egocentric video datasets demonstrate the effectiveness of the proposed method.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Wenzhao Zheng, Jiwen Lu, Jie Zhou
Summary: This paper proposes a deep metric learning method called DML-DC, which utilizes adaptively generated dynamic constraints for image retrieval and clustering. The method employs a learnable constraint generator to produce dynamic constraints and trains the metric towards better generalization. It formulates the deep metric learning objective under a proxy collection, pair sampling, tuple construction, and tuple weighting paradigm.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Ziwei Wang, Han Xiao, Yueqi Duan, Jie Zhou, Jiwen Lu
Summary: This paper proposes GraphBit, a method for learning unsupervised deep binary descriptors that represent images efficiently. The method reduces the uncertainty of binary codes by maximizing the mutual information with the input and related bits, allowing ambiguous bits to be binarized reliably. A differentiable search method, GraphBit+, is also introduced to mine bitwise interactions in continuous space, reducing the computational cost of reinforcement learning. To address inaccurate guidance from fixed bitwise interactions, the authors further propose D-GraphBit, an unsupervised binary descriptor learning method that uses a graph convolutional network to infer the optimal bitwise interaction for each input sample.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Mantang Guo, Junhui Hou, Jing Jin, Hui Liu, Huanqiang Zeng, Jiwen Lu
Summary: This paper proposes a content-aware warping method that adaptively learns the interpolation weights for pixels from their contextual information via a lightweight neural network. Built on this learnable warping module, a new end-to-end learning-based framework for novel view synthesis is proposed, with two additional modules that address occlusion and spatial correlation. Experimental results show that the proposed method outperforms state-of-the-art methods both quantitatively and visually.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Yi Wei, Shaohui Liu, Jie Zhou, Jiwen Lu
Summary: This work presents NerfingMVS, a multi-view depth estimation method that combines conventional reconstruction and learning-based priors with neural radiance fields (NeRF). It optimizes directly over implicit volumes, eliminating the need for pixel matching in indoor scenes. The key idea is to use learning-based priors to guide NeRF's optimization. The proposed method achieves state-of-the-art performance and improves rendering quality on both seen and novel views.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Yongming Rao, Wenliang Zhao, Zheng Zhu, Jie Zhou, Jiwen Lu
Summary: This paper presents GFNet, a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain. GFNet outperforms Transformer-based models and CNNs in efficiency, generalization ability, and robustness. The authors provide a series of isotropic and hierarchical models based on the GFNet design.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
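The frequency-domain mixing in the GFNet summary can be sketched as a single global filter layer: a 2-D FFT over the spatial token grid, element-wise multiplication by a learnable complex filter, and an inverse FFT. This is a minimal NumPy sketch assuming that formulation; the function name `global_filter` and the random filter are illustrative, and training, normalization, and the surrounding MLP blocks are omitted.

```python
import numpy as np


def global_filter(x, filt):
    """Mix spatial tokens globally by filtering in the frequency domain.

    x:    real array of shape (H, W, C), tokens arranged on an H x W grid.
    filt: complex array of shape (H, W // 2 + 1, C); in a trained model this
          would be a learnable parameter (here it is supplied by the caller).
    """
    # Real 2-D FFT over the two spatial axes; the last listed axis (W) is halved.
    freq = np.fft.rfft2(x, axes=(0, 1))
    # Element-wise multiplication in frequency corresponds to circular
    # convolution in space, so every output token depends on every input token.
    freq = freq * filt
    # Inverse FFT back to the spatial grid, restoring the original H x W size.
    return np.fft.irfft2(freq, s=x.shape[:2], axes=(0, 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((14, 14, 8))
    filt = rng.standard_normal((14, 8, 8)) + 1j * rng.standard_normal((14, 8, 8))
    y = global_filter(x, filt)
    print(y.shape)  # (14, 14, 8)
```

The FFT makes the global mixing cost O(HW log HW) per channel rather than the quadratic cost of full self-attention, which is the efficiency argument the summary alludes to.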
Article
Computer Science, Artificial Intelligence
Hui Li, Tianyang Xu, Xiao-Jun Wu, Jiwen Lu, Josef Kittler
Summary: Deep learning-based fusion methods have achieved promising performance in image fusion tasks, largely because of their network architectures. However, designing fusion networks remains challenging. In this paper, the fusion task is formulated mathematically and a connection is established between its optimal solution and the network architecture. This leads to a lightweight fusion network based on a learnable representation approach.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu
Summary: This paper presents a new approach to model acceleration that exploits spatial sparsity in visual data. A dynamic token sparsification framework is proposed that progressively and dynamically prunes redundant tokens based on the input to accelerate vision Transformers. The framework extends to hierarchical models and more complex dense prediction tasks, offering a new and more effective dimension for model acceleration. Promising results on various architectures and visual tasks demonstrate the effectiveness of the proposed framework.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
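At inference time, the progressive pruning described in the token sparsification summary can be illustrated as repeated top-k selection over tokens. A minimal sketch, assuming per-token importance scores are already available; the scoring module, any differentiable masking used during training, and the function name `sparsify_tokens` are assumptions not described in the summary.

```python
import numpy as np


def sparsify_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring fraction of tokens (inference-time top-k).

    tokens:     (N, D) token features.
    scores:     (N,) importance scores; in a learned framework these would come
                from a lightweight prediction module (hypothetical here).
    keep_ratio: fraction of the current tokens to retain at this stage.
    Returns the kept tokens (in their original order) and their indices.
    """
    n_keep = max(1, int(round(keep_ratio * tokens.shape[0])))
    top = np.argsort(-scores)[:n_keep]  # indices of the n_keep largest scores
    top = np.sort(top)                  # preserve the original token order
    return tokens[top], top


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((196, 64))  # e.g. a 14 x 14 patch grid
    for stage in range(3):
        # Hypothetical per-stage scores; a real model predicts them from the tokens.
        scores = rng.standard_normal(tokens.shape[0])
        tokens, _ = sparsify_tokens(tokens, scores, keep_ratio=0.7)
        print(stage, tokens.shape[0])  # 137, then 96, then 67 tokens remain
```

Because later layers only see the surviving tokens, the attention cost falls roughly quadratically with the number of tokens kept at each stage.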
Article
Computer Science, Information Systems
Bin Fan, Yuzhu Yang, Wensen Feng, Fuchao Wu, Jiwen Lu, Hongmin Liu
Summary: This paper proposes an adversarial learning-based solution for extracting robust local features and descriptors across day and night images. By training a discriminator to distinguish day images from night images and adjusting the feature extraction network to fool it, the network learns to extract domain-invariant keypoints and descriptors. Compared with existing methods, this approach requires only additional, easily captured night images to improve the domain invariance of the learned features.
IEEE TRANSACTIONS ON MULTIMEDIA
(2023)
Article
Computer Science, Artificial Intelligence
Ziwei Wang, Changyuan Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu
Summary: In this article, the authors propose Quantformer, an extremely low-precision vision transformer for efficient inference. They address the limitations of conventional network quantization methods by considering the properties of transformer architectures and introducing capacity-aware distribution and group-wise discretization strategies. Experimental results show that Quantformer outperforms state-of-the-art methods in image classification and object detection across various vision transformer architectures. The authors also integrate Quantformer with mixed-precision quantization to further improve performance.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
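The summary does not spell out Quantformer's group-wise discretization, so the sketch below only illustrates the general idea of group-wise low-bit quantization: values are split into groups, and each group gets its own scale and offset so that a single outlier does not stretch the quantization grid for every weight. This is a generic uniform quantizer, not Quantformer's method; the contiguous fixed-size grouping and the function name `groupwise_quantize` are assumptions.

```python
import numpy as np


def groupwise_quantize(w, group_size, bits):
    """Uniform asymmetric quantization with one scale and offset per group.

    w:          1-D float array whose length is divisible by group_size.
    group_size: number of consecutive values sharing quantization parameters.
    bits:       bit-width; values are mapped to integers in [0, 2**bits - 1].
    Returns (dequantized weights, integer codes), both with w's shape.
    """
    levels = 2 ** bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    # Constant groups get scale 1.0 to avoid dividing by zero.
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.int64)
    dequant = codes * scale + lo
    return dequant.reshape(w.shape), codes.reshape(w.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(4096)
    for bits in (2, 4, 8):
        deq, _ = groupwise_quantize(w, group_size=64, bits=bits)
        # Max error is bounded by half of each group's scale.
        print(bits, np.abs(deq - w).max())
```

Smaller groups track local value ranges more closely at the cost of storing more scales and offsets; choosing that trade-off is where a method such as Quantformer would differ from this plain sketch.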
Article
Computer Science, Theory & Methods
Yongjie Duan, Jianjiang Feng, Jiwen Lu, Jie Zhou
Summary: This study proposes a method that fuses a voting strategy with a deep network to estimate fingerprint center and direction. Experimental results show that the approach produces consistent fingerprint pose estimates, improves the performance of fingerprint indexing and verification, and is robust to different sensing technologies and impression types.
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY
(2023)
Article
Computer Science, Artificial Intelligence
Shuai Shen, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu
Summary: This paper proposes a new face clustering method, STructure-AwaRe Face Clustering (STAR-FC), which resolves the dilemma between large-scale training and efficient inference through a structure-preserving subgraph sampling strategy and a novel hierarchical GCN training paradigm. During inference, STAR-FC performs efficient full-graph clustering in two steps, graph parsing and graph refinement, and introduces the concept of node intimacy to mine local structural information. Experimental results show that the method achieves superior performance and efficiency.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Ziyi Wang, Yi Wei, Yongming Rao, Jie Zhou, Jiwen Lu
Summary: This paper proposes a Point-Voxel Correlation Fields method to explore the relations between two consecutive point clouds and estimate scene flow representing 3D motions. By introducing all-pair correlation volumes and using distinct point and voxel branches to handle local and long-range correlations, the proposed method outperforms state-of-the-art methods in experiments.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
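The all-pair correlation volume in the Point-Voxel Correlation Fields summary can be illustrated as a dense matrix of feature similarities between the two point clouds, followed by a per-point top-k lookup as a simplified stand-in for local correlations. A minimal sketch under stated assumptions: the 1/sqrt(D) scaling, the top-k helper, and both function names are illustrative choices rather than the paper's point branch, and the voxel branch for long-range correlations is not shown.

```python
import numpy as np


def all_pair_correlation(f1, f2):
    """Correlation between every point of frame 1 and every point of frame 2.

    f1: (N1, D) per-point features of the first point cloud.
    f2: (N2, D) per-point features of the second point cloud.
    Returns an (N1, N2) volume of dot products scaled by 1 / sqrt(D)
    (the scaling is an assumption, not taken from the summary).
    """
    return f1 @ f2.T / np.sqrt(f1.shape[1])


def topk_correlations(corr, k):
    """For each frame-1 point, keep its k strongest matches (hypothetical helper)."""
    idx = np.argsort(-corr, axis=1)[:, :k]        # column indices of the k largest entries
    vals = np.take_along_axis(corr, idx, axis=1)  # gather the matching values
    return vals, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1 = rng.standard_normal((1024, 64))
    f2 = rng.standard_normal((1024, 64))
    corr = all_pair_correlation(f1, f2)
    vals, idx = topk_correlations(corr, k=16)
    print(corr.shape, vals.shape)  # (1024, 1024) (1024, 16)
```

The dense volume grows as N1 x N2, which is why handling local and long-range correlations with separate, cheaper branches is attractive for large point clouds.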
Article
Computer Science, Information Systems
Hongyi Sun, Wanhua Li, Yueqi Duan, Jie Zhou, Jiwen Lu
Summary: In this paper, a Mask-Robust Inpainting Network (MRIN) approach is proposed to recover the masked areas of an image. By decomposing complex mask areas into different types and using type-specific generators for inpainting, the method achieves effective restoration for various masks.
IEEE TRANSACTIONS ON MULTIMEDIA
(2023)
Article
Computer Science, Artificial Intelligence
Haotong Qin, Yifu Ding, Xiangguo Zhang, Jiakai Wang, Xianglong Liu, Jiwen Lu
Summary: Generative data-free quantization is a compression approach that quantizes deep neural networks to low bit-width without accessing the real data. This paper presents a generic Diverse Sample Generation (DSG) scheme to mitigate the accuracy degradation issue through generating diverse synthetic samples.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)