4.7 Article

Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2018.2875441

Keywords

Egocentric action recognition; RGB-D videos; multi-view learning; deep learning

Funding

  1. National Key Research and Development Program of China [2017YFA0700802]
  2. National Natural Science Foundation of China [61822603, U1713214, 61672306, 61572271, 61527808]
  3. National 1000 Young Talents Plan Program
  4. Shenzhen Fundamental Research Fund (Subject Arrangement) [JCYJ20170412170602564]

In this paper, we investigate the problem of RGB-D egocentric action recognition. Unlike conventional human action videos, which are passively recorded by static cameras, egocentric videos are captured by wearable sensors, which are more flexible and provide close-ups that follow the visual attention of the wearers as they act. Moreover, RGB-D videos contain spatial appearance and temporal information in the RGB modality and reflect the 3D structure of the scenes in the depth modality. To adequately learn the nonlinear structure of heterogeneous representations from different modalities and exploit their complementary characteristics, we develop a multi-stream deep neural networks (MDNN) method, which aims to preserve the distinctive properties of each modality and simultaneously explore their sharable information in a unified deep architecture. Specifically, we deploy a Cauchy estimator to maximize the correlations of the sharable components and enforce orthogonality constraints on the distinctive components to guarantee their high independence. Since egocentric action recognition is usually sensitive to hand poses, we extend our MDNN by integrating hand cues to enhance recognition accuracy. Extensive experimental results on a newly collected data set and two additional benchmarks are presented to demonstrate the effectiveness of our proposed method for RGB-D egocentric action recognition.
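
The abstract names two ingredients, a Cauchy estimator on the sharable components and orthogonality constraints on the distinctive components, without giving their exact form. The sketch below is one possible reading, not the authors' formulation: it assumes each modality yields a sharable vector and a distinctive vector of equal length, uses the standard Cauchy (Lorentzian) penalty `log(1 + (r/c)^2)` on the gap between the two sharable vectors, and penalizes the squared inner product of the two distinctive vectors. The function names, the `scale` and `weight` parameters, and the way the terms are combined are all hypothetical.

```python
import math


def cauchy_loss(shared_a, shared_b, scale=1.0):
    """Cauchy (Lorentzian) penalty on the element-wise gap between two
    sharable vectors. It grows only logarithmically for large gaps, so
    outliers are penalized less than under a squared error.
    Assumption: this stands in for the paper's Cauchy estimator."""
    return sum(
        math.log(1.0 + ((a - b) / scale) ** 2)
        for a, b in zip(shared_a, shared_b)
    )


def orthogonality_penalty(distinct_a, distinct_b):
    """Squared inner product of two distinctive vectors.
    It is zero exactly when the vectors are orthogonal.
    Assumption: a soft penalty replaces the hard constraint."""
    dot = sum(a * b for a, b in zip(distinct_a, distinct_b))
    return dot ** 2


def combined_loss(rgb_shared, depth_shared, rgb_distinct, depth_distinct,
                  weight=1.0, scale=1.0):
    """Hypothetical combination: pull the sharable parts together and
    push the distinctive parts toward orthogonality."""
    return (cauchy_loss(rgb_shared, depth_shared, scale)
            + weight * orthogonality_penalty(rgb_distinct, depth_distinct))


if __name__ == "__main__":
    # Toy vectors standing in for per-modality network outputs.
    rgb_shared, depth_shared = [0.2, 0.9], [0.1, 1.0]
    rgb_distinct, depth_distinct = [1.0, 0.0], [0.3, 0.8]
    print(combined_loss(rgb_shared, depth_shared,
                        rgb_distinct, depth_distinct))
```

In this toy form the loss vanishes when the sharable vectors coincide and the distinctive vectors are orthogonal, which mirrors the two goals stated in the abstract; in a real network the terms would be computed on learned features and minimized jointly with the classification objective.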

Recommended

Article Computer Science, Artificial Intelligence

Egocentric Action Recognition by Automatic Relation Modeling

Haoxin Li, Wei-Shi Zheng, Jianguo Zhang, Haifeng Hu, Jiwen Lu, Jian-Huang Lai

Summary: This study proposes a weakly supervised model for egocentric action recognition, which automatically localizes interactors and establishes explicit relation models for recognition without using annotations or prior knowledge. Extensive experiments on egocentric video datasets demonstrate the effectiveness of the proposed method.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Deep Metric Learning With Adaptively Composite Dynamic Constraints

Wenzhao Zheng, Jiwen Lu, Jie Zhou

Summary: This paper proposes a deep metric learning method called DML-DC, which utilizes adaptively generated dynamic constraints for image retrieval and clustering. The method employs a learnable constraint generator to produce dynamic constraints and trains the metric towards better generalization. It formulates the deep metric learning objective under a proxy collection, pair sampling, tuple construction, and tuple weighting paradigm.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Learning Deep Binary Descriptors via Bitwise Interaction Mining

Ziwei Wang, Han Xiao, Yueqi Duan, Jie Zhou, Jiwen Lu

Summary: In this paper, we propose a GraphBit method for learning unsupervised deep binary descriptors to efficiently represent images. The method reduces the uncertainty of binary codes by maximizing the mutual information with input and related bits, allowing reliable binarization of ambiguous bits. Additionally, a differentiable search method called GraphBit+ is introduced to mine bitwise interaction in continuous space, reducing the computational cost of reinforcement learning. To address the issue of inaccurate instructions from fixed bitwise interaction, the unsupervised binary descriptor learning method D-GraphBit is proposed, which utilizes a graph convolutional network to reason the optimal bitwise interaction for each input sample.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Content-Aware Warping for View Synthesis

Mantang Guo, Junhui Hou, Jing Jin, Hui Liu, Huanqiang Zeng, Jiwen Lu

Summary: This paper proposes a content-aware warping method that adaptively learns the interpolation weights for pixels from their contextual information via a lightweight neural network. Based on this learnable warping module, a new end-to-end learning-based framework is proposed for novel view synthesis, which includes two additional modules to address occlusion and spatial correlation issues. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods both quantitatively and visually.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Depth-Guided Optimization of Neural Radiance Fields for Indoor Multi-View Stereo

Yi Wei, Shaohui Liu, Jie Zhou, Jiwen Lu

Summary: In this work, a new multi-view depth estimation method called NerfingMVS is presented, which combines conventional reconstruction and learning-based priors with neural radiance fields (NeRF). It directly optimizes over implicit volumes, eliminating the need for pixel matching in indoor scenes. The key is using learning-based priors to guide the optimization process of NeRF. The proposed method achieves state-of-the-art performance and improves rendering quality on both seen and novel views.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

GFNet: Global Filter Networks for Visual Recognition

Yongming Rao, Wenliang Zhao, Zheng Zhu, Jie Zhou, Jiwen Lu

Summary: We present GFNet, a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain. GFNet outperforms Transformer-based models and CNNs in terms of efficiency, generalization ability, and robustness. We provide a series of isotropic and hierarchical models based on GFNet design.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images

Hui Li, Tianyang Xu, Xiao-Jun Wu, Jiwen Lu, Josef Kittler

Summary: Deep learning based fusion methods have achieved promising performance in image fusion tasks, which is largely attributed to the network architecture. However, designing fusion networks is still a challenging task. In this paper, the fusion task is mathematically formulated and a connection between the optimal solution and network architecture is established. This leads to the proposal of a lightweight fusion network based on a learnable representation approach.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks

Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu

Summary: In this paper, a new approach for model acceleration by exploiting spatial sparsity in visual data is presented. A dynamic token sparsification framework is proposed, which prunes redundant tokens progressively and dynamically based on the input to accelerate vision Transformers. The framework extends to hierarchical models and more complex dense prediction tasks, offering a new and more effective dimension for model acceleration. Promising results are achieved on various architectures and visual tasks, demonstrating the effectiveness of the proposed framework.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Information Systems

Seeing Through Darkness: Visual Localization at Night via Weakly Supervised Learning of Domain Invariant Features

Bin Fan, Yuzhu Yang, Wensen Feng, Fuchao Wu, Jiwen Lu, Hongmin Liu

Summary: This paper proposes an adversarial learning based solution to extract robust local features and descriptions across day-night images. By training a discriminator to distinguish day and night images and adjusting the feature extraction network to fool the discriminator, the network can extract domain invariant keypoints and descriptors. Compared to existing methods, this approach only requires additional easily captured night images to improve the domain invariance of learned features.

IEEE TRANSACTIONS ON MULTIMEDIA (2023)

Article Computer Science, Artificial Intelligence

Quantformer: Learning Extremely Low-Precision Vision Transformers

Ziwei Wang, Changyuan Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu

Summary: In this article, the authors propose Quantformer, a family of extremely low-precision vision transformers for efficient inference. They address the limitations of conventional network quantization methods by considering the properties of transformer architectures and implementing capacity-aware distribution and group-wise discretization strategies. Experimental results show that Quantformer outperforms state-of-the-art methods in image classification and object detection across various vision transformer architectures. The authors also integrate Quantformer with mixed-precision quantization to further enhance performance.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Theory & Methods

Estimating Fingerprint Pose via Dense Voting

Yongjie Duan, Jianjiang Feng, Jiwen Lu, Jie Zhou

Summary: In this study, a fusion of a voting strategy and a deep network is proposed to estimate fingerprint center and direction. Experimental results show that this approach achieves consistent fingerprint pose estimates, improves the performance of fingerprint indexing and verification, and is robust to different sensing technologies and impression types.

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY (2023)

Article Computer Science, Artificial Intelligence

STAR-FC: Structure-Aware Face Clustering on Ultra-Large-Scale Graphs

Shuai Shen, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu

Summary: This paper proposes a new face clustering method, called STructure-AwaRe Face Clustering (STAR-FC), which addresses the dilemma of large-scale training and efficient inference by designing a structure-preserving subgraph sampling strategy and a novel hierarchical GCN training paradigm. During inference, the STAR-FC performs efficient full-graph clustering with two steps: graph parsing and graph refinement, and introduces the concept of node intimacy to mine the local structural information. The experimental results demonstrate that this method achieves superior performance and efficiency.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Artificial Intelligence

3D Point-Voxel Correlation Fields for Scene Flow Estimation

Ziyi Wang, Yi Wei, Yongming Rao, Jie Zhou, Jiwen Lu

Summary: This paper proposes a Point-Voxel Correlation Fields method to explore the relations between two consecutive point clouds and estimate scene flow representing 3D motions. By introducing all-pair correlation volumes and using distinct point and voxel branches to handle local and long-range correlations, the proposed method outperforms state-of-the-art methods in experiments.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Computer Science, Information Systems

Learning Adaptive Patch Generators for Mask-Robust Image Inpainting

Hongyi Sun, Wanhua Li, Yueqi Duan, Jie Zhou, Jiwen Lu

Summary: In this paper, a Mask-Robust Inpainting Network (MRIN) approach is proposed to recover the masked areas of an image. By decomposing complex mask areas into different types and using type-specific generators for inpainting, the method achieves effective restoration for various masks.

IEEE TRANSACTIONS ON MULTIMEDIA (2023)

Article Computer Science, Artificial Intelligence

Diverse Sample Generation: Pushing the Limit of Generative Data-Free Quantization

Haotong Qin, Yifu Ding, Xiangguo Zhang, Jiakai Wang, Xianglong Liu, Jiwen Lu

Summary: Generative data-free quantization is a compression approach that quantizes deep neural networks to low bit-width without accessing the real data. This paper presents a generic Diverse Sample Generation (DSG) scheme to mitigate the accuracy degradation issue through generating diverse synthetic samples.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)
