Multi-Stream Deep Neural Networks for RGB-D Egocentric Action Recognition

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2019)

Journal

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Volume 29, Issue 10, Pages 3001-3015

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TCSVT.2018.2875441

Keywords

Egocentric action recognition; RGB-D videos; multi-view learning; deep learning

Category

Engineering, Electrical & Electronic

Funding

National Key Research and Development Program of China [2017YFA0700802]
National Natural Science Foundation of China [61822603, U1713214, 61672306, 61572271, 61527808]
National 1000 Young Talents Plan Program
Shenzhen Fundamental Research Fund (Subject Arrangement) [JCYJ20170412170602564]

Abstract

In this paper, we investigate the problem of RGB-D egocentric action recognition. Unlike conventional human action videos, which are passively recorded by static cameras, egocentric videos are captured by wearable sensors, which are more flexible and provide close-ups that follow the visual attention of the wearers as they act. Moreover, RGB-D videos contain spatial appearance and temporal information in the RGB modality and reflect the 3D structure of the scenes in the depth modality. To adequately learn the nonlinear structure of heterogeneous representations from different modalities and exploit their complementary characteristics, we develop a multi-stream deep neural networks (MDNN) method, which aims to preserve the distinctive properties of each modality and simultaneously explore their sharable information in a unified deep architecture. Specifically, we deploy a Cauchy estimator to maximize the correlations of the sharable components and enforce orthogonality constraints on the distinctive components to guarantee their high independence. Since egocentric action recognition is usually sensitive to hand poses, we extend our MDNN by integrating hand cues to enhance the recognition accuracy. Extensive experimental results on a newly collected data set and two additional benchmarks are presented to demonstrate the effectiveness of our proposed method for RGB-D egocentric action recognition.
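
The abstract names two ingredients: a Cauchy estimator applied to the sharable components and an orthogonality constraint on the distinctive components. The sketch below illustrates one common way such terms can be written. It is not the paper's formulation, which the abstract does not give. The function names, the scale parameter `c`, the weight `0.1`, and the random feature matrices are all assumptions made for illustration. Here the Cauchy term is a robust loss, log(1 + (r/c)^2), on the gap between the shared parts of two modalities, and the orthogonality term is the squared Frobenius norm of the cross-product between shared and distinctive parts.

```python
import numpy as np


def cauchy_loss(shared_a, shared_b, c=1.0):
    """Cauchy (Lorentzian) robust loss on the gap between two shared components.

    Assumed form: sum of log(1 + (r / c)^2) over all residual entries r.
    It grows logarithmically, so large residuals are penalized far less
    than under a squared-error loss.
    """
    residual = shared_a - shared_b
    return float(np.sum(np.log1p((residual / c) ** 2)))


def orthogonality_penalty(shared, distinct):
    """Squared Frobenius norm of shared^T @ distinct.

    The penalty is zero exactly when every column of `shared` is
    orthogonal to every column of `distinct`.
    """
    return float(np.sum((shared.T @ distinct) ** 2))


# Hypothetical features: 8 samples, 4 dimensions per component.
rng = np.random.default_rng(0)
shared_rgb = rng.normal(size=(8, 4))
shared_depth = shared_rgb + 0.01 * rng.normal(size=(8, 4))
distinct_rgb = rng.normal(size=(8, 4))

# Combined objective with an assumed trade-off weight of 0.1.
total_loss = cauchy_loss(shared_rgb, shared_depth) + 0.1 * orthogonality_penalty(
    shared_rgb, distinct_rgb
)
print(total_loss)
```

In a trainable network these terms would be computed on the outputs of the per-modality streams and minimized together with a classification loss; that wiring is omitted here because the abstract does not describe it.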

MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos

Bruce X. B. Yu, Yan Liu, Xiang Zhang, Sheng-hua Zhong, Keith C. C. Chan

Summary: This article proposes a model-based multimodal network (MMNet) that fuses skeleton and RGB modalities in order to improve ensemble recognition accuracy by effectively applying mutually complementary information from different data modalities. Experimental results show that the proposed MMNet outperforms state-of-the-art approaches on five benchmark datasets, effectively capturing mutually complementary features in different RGB-D video modalities and providing more discriminative features for human action recognition.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)