RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video

ACM TRANSACTIONS ON GRAPHICS (2020)

Journal

ACM TRANSACTIONS ON GRAPHICS

Volume 39, Issue 6, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3414685.3417852

Keywords

hand tracking; hand pose estimation; hand reconstruction; two hands; monocular RGB; RGB video; computer vision

Categories

Computer Science, Software Engineering

Funding

ERC Consolidator Grant 4DRepLy [770784]
ERC Consolidator Grant TouchDesign [772738]
Spanish Ministry of Science [RTI2018-098694-B-I00 VizLearning]
European Research Council (ERC) [772738]

Abstract

Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that is highly relevant to several human-computer interaction applications, including AR/VR, robotics, and sign language recognition. Existing works are either limited to simpler tracking settings (e.g., considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. To address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model fitting framework to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. Our method even performs on par with depth-based real-time methods.
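
The abstract describes fitting a 3D hand model to per-frame network predictions. As a rough illustration only, and not the authors' implementation, the sketch below fits a single parameter group (a rigid 3D translation of a fixed keypoint template) to 2D keypoints by Gauss-Newton minimization of the reprojection error. The pinhole focal length, the 21-keypoint template, the synthetic data, and all function names are assumptions made for this example; the method described above instead optimizes pose and shape parameters of a hand model for both hands and uses further predictions (segmentation, dense matchings, relative depth and distance maps).

```python
import numpy as np


def project(points_3d, focal=500.0):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels.

    The focal length is an assumed value; the principal point is at the origin.
    """
    return focal * points_3d[:, :2] / points_3d[:, 2:3]


def residuals(translation, template, keypoints_2d):
    """Flattened 2D reprojection error for a translated keypoint template."""
    return (project(template + translation) - keypoints_2d).ravel()


def fit_translation(template, keypoints_2d, init, iters=20, eps=1e-6):
    """Gauss-Newton fit of a 3D translation using a finite-difference Jacobian."""
    t = np.asarray(init, dtype=float).copy()
    for _ in range(iters):
        r = residuals(t, template, keypoints_2d)
        jac = np.empty((r.size, 3))
        for k in range(3):
            step = np.zeros(3)
            step[k] = eps
            jac[:, k] = (residuals(t + step, template, keypoints_2d) - r) / eps
        # Solve the linearized least-squares problem jac @ delta = -r.
        delta = np.linalg.lstsq(jac, -r, rcond=None)[0]
        t += delta
    return t


# Synthetic, noise-free data: a hypothetical 21-keypoint template in metres.
rng = np.random.default_rng(0)
template = rng.uniform(-0.1, 0.1, size=(21, 3))
true_t = np.array([0.05, -0.02, 0.6])
observed = project(template + true_t)

estimate = fit_translation(template, observed, init=[0.0, 0.0, 0.5])
print("estimated translation:", estimate)
```

With real network predictions the keypoints are noisy, so a practical fitting energy would typically add regularization and further data terms to constrain depth, which 2D keypoints alone determine only weakly.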

3D Hand Pose Estimation From Monocular RGB With Feature Interaction Module

Shaoxiang Guo, Eric Rigall, Yakun Ju, Junyu Dong

Summary: This article discusses the challenges of estimating 3D hand pose from a monocular RGB image and proposes a simple and efficient deep neural network to improve this task. By designing a feature chat block, the model is able to better handle the relationship between joint and skeleton features, resulting in improved accuracy and faster inference speed.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2022)