MotioNet: 3D Human Motion Reconstruction from Monocular Video with Skeleton Consistency

ACM TRANSACTIONS ON GRAPHICS (2021)

Journal

ACM TRANSACTIONS ON GRAPHICS

Volume 40, Issue 1, Pages -

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3407659

Keywords

Pose estimation; motion capturing; motion analysis

Funding

National Key R&D Program of China [2018YFB1403900, 2019YFF0302902]
Israel Science Foundation [2366/16]
European Union [739578]
Government of the Republic of Cyprus

Automated Summary

MotioNet is a deep neural network that reconstructs 3D human skeleton motion from monocular video. The network decomposes sequences of 2D joint positions into a skeleton encoded by bone lengths and a sequence of 3D joint rotations. An integrated forward kinematics (FK) layer then converts these into 3D joint positions, which are compared with ground truth.

Abstract

We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from a monocular video. While previous methods rely on either rigging or inverse kinematics (IK) to associate a consistent skeleton with temporally coherent joint rotations, our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used motion representation. At the crux of our approach lies a deep neural network with embedded kinematic priors, which decomposes sequences of 2D joint positions into two separate attributes: a single, symmetric skeleton encoded by bone lengths, and a sequence of 3D joint rotations associated with global root positions and foot contact labels. These attributes are fed into an integrated forward kinematics (FK) layer that outputs 3D positions, which are compared to a ground truth. In addition, an adversarial loss is applied to the velocities of the recovered rotations to ensure that they lie on the manifold of natural joint rotations. The key advantage of our approach is that it learns to infer natural joint rotations directly from the training data rather than assuming an underlying model, or inferring them from joint positions using a data-agnostic IK solver. We show that enforcing a single consistent skeleton along with temporally coherent joint rotations constrains the solution space, leading to a more robust handling of self-occlusions and depth ambiguities.
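To make the role of the FK layer concrete, the following is a minimal sketch of forward kinematics for a single frame: given one set of bone lengths, per-joint local rotations, and a global root position, it accumulates rotations down the joint hierarchy and returns 3D joint positions. It is an illustration under stated assumptions, not the authors' implementation. The unit-quaternion rotation format, the fixed rest-pose bone directions, and all names (`parents`, `directions`, `bone_lengths`, `local_rots`, `root_pos`) are hypothetical choices made for this example.

```python
def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * v[2] - z * v[1])
    ty = 2.0 * (z * v[0] - x * v[2])
    tz = 2.0 * (x * v[1] - y * v[0])
    # v' = v + w * t + (q_vec x t)
    return (
        v[0] + w * tx + (y * tz - z * ty),
        v[1] + w * ty + (z * tx - x * tz),
        v[2] + w * tz + (x * ty - y * tx),
    )


def forward_kinematics(parents, directions, bone_lengths, local_rots, root_pos):
    """Return global 3D joint positions for one frame.

    parents[j] is the parent index of joint j (-1 for the root). Joints are
    assumed to be ordered so that every parent precedes its children.
    directions[j] is an assumed unit rest-pose direction for the bone that
    connects joint j to its parent; bone_lengths[j] scales it.
    """
    n = len(parents)
    global_rots = [None] * n
    positions = [None] * n
    for j in range(n):
        p = parents[j]
        if p < 0:
            # The root carries the global position and orientation.
            global_rots[j] = local_rots[j]
            positions[j] = tuple(root_pos)
            continue
        # Rest-pose offset: fixed direction scaled by the skeleton's bone length.
        offset = tuple(bone_lengths[j] * d for d in directions[j])
        # The bone is oriented by the accumulated rotation of the parent joint.
        rotated = quat_rotate(global_rots[p], offset)
        positions[j] = tuple(a + b for a, b in zip(positions[p], rotated))
        global_rots[j] = quat_mul(global_rots[p], local_rots[j])
    return positions
```

Because the bone lengths are an input shared by every frame while only the rotations and root position vary over time, a layer of this shape cannot produce a skeleton whose proportions change from frame to frame, which is the consistency property the abstract emphasizes.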
