4.7 Article

A Quadruple Diffusion Convolutional Recurrent Network for Human Motion Prediction

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TCSVT.2020.3038145

Keywords

Dynamics; Predictive models; Adaptation models; Hidden Markov models; Computational modeling; Bidirectional control; Training; Human motion prediction; body joint dynamics; diffusion convolutions; recurrent neural network; bi-directional predictor

Funding

  1. City University of Hong Kong [9220077, 9678139]
  2. Royal Society [IES\R2\181024, IES\R1\191147]

The study introduces a novel diffusion convolutional recurrent predictor for spatial and temporal movement forecasting. It uses multi-step random walks and adversarial training to model the complex spatial and temporal relationships in the human skeletal structure, and reports superior performance in motion prediction.

Recurrent neural networks (RNNs) have become popular for human motion prediction thanks to their ability to capture temporal dependencies. However, they have limited capacity for modeling the complex spatial relationships in the human skeletal structure. In this work, we present a novel diffusion convolutional recurrent predictor for spatial and temporal movement forecasting, with multi-step random walks traversing bidirectionally along an adaptive graph to model interdependency among body joints. In the temporal domain, existing methods rely on a single forward predictor, whose output gradually drifts away from the true motion and accumulates error over time. We propose to supplement the forward predictor with a forward discriminator to alleviate such long-term motion drift under adversarial training. The solution is further enhanced by a backward predictor and a backward discriminator to effectively reduce the error, such that the system can also look into the past to improve the prediction at early frames. The two-way spatial diffusion convolutions and two-way temporal predictors together form a quadruple network. Furthermore, we train our framework by modeling the velocity from observed motion dynamics instead of static poses to predict future movements, which effectively reduces the discontinuity problem at early prediction. Our method outperforms the state of the art on both 3D and 2D datasets, including the Human3.6M, CMU Motion Capture and Penn Action datasets. The results also show that our method correctly predicts both high-dynamic and low-dynamic moving trends with less motion drift.
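To make two ideas from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows (a) a bidirectional diffusion convolution, where joint features are propagated by K-step random walks along both edge directions of a joint graph, and (b) the conversion between poses and frame-to-frame velocities. The function names, tensor shapes, and the fixed adjacency matrix are assumptions made for illustration; the paper describes an adaptive graph, which would be learned rather than supplied by hand.

```python
import numpy as np


def random_walk_matrix(adj):
    """Row-normalise a non-negative adjacency matrix into transition probabilities."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Rows with zero degree stay all-zero instead of dividing by zero.
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)


def diffusion_conv(x, adj, theta_fwd, theta_bwd):
    """Bidirectional diffusion convolution (illustrative sketch).

    x:          (N, F_in) features for N joints
    adj:        (N, N) joint graph; assumed fixed here, adaptive in the paper
    theta_fwd:  (K, F_in, F_out) weights for walks along the edges
    theta_bwd:  (K, F_in, F_out) weights for walks against the edges
    """
    p_fwd = random_walk_matrix(adj)
    p_bwd = random_walk_matrix(np.asarray(adj, dtype=float).T)
    out = np.zeros((x.shape[0], theta_fwd.shape[2]))
    h_fwd, h_bwd = x, x  # step 0 is the identity (no diffusion yet)
    for k in range(theta_fwd.shape[0]):
        out += h_fwd @ theta_fwd[k] + h_bwd @ theta_bwd[k]
        h_fwd = p_fwd @ h_fwd  # one more random-walk step forward
        h_bwd = p_bwd @ h_bwd  # one more random-walk step backward
    return out


def poses_to_velocities(poses):
    """poses: (T, D) sequence -> (T-1, D) frame-to-frame differences."""
    return np.diff(poses, axis=0)


def velocities_to_poses(last_pose, velocities):
    """Integrate predicted velocities starting from the last observed pose."""
    return last_pose + np.cumsum(velocities, axis=0)
```

Because the future poses are obtained by accumulating velocities onto the last observed pose, the first predicted frame is anchored to the observation, which is one way to read the abstract's claim about reduced discontinuity at early prediction.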

Recommended

Article Engineering, Electrical & Electronic

AlignBodyNet: Deep Learning-Based Alignment of Non-Overlapping Partial Body Point Clouds From a Single Depth Camera

Pengpeng Hu, Edmond S. L. Ho, Adrian Munteanu

Summary: This article proposes a novel deep learning framework for generating omnidirectional 3-D point clouds of human bodies by registering front- and back-facing partial scans. The method does not require calibration-assisting devices or assumptions on initial alignment or correspondences. The approach builds virtual correspondences for the partial scans and predicts the rigid transformation between them through deep neural networks. Experiments show that the proposed method achieves state-of-the-art performance in both objective and subjective terms.

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT (2023)

Article Computer Science, Artificial Intelligence

Focalized contrastive view-invariant learning for self-supervised skeleton-based action recognition

Qianhui Men, Edmond S. L. Ho, Hubert P. H. Shum, Howard Leung

Summary: In this work, a self-supervised framework called FoCoViL is proposed, which associates actions with common view-invariant properties and simultaneously separates dissimilar viewpoints by maximizing mutual information between multi-view sample pairs. An adaptive focalization method based on pairwise similarity is further proposed to enhance contrastive learning for a clearer cluster boundary. FoCoViL performs well on both unsupervised and supervised classifiers, and the proposed contrastive-based focalization generates a more discriminative latent representation.

NEUROCOMPUTING (2023)

Article Computer Science, Artificial Intelligence

Interaction-Aware Decision-Making for Automated Vehicles Using Social Value Orientation

Luca Crosato, Hubert P. H. Shum, Edmond S. L. Ho, Chongfeng Wei

Summary: This paper proposes a framework based on Social Value Orientation and Deep Reinforcement Learning (DRL) for decision-making in the presence of pedestrians. The framework trains decision-making policies with different driving styles using state-of-the-art DRL algorithms in a simulated environment. It also introduces a computationally-efficient pedestrian model suitable for DRL training.

IEEE TRANSACTIONS ON INTELLIGENT VEHICLES (2023)

Article Computer Science, Software Engineering

INCLG: Inpainting for non-cleft lip generation with a multi-task image processing network

Shuang Chen, Amir Atapour-Abarghouei, Edmond S. L. Ho, Hubert P. H. Shum

Summary: We introduce software that predicts non-cleft facial images for patients with cleft lip, facilitating the understanding and discussion of cleft lip surgeries. To protect privacy, we design a software framework using image inpainting, which does not require cleft lip images for training, mitigating the risk of leakage. We implement a novel multi-task architecture that predicts both non-cleft facial images and facial landmarks, resulting in improved performance as evaluated by surgeons. The software is implemented with PyTorch, supporting consumer-level color images and offering fast prediction speed for effective deployment.

SOFTWARE IMPACTS (2023)

Proceedings Paper Computer Science, Information Systems

Predicting Sleeping Quality Using Convolutional Neural Networks

Vidya Rohini Konanur Sathish, Wai Lok Woo, Edmond S. L. Ho

Summary: Identifying sleep stages and patterns is crucial for diagnosing and treating sleep disorders. This paper proposes a CNN architecture to improve classification performance and benchmarks it against traditional machine learning methods on publicly available sleep datasets. Accuracy, sensitivity, specificity, precision, recall, and F-score are reported as a baseline for future research in this direction.

ADVANCES IN CYBERSECURITY, CYBERCRIMES, AND SMART EMERGING TECHNOLOGIES (2023)

Proceedings Paper Computer Science, Information Systems

Improving Deep Learning Model Robustness Against Adversarial Attack by Increasing the Network Capacity

Marco Marchetti, Edmond S. L. Ho

Summary: This paper examines the security issues in Deep Learning and conducts experiments to explore ways to enhance the resilience of DL models against adversarial attacks. The results demonstrate improvements and offer new insights that can guide researchers and practitioners in developing more robust DL algorithms.

ADVANCES IN CYBERSECURITY, CYBERCRIMES, AND SMART EMERGING TECHNOLOGIES (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Li Li, Hubert P. H. Shum, Toby P. Breckon

Summary: This study proposes a semi-supervised semantic segmentation method that achieves superior accuracy with fewer annotations by utilizing a smaller architecture and a novel convolution module. The method also reduces computational costs and improves performance through new data sub-sampling and soft pseudo-label techniques.

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Unifying Human Motion Synthesis and Style Transfer with Denoising Diffusion Probabilistic Models

Ziyi Chang, Edmund J. C. Findlay, Haozheng Zhang, Hubert P. H. Shum

Summary: This article proposes a denoising diffusion probabilistic model solution for generating styled motion of digital humans. By representing both inter-class motion content and intra-class style behavior in the same latent, an integrated, end-to-end trained pipeline is achieved. A multi-task diffusion model architecture with adversarial and physical regulations is designed, resulting in superior performance.

PROCEEDINGS OF THE 18TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS, VISIGRAPP 2023 (2023)

Article Computer Science, Interdisciplinary Applications

Image editing-based data augmentation for illumination-insensitive background subtraction

Dimitrios Sakkos, Edmond S. L. Ho, Hubert P. H. Shum, Garry Elvin

Summary: The researchers addressed the challenge of illumination changes in background subtraction using data augmentation and proposed a post-processing method to improve the accuracy of segmentation. The experiments demonstrated the significant contribution of this method in handling illumination changes.

JOURNAL OF ENTERPRISE INFORMATION MANAGEMENT (2023)

Proceedings Paper Computer Science, Interdisciplinary Applications

Pose-Based Tremor Classification for Parkinson's Disease Diagnosis from Video

Haozheng Zhang, Edmond S. L. Ho, Xiatian Zhang, Hubert P. H. Shum

Summary: Parkinson's disease is a progressive neurodegenerative disorder with challenging diagnosis. We propose a low-cost Parkinson's tremor classification system using video recording of human movements, which incorporates an attention module to extract relevant information and filter noise. Experimental results show superior performance of our system.

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT IV (2022)