Article

Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 24, Pages 768-779

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3059098

Keywords

Assistive technology; Gesture recognition; Hidden Markov models; Shape; Optimization; Visualization; Tools; Multi-cue; pose estimation; segmented attention; sign language recognition; sign language translation

Funding

  1. National Natural Science Foundation of China [U20A20183, 61836006, 61836011]
  2. Youth Innovation Promotion Association CAS [2018497]
  3. GPU cluster built by MCC Laboratory of Information Science and Technology Institution, USTC


The research proposes a spatial-temporal multi-cue (STMC) network for video-based sign language understanding, with a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. A joint optimization strategy and segmented attention mechanism are designed to make the best of multi-cue sources for sign language recognition and translation, achieving new state-of-the-art performance on three sign language benchmarks.
Despite the recent success of deep learning in video-related tasks, deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture) in sign videos. To this end, we approach video-based sign language understanding with multi-cue learning and propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module learns the spatial representation of different cues with a self-contained pose estimation branch. The TMC module models temporal correlations from intra-cue and inter-cue perspectives to explore the collaboration of multiple cues. A joint optimization strategy and a segmented attention mechanism are designed to make the best of multi-cue sources for sign language recognition and translation. To validate the effectiveness, we perform experiments on three large-scale sign language benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
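
To give a concrete picture of the multi-cue idea described in the abstract, the following is a minimal, illustrative PyTorch sketch of one way to combine per-cue temporal paths (intra-cue) with a fused path (inter-cue) before sequence modeling. All module names, channel sizes, and the choice of four cues are assumptions made for illustration; this is not the authors' implementation and it omits the SMC pose-estimation branch, the joint optimization strategy and the segmented attention mechanism.

# Illustrative sketch only: intra-cue + inter-cue temporal modeling over
# per-cue feature sequences, loosely in the spirit of the TMC module.
from typing import List

import torch
import torch.nn as nn


class IntraCueTemporal(nn.Module):
    """Temporal convolution over one feature sequence of shape (B, T, C)."""

    def __init__(self, channels: int, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)
        self.pool = nn.MaxPool1d(2, ceil_mode=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, T, C) -> temporal conv + pooling -> (B, T/2, C)
        x = self.conv(x.transpose(1, 2))
        x = self.pool(torch.relu(x))
        return x.transpose(1, 2)


class MultiCueTemporal(nn.Module):
    """Parallel intra-cue paths plus one inter-cue fusion path, then a BiLSTM."""

    def __init__(self, num_cues: int = 4, channels: int = 256, hidden: int = 512):
        super().__init__()
        self.intra = nn.ModuleList(
            [IntraCueTemporal(channels) for _ in range(num_cues)])
        self.inter = IntraCueTemporal(num_cues * channels)
        self.rnn = nn.LSTM(num_cues * channels * 2, hidden,
                           batch_first=True, bidirectional=True)

    def forward(self, cues: List[torch.Tensor]) -> torch.Tensor:
        # cues: list of per-cue feature sequences, each of shape (B, T, C)
        intra_out = [path(c) for path, c in zip(self.intra, cues)]  # intra-cue paths
        inter_out = self.inter(torch.cat(cues, dim=-1))             # inter-cue path
        fused = torch.cat(intra_out + [inter_out], dim=-1)          # concatenate all paths
        out, _ = self.rnn(fused)                                    # sequence modeling
        return out                                                  # (B, T/2, 2*hidden)


if __name__ == "__main__":
    B, T, C = 2, 32, 256
    # Hypothetical cues, e.g. full frame, hands, face, body pose
    cues = [torch.randn(B, T, C) for _ in range(4)]
    print(MultiCueTemporal()(cues).shape)  # torch.Size([2, 16, 1024])

In an actual recognition/translation pipeline, the output sequence would typically feed a CTC-based recognition head and an attention-based translation decoder; those components are left out here to keep the sketch short.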

