Journal
NEUROCOMPUTING
Volume 400, Pages 238-254
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2020.03.038
Keywords
Dynamic gesture recognition; Convolutional neural network; Temporal information representation
Funding
- CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior)
Abstract
Due to technological advances, machines are increasingly present in people's daily lives. There has therefore been growing effort to develop interfaces that provide an intuitive way of interaction, such as dynamic gestures. Currently, the most common trend is to use multimodal data, such as depth and skeleton information, to enable dynamic gesture recognition. However, it would be more interesting if only color information were used, since RGB cameras are available in almost every public place and could be used for gesture recognition without the need to install additional equipment. The main problem with such an approach is the difficulty of representing spatio-temporal information using color alone. With this in mind, we propose a technique capable of condensing a dynamic gesture, shown in a video, into a single RGB image. We call this technique star RGB. This image is then passed to a classifier formed by two ResNet CNNs, a soft-attention ensemble, and a fully connected layer, which indicates the class of the gesture present in the input video. Experiments were carried out using the Montalbano, GRIT, and isoGD datasets. On the Montalbano dataset, the proposed approach achieved an accuracy of 94.58%, which matches the state of the art for this dataset when only color information is considered. On the GRIT dataset, our proposal achieves more than 98% accuracy, recall, precision, and F1-score, outperforming the dataset authors' approach by more than 6%. On the large-scale isoGD dataset, the proposal achieved an accuracy of 52.18%. Taking into account the complexity of the dataset (eight different gesture categories) and the number of classes (249), we consider our approach competitive with previous ones, since we employed only color information to recognize gestures rather than all the multimodal data usually used by other methods. (C) 2020 Elsevier B.V. All rights reserved.
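The abstract describes a classifier in which features from two ResNet branches are combined by a soft-attention ensemble before a fully connected layer. The sketch below illustrates only the soft-attention fusion step with NumPy; the shapes, the scalar scoring projection `w_att`, and the function names are assumptions for illustration, since the abstract does not specify the exact fusion mechanism.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention_fuse(feat_a, feat_b, w_att):
    """Fuse two CNN feature vectors with soft-attention weights.

    feat_a, feat_b: feature vectors from the two ResNet branches
    (hypothetical shapes; the paper's exact fusion is not given in the abstract).
    w_att: a learned projection producing one scalar score per branch.
    """
    feats = np.stack([feat_a, feat_b])   # shape (2, d)
    scores = feats @ w_att               # one score per branch, shape (2,)
    alpha = softmax(scores)              # attention weights summing to 1
    return alpha @ feats                 # weighted sum of branch features, shape (d,)

# toy usage with random stand-in features
rng = np.random.default_rng(0)
d = 8
fa, fb = rng.normal(size=d), rng.normal(size=d)
w = rng.normal(size=d)
fused = soft_attention_fuse(fa, fb, w)
```

The fused vector would then feed the fully connected layer that outputs the gesture class scores. The attention weights let the ensemble emphasize whichever branch is more informative for a given input.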