Article

Dynamic gesture recognition by using CNNs and star RGB: A temporal information condensation

Journal

Neurocomputing
Volume 400, Pages 238-254

Publisher

Elsevier
DOI: 10.1016/j.neucom.2020.03.038

Keywords

Dynamic gesture recognition; Convolutional neural network; Temporal information representation

Funding

  1. CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior)

Abstract

Due to technological advances, machines are increasingly present in people's daily lives. As a result, there is a growing effort to develop interfaces that provide an intuitive way of interacting with them, such as dynamic gestures. Currently, the most common trend is to use multimodal data, such as depth and skeleton information, to enable dynamic gesture recognition. However, it would be more interesting if only color information were used, since RGB cameras are available in almost every public place and could be used for gesture recognition without the need to install additional equipment. The main problem with such an approach is the difficulty of representing spatio-temporal information using color alone. With this in mind, we propose a technique capable of condensing a dynamic gesture, shown in a video, into a single RGB image. We call this technique star RGB. This image is then passed to a classifier formed by two ResNet CNNs, a soft-attention ensemble, and a fully connected layer, which indicates the class of the gesture present in the input video. Experiments were carried out using the Montalbano, GRIT, and isoGD datasets. For the Montalbano dataset, the proposed approach achieved an accuracy of 94.58%, which matches the state of the art for this dataset when only color information is considered. For the GRIT dataset, our proposal achieves more than 98% accuracy, recall, precision, and F1-score, outperforming the dataset authors' approach by more than 6%. On the large-scale isoGD dataset, the proposal achieved an accuracy of 52.18%. However, taking into account the complexity of the dataset (eight different gesture categories) and the number of classes (249), we consider our approach competitive with previous ones, since we employed only color information to recognize gestures instead of all the available multimodal data usually used by other methods. (C) 2020 Elsevier B.V. All rights reserved.
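
To make the condensation idea concrete, below is a minimal Python/NumPy sketch of a star-RGB-style representation. It assumes the video is split into three temporal segments whose accumulated inter-frame differences become the R, G, and B channels, so that when motion happens is encoded as color. The function names, the plain absolute-difference accumulation, and the max normalization are illustrative assumptions for this sketch, not the authors' exact published formulation.

```python
import numpy as np

def star_channel(frames: np.ndarray) -> np.ndarray:
    """Condense one temporal segment (T, H, W) into a single grayscale map.

    Accumulates absolute inter-frame differences, so pixels that move
    often become bright. This plain accumulation is an assumption made
    for illustration, not necessarily the paper's exact weighting.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    acc = diffs.sum(axis=0)                                     # (H, W)
    # Normalize to [0, 255] so the result behaves like an image plane.
    if acc.max() > 0:
        acc *= 255.0 / acc.max()
    return acc.astype(np.uint8)

def star_rgb(video_gray: np.ndarray) -> np.ndarray:
    """Condense a grayscale gesture video (T, H, W) into one H x W x 3 image.

    The video is split into three equal temporal segments; each segment's
    motion map becomes one color channel, encoding time as color.
    """
    segments = np.array_split(video_gray, 3, axis=0)
    channels = [star_channel(seg) for seg in segments]
    return np.stack(channels, axis=-1)  # (H, W, 3)

# Example with a synthetic 30-frame, 64x64 "video".
video = np.random.randint(0, 256, size=(30, 64, 64), dtype=np.uint8)
image = star_rgb(video)
print(image.shape)  # (64, 64, 3)
```

The resulting image can then be fed to any standard image classifier, which is the point of the condensation: temporal dynamics are folded into a single frame that off-the-shelf CNNs can consume.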
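The classifier itself is described as two ResNet CNNs fused by a soft-attention ensemble followed by a fully connected layer. The PyTorch sketch below shows one plausible way such an ensemble could be wired; the ResNet-50 backbone, the linear attention scoring, and the weighted-sum fusion are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class SoftAttentionEnsemble(nn.Module):
    """Illustrative two-ResNet ensemble with soft-attention fusion.

    Two ResNet backbones each embed the star RGB image; a learned
    softmax weighting (the "soft attention") mixes the two feature
    vectors before a final fully connected classifier. ResNet-50 and
    this particular fusion are sketch assumptions, not the paper's
    exact configuration.
    """

    def __init__(self, num_classes: int):
        super().__init__()
        def backbone() -> nn.Module:
            net = models.resnet50(weights=None)
            net.fc = nn.Identity()  # keep the 2048-d pooled features
            return net
        self.branch_a = backbone()
        self.branch_b = backbone()
        self.attn = nn.Linear(2048 * 2, 2)        # one score per branch
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fa = self.branch_a(x)                     # (B, 2048)
        fb = self.branch_b(x)                     # (B, 2048)
        scores = torch.softmax(self.attn(torch.cat([fa, fb], dim=1)), dim=1)
        fused = scores[:, :1] * fa + scores[:, 1:] * fb
        return self.classifier(fused)             # (B, num_classes)

model = SoftAttentionEnsemble(num_classes=249)    # e.g., isoGD's 249 classes
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 249])
```

Setting num_classes to 249 mirrors the isoGD label set mentioned in the abstract; for Montalbano or GRIT it would be set to those datasets' class counts instead.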
