Journal
NEUROCOMPUTING
Volume 400, Pages 238-254
Publisher
ELSEVIER
DOI: 10.1016/j.neucom.2020.03.038
Keywords
Dynamic gesture recognition; Convolutional neural network; Temporal information representation
Funding
- CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior)
Abstract
Due to technological advances, machines are increasingly present in people's daily lives. There has therefore been growing effort to develop interfaces that provide an intuitive way of interaction, such as dynamic gestures. Currently, the most common trend is to use multimodal data, such as depth and skeleton information, to enable dynamic gesture recognition. However, it would be more interesting if only color information were used, since RGB cameras are available in almost every public place and could be used for gesture recognition without the need to install additional equipment. The main problem with such an approach is the difficulty of representing spatio-temporal information using color alone. With this in mind, we propose a technique capable of condensing a dynamic gesture, shown in a video, into a single RGB image. We call this technique star RGB. This image is then passed to a classifier formed by two ResNet CNNs, a soft-attention ensemble, and a fully connected layer, which indicates the class of the gesture present in the input video. Experiments were carried out using the Montalbano, GRIT, and isoGD datasets. On the Montalbano dataset, the proposed approach achieved an accuracy of 94.58%, which matches the state of the art for this dataset when only color information is considered. On the GRIT dataset, our proposal achieves more than 98% accuracy, recall, precision, and F1-score, outperforming the dataset authors' approach by more than 6%. On the large-scale isoGD dataset, the proposal achieved an accuracy of 52.18%. Taking into account the complexity of the dataset (eight different gesture categories) and the number of classes (249), we consider our approach competitive with previous ones, since we employed only color information to recognize gestures rather than all the multimodal data usually used by other methods. (C) 2020 Elsevier B.V. All rights reserved.
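The abstract describes a classifier in which features from two ResNet branches are combined by a soft-attention ensemble before a fully connected layer. The sketch below illustrates only the soft-attention fusion step with NumPy; the shapes, the scalar scoring projection `w_att`, and the function names are assumptions for illustration, since the abstract does not specify the exact fusion mechanism.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention_fuse(feat_a, feat_b, w_att):
    """Fuse two CNN feature vectors with soft-attention weights.

    feat_a, feat_b: feature vectors from the two ResNet branches
    (hypothetical shapes; the paper's exact fusion is not given in the abstract).
    w_att: a learned projection producing one scalar score per branch.
    """
    feats = np.stack([feat_a, feat_b])   # shape (2, d)
    scores = feats @ w_att               # one score per branch, shape (2,)
    alpha = softmax(scores)              # attention weights summing to 1
    return alpha @ feats                 # weighted sum of branch features, shape (d,)

# toy usage with random stand-in features
rng = np.random.default_rng(0)
d = 8
fa, fb = rng.normal(size=d), rng.normal(size=d)
w = rng.normal(size=d)
fused = soft_attention_fuse(fa, fb, w)
```

The fused vector would then feed the fully connected layer that outputs the gesture class scores. The attention weights let the ensemble emphasize whichever branch is more informative for a given input.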