Article

Gesture recognition algorithm based on multi-scale feature fusion in RGB-D images

IET IMAGE PROCESSING (2023)

Journal

IET IMAGE PROCESSING

Volume 17, Issue 4, Pages 1280-1290

Publisher

WILEY

DOI: 10.1049/ipr2.12712

Keywords

image processing; neural nets

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology

AI summary

With the advancement of sensor technology and artificial intelligence, video gesture recognition technology using big data has made human-computer interaction more natural and flexible, providing enhanced interactive experiences in teaching, on-board control, electronic gaming, and more. A robust recognition algorithm based on multi-level feature fusion of a two-stream convolutional neural network is proposed to handle challenges like lighting changes, background clutter, rapid movement, and partial occlusion. Experimental results demonstrate that the proposed model can accurately track and recognize gestures, outperforming the single-channel model with improved detection accuracy and mean average precision (mAP). Furthermore, it achieves high recognition rates under occlusion and varying light intensities, showing satisfactory performance compared to other methods in various datasets.

Abstract

With the rapid development of sensor technology and artificial intelligence, video gesture recognition in the context of big data makes human-computer interaction more natural and flexible, bringing a richer interactive experience to teaching, on-board control, electronic games, and other applications. To achieve robust recognition under illumination changes, background clutter, rapid movement, and partial occlusion, an algorithm based on multi-level feature fusion in a two-stream convolutional neural network is proposed, which includes three main steps. First, a Kinect sensor captures RGB-D images to build a gesture database. At the same time, data augmentation is applied to the training and test sets. Then, a two-stream convolutional neural network model with multi-level feature fusion is built and trained. Experimental results show that the proposed network model can robustly track and recognize gestures; compared with the single-channel model, the average detection accuracy is improved by 1.08% and the mean average precision (mAP) is improved by 3.56%. The average recognition rate for gestures under occlusion and different light intensities is 93.98%. Finally, on the ASL, LaRED, and 1-miohand datasets, the recognition accuracy is satisfactory compared with other methods.
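The abstract reports its gains as mean average precision (mAP) without saying how the metric is computed. The following is a minimal sketch of one common simplified definition (non-interpolated average precision per class, averaged over classes), not the paper's evaluation code. The function names, the gesture class names, and the input layout (a list of `(confidence, is_correct)` pairs per class) are assumptions made for illustration; detection benchmarks usually add box matching and interpolation steps that are omitted here.

```python
from typing import Dict, List, Tuple

# One prediction: (confidence score, whether it matches a ground-truth gesture).
Prediction = Tuple[float, bool]


def average_precision(predictions: List[Prediction], num_positives: int) -> float:
    """Non-interpolated AP: mean precision at each rank where a correct prediction occurs."""
    if num_positives == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(predictions, key=lambda item: item[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Dividing by all ground-truth positives penalizes missed gestures.
    return precision_sum / num_positives


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Prediction], int]]
) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    if not per_class:
        return 0.0
    aps = [average_precision(preds, n_pos) for preds, n_pos in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy predictions for two made-up gesture classes.
    toy = {
        "fist": ([(0.9, True), (0.8, False), (0.7, True)], 2),
        "palm": ([(0.95, True), (0.6, True)], 2),
    }
    print(round(mean_average_precision(toy), 4))
```

Under this definition, a model improves its mAP by ranking correct gesture predictions above incorrect ones in every class, so the metric captures ranking quality that a single accuracy figure does not.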

An image inpainting method based on generative adversarial networks inversion and autoencoder

Yechen Wang, Bin Song, Zhiyong Zhang

Summary: This paper proposes an image inpainting method based on GAN inversion and autoencoder. The method learns the mapping from noise to low-dimensional feature maps using a generator in an autoencoder-based GAN, and then converts the feature maps into high-resolution images. Experimental results show that the proposed method is more suitable for high-resolution image inpainting and performs better in inpainting large-range damaged images.