Article

Gesture recognition algorithm based on multi-scale feature fusion in RGB-D images

Journal

IET IMAGE PROCESSING
Volume 17, Issue 4, Pages 1280-1290

Publisher

WILEY
DOI: 10.1049/ipr2.12712

Keywords

image processing; neural nets

Abstract

With the rapid development of sensor technology and artificial intelligence, video gesture recognition in the era of big data has made human-computer interaction more natural and flexible, bringing richer interactive experiences to teaching, on-board control, electronic games, and other applications. To achieve robust recognition under illumination changes, background clutter, rapid movement, and partial occlusion, an algorithm based on multi-level feature fusion of a two-stream convolutional neural network is proposed, which consists of three main steps. First, a Kinect sensor acquires RGB-D images to build a gesture database, and data augmentation is applied to the training and test sets. Then, a multi-level feature fusion model of the two-stream convolutional neural network is constructed and trained. Experimental results show that the proposed network can robustly track and recognize gestures: compared with the single-channel model, the average detection accuracy improves by 1.08% and the mean average precision (mAP) improves by 3.56%, and the average recognition rate of gestures under occlusion and different light intensities is 93.98%. Finally, on the ASL dataset, the LaRED dataset, and the 1-miohand dataset, the recognition accuracy is satisfactory compared with other methods.
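As a rough illustration of the idea described above, the sketch below shows a two-stream convolutional network that processes RGB frames and depth maps separately and fuses features from more than one level before classification. The class name, layer widths, number of stages, and fusion points are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class TwoStreamFusionNet(nn.Module):
    """Minimal sketch of a two-stream CNN with multi-level feature fusion.

    One stream processes RGB frames, the other depth maps; pooled feature
    maps from a shallow and a deep stage of each stream are concatenated
    and classified. Hypothetical sizes, for illustration only.
    """

    def __init__(self, num_classes=10):
        super().__init__()

        def block(in_ch, out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        # Shallow and deep stages for the RGB and depth streams
        self.rgb_low, self.rgb_high = block(3, 32), block(32, 64)
        self.depth_low, self.depth_high = block(1, 32), block(32, 64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Fused descriptor: low-level (32 + 32) + high-level (64 + 64) channels
        self.classifier = nn.Linear(32 + 32 + 64 + 64, num_classes)

    def forward(self, rgb, depth):
        r1, d1 = self.rgb_low(rgb), self.depth_low(depth)
        r2, d2 = self.rgb_high(r1), self.depth_high(d1)
        # Multi-level fusion: pool and concatenate features from both levels
        feats = [self.pool(t).flatten(1) for t in (r1, d1, r2, d2)]
        return self.classifier(torch.cat(feats, dim=1))


# Example: a batch of 640x480 RGB-D frames (e.g. from a Kinect sensor)
rgb = torch.randn(2, 3, 480, 640)
depth = torch.randn(2, 1, 480, 640)
logits = TwoStreamFusionNet(num_classes=10)(rgb, depth)
print(logits.shape)  # torch.Size([2, 10])
```

Fusing a shallow and a deep stage, rather than only the final feature map, is one common way to combine fine spatial detail with higher-level semantics; the paper's specific fusion scheme may differ.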
