Article

Global contextually guided lightweight network for RGB-thermal urban scene understanding

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2022.105510

Keywords

Scene understanding; RGB-T; Cross-modal integration; Hybrid feature-cascaded aggregation module; Lightweight network


Recent achievements in scene understanding have benefited considerably from the rapid development of convolutional neural networks. However, practical deployment of scene understanding methods, especially on mobile devices, has been restricted by their high computational costs and memory consumption. Existing networks integrate RGB and thermal (RGB-T) cues through simple fusion, resulting in insufficient exploitation of the complicated correlations between the two image modalities. Moreover, some of these methods do not consider the influence of global features on the interactions between low- and high-level features. Hence, in this study, we introduce a novel network named the global contextually guided lightweight network (GCGLNet), which has fewer parameters and higher speed while maintaining accuracy. Specifically, secondary cross-modal integration is introduced to remove redundant information while fusing and propagating effective modal information. A hybrid feature-cascaded aggregation module is also introduced to emphasize the global context along with complementation and calibration between the high- and low-level features. Extensive experiments were conducted on two benchmark RGB-T datasets to demonstrate that the proposed GCGLNet yields an accuracy comparable with those of state-of-the-art approaches when operated at 51.89 FPS for 480 x 640 pixel inputs with only 7.87 M parameters. Thus, GCGLNet is expected to open new avenues for research on urban scene understanding via RGB-T sensors.
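To illustrate the general idea of cross-modal integration described above, the sketch below fuses RGB and thermal feature maps with a learned-style sigmoid gate, so that each spatial position draws on whichever modality carries the stronger signal. This is a minimal, generic sketch of gated RGB-T fusion; the function name `cross_modal_fusion` and the gating scheme are illustrative assumptions, not GCGLNet's actual secondary cross-modal integration module.

```python
import numpy as np

def cross_modal_fusion(rgb_feat: np.ndarray, thermal_feat: np.ndarray) -> np.ndarray:
    """Gated fusion of two feature maps of shape (C, H, W).

    A sigmoid gate derived from the summed responses weights the RGB
    features, and (1 - gate) weights the thermal features, yielding an
    element-wise convex combination of the two modalities.
    Illustrative only -- not the module proposed in the paper.
    """
    gate = 1.0 / (1.0 + np.exp(-(rgb_feat + thermal_feat)))  # sigmoid in (0, 1)
    return gate * rgb_feat + (1.0 - gate) * thermal_feat

# Example usage with small random feature maps (hypothetical shapes).
rgb = np.random.rand(8, 4, 4).astype(np.float32)
thermal = np.random.rand(8, 4, 4).astype(np.float32)
fused = cross_modal_fusion(rgb, thermal)
```

Because the gate lies in (0, 1), the fused response at every position stays between the two modal responses, which is one simple way to combine modalities without letting either dominate globally.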

