Journal
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
Volume 117, Issue -, Pages -
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.engappai.2022.105510
Keywords
Scene understanding; RGB-T; Cross-modal integration; Hybrid feature-cascaded aggregation module; Lightweight network
Recent achievements in scene understanding have benefited considerably from the rapid development of convolutional neural networks. However, the practical deployment of scene understanding methods has been restricted, especially on mobile devices, owing to their high computational costs and memory consumption. Existing networks integrate RGB and thermal (RGB-T) cues via simple fusion, resulting in insufficient exploitation of the complicated correlations between the two image modalities. Moreover, some of these methods do not consider the influence of global features on the interactions between low- and high-level features. Hence, in this study, we introduce a novel network named the global contextually guided lightweight network (GCGLNet), which has fewer parameters and higher speed while maintaining accuracy. Specifically, secondary cross-modal integration is introduced to remove redundant information while fusing and propagating effective modal information. A hybrid feature-cascaded aggregation module is also introduced to emphasize the global context along with complementation and calibration between the high- and low-level features. Extensive experiments were conducted on two benchmark RGB-T datasets, demonstrating that the proposed GCGLNet yields accuracy comparable to that of state-of-the-art approaches while operating at 51.89 FPS on 480 × 640-pixel inputs with only 7.87 M parameters. Thus, GCGLNet is expected to open new avenues for research on urban scene understanding via RGB-T sensors.