4.7 Article

Mutual Attention Inception Network for Remote Sensing Visual Question Answering

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TGRS.2021.3079918

Keywords

Task analysis; Remote sensing; Visualization; Knowledge discovery; Semantics; Object detection; Feature extraction; Attention mechanism; feature fusion; remote sensing visual question answering (RSVQA); semantic understanding

Funding

  1. National Science Fund for Distinguished Young Scholars [61925112]
  2. National Natural Science Foundation of China [61806193, 61772510]
  3. Innovation Capability Support Program of Shaanxi [2020KJXX-091, 2020TD-015]
  4. Key Research and Development Program of Shaanxi [2020ZDLGY04-03]

This study proposes a method for remote sensing visual question answering (VQA) that fuses image features and question features. It combines convolutional image features and word vectors with an attention mechanism and a bilinear technique. Experimental results demonstrate that the proposed method can capture the alignments between images and questions.
Remote sensing images (RSIs) containing various ground objects have been applied in many fields. To make the semantic understanding of RSIs objective and interactive, the task of remote sensing visual question answering (VQA) has emerged. Given an RSI, the goal of remote sensing VQA is to have an intelligent agent answer a question about the remote sensing scene. Existing remote sensing VQA methods use a nonspatial fusion strategy to fuse the image features and question features, which ignores the spatial information of images and the word-level information of questions. A novel method is proposed to complete the task while considering these two aspects. First, convolutional features of the image are included to represent spatial information, and the word vectors of questions are adopted to represent word-level semantic information. Second, an attention mechanism and a bilinear technique are introduced to enhance the features by considering the alignments between spatial positions and words. Finally, a fully connected layer with softmax is used to output an answer, treating the problem as a multiclass classification task. To benchmark this task, an RSIVQA dataset is introduced in this article. For each of more than 37,000 RSIs, the proposed dataset contains at least one question, plus corresponding answers. Experimental results demonstrate that the proposed method can capture the alignments between images and questions. The code and dataset are available at https://github.com/spectralpublic/RSIVQA.
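
The three steps described in the abstract (spatial and word-level features, attention plus bilinear fusion, softmax classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The function name `answer_distribution`, the scaled dot-product affinity, the max-pooled attention scores, the tanh nonlinearity, and all shapes and random weights are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def answer_distribution(img_feats, word_vecs, w_bil, w_cls, b_cls):
    """Hypothetical attention + bilinear fusion + softmax classifier.

    img_feats: (P, D) convolutional features, one row per spatial position
    word_vecs: (T, D) word vectors, one row per question word
    w_bil:     (D, D, H) bilinear fusion weights (assumed shape)
    w_cls:     (H, A) classifier weights, b_cls: (A,) classifier bias
    """
    # Affinity between every spatial position and every word: (P, T).
    affinity = img_feats @ word_vecs.T / np.sqrt(img_feats.shape[1])

    # Attention over positions (scored by their best-matching word)
    # and over words (scored by their best-matching position).
    pos_attn = softmax(affinity.max(axis=1))   # (P,)
    word_attn = softmax(affinity.max(axis=0))  # (T,)

    # Attention-weighted summaries of the image and the question.
    v = pos_attn @ img_feats   # (D,)
    q = word_attn @ word_vecs  # (D,)

    # Bilinear fusion: fused[h] = sum_ij v[i] * w_bil[i, j, h] * q[j].
    fused = np.tanh(np.einsum("i,ijh,j->h", v, w_bil, q))  # (H,)

    # Fully connected layer with softmax over the candidate answers.
    return softmax(fused @ w_cls + b_cls)  # (A,)


# Usage with random stand-in data: 49 positions, 6 words, 5 candidate answers.
rng = np.random.default_rng(0)
P, T, D, H, A = 49, 6, 8, 16, 5
probs = answer_distribution(
    rng.normal(size=(P, D)),
    rng.normal(size=(T, D)),
    rng.normal(size=(D, D, H)) * 0.1,
    rng.normal(size=(H, A)),
    np.zeros(A),
)
print(probs.shape, int(probs.argmax()))
```

The output is a probability vector over a fixed answer vocabulary, which matches the abstract's framing of answering as multiclass classification. A full bilinear tensor of this form grows quickly with the feature dimension, so practical systems often use a low-rank approximation instead.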

Recommended

Article Geochemistry & Geophysics

Semisupervised Spectral Degradation Constrained Network for Spectral Super-Resolution

Wenjing Chen, Xiangtao Zheng, Xiaoqiang Lu

Summary: This letter introduces a semisupervised spectral degradation constrained network (SSDCN) to enhance the spectral resolution of MSI, using an autoencoder-like network for estimating and reconstructing HSI. A semisupervised training method is proposed to optimize SSDCN with both MSI/HSI pairs and MSIs without ground-truth HSIs. The effectiveness of SSDCN is demonstrated using simulated and real databases.

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2022)

Article Geochemistry & Geophysics

Pairwise Comparison Network for Remote-Sensing Scene Classification

Yue Zhang, Xiangtao Zheng, Xiaoqiang Lu

Summary: This letter introduces a pairwise comparison network (PCNet) for remote-sensing scene classification, which first selects similar image pairs and then represents them with pairwise representations to capture subtle differences and improve performance.

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2022)

Article Computer Science, Information Systems

Identity Feature Disentanglement for Visible-Infrared Person Re-Identification

Xiumei Chen, Xiangtao Zheng, Xiaoqiang Lu

Summary: This article proposes an identity feature disentanglement method for the VI-ReID task. It first processes images of different modalities to extract shared features and then disentangles the extracted feature of each image into a latent identity variable and an identity-irrelevant variable. Extensive experiments demonstrate the efficacy and superiority of the proposed method.

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS (2023)

Article Computer Science, Artificial Intelligence

Hierarchically stacked graph convolution for emotion recognition in conversation

Binqiang Wang, Gang Dong, Yaqian Zhao, Rengang Li, Qichun Cao, Kekun Hu, Dongdong Jiang

Summary: Accurate emotion recognition enables robots to understand human affection intentions and deliver emotional responses. This paper proposes a novel Hierarchically Stacked Graph Convolution Framework (HSGCF) that extracts emotional discriminative features using a hierarchical structure. Experimental results show a 4.12% improvement in accuracy and a 4.80% improvement in F1 score compared to the baseline method.

KNOWLEDGE-BASED SYSTEMS (2023)

Article Geochemistry & Geophysics

A Spectrum-Aware Transformer Network for Change Detection in Hyperspectral Imagery

Wuxia Zhang, Liangxu Su, Yuhang Zhang, Xiaoqiang Lu

Summary: This paper proposes an end-to-end change detection network called Spectrum-aware Transformer Network (SATNet) to improve change detection performance in hyperspectral imagery (HSI). SATNet consists of a SETrans feature extraction module, a transformer-based correlation representation module, and a detection module. Experimental results demonstrate that SATNet outperforms existing change detection methods.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2023)

Article Geochemistry & Geophysics

Dual Teacher: A Semisupervised Cotraining Framework for Cross-Domain Ship Detection

Xiangtao Zheng, Haowen Cui, Chujie Xu, Xiaoqiang Lu

Summary: This article proposes a dual-teacher framework to address the mutual interference between optical and SAR supervision in cross-domain ship detection. The framework decomposes the supervision tasks into two subtasks and learns them interactively in two individual teacher-student models. The effectiveness of the framework is demonstrated through experiments.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2023)

Article Engineering, Electrical & Electronic

Remote Sensing Image Retrieval by Deep Attention Hashing With Distance-Adaptive Ranking

Yichao Zhang, Xiangtao Zheng, Xiaoqiang Lu

Summary: This article proposes a novel deep attention hashing (DAH) method for remote sensing image retrieval. The method utilizes a channel-spatial joint attention mechanism for feature extraction and a balanced pairwise weighted loss function for hash code training. In the retrieval phase, a distance-adaptive ranking strategy with category-weighted Hamming distance is employed. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (2023)

Article Geochemistry & Geophysics

Fine Aligned Discriminative Hashing for Remote Sensing Image-Audio Retrieval

Yaxiong Chen, Jinghao Huang, Shengwu Xiong, Xiaoqiang Lu

Summary: For the cross-modal remote sensing image-audio retrieval task, this article proposes a novel fine aligned discriminative hashing (FADH) approach, which captures discriminative information of RS images and simultaneously learns the detailed correspondence between RS images and audios. The approach includes a discriminative information learning module and a fine alignment module to improve retrieval performance. The designed objective function maintains the similarity of hash codes, preserves semantic information, and eliminates cross-modal differences.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2023)

Article Geochemistry & Geophysics

Self-Supervision Interactive Alignment for Remote Sensing Image-Audio Retrieval

Jinghao Huang, Yaxiong Chen, Shengwu Xiong, Xiaoqiang Lu

Summary: This article presents a novel unsupervised cross-modal RSIA retrieval approach called self-supervision interactive alignment (SSIA), which utilizes unlabeled samples to learn salient information, cross-modal alignment, and the similarity between RSIs and audios. The SSIA outperforms other compared approaches in RSIA retrieval performance, as validated by extensive experiments on four widely used RSIA datasets.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2023)

Article Geochemistry & Geophysics

MATNet: A Combining Multi-Attention and Transformer Network for Hyperspectral Image Classification

Bo Zhang, Yaxiong Chen, Yi Rong, Shengwu Xiong, Xiaoqiang Lu

Summary: Based on previous work, we propose an HSI classification network called MATNet, which combines multi-attention and transformer. The network uses spatial attention and channel attention to focus on the more informative parts of the input, then utilizes a tokenizer module for semantic-level representation and a transformer encoder module for deep semantic feature extraction. We also design a loss function called Lpoly to accommodate different datasets and tasks. Experimental results demonstrate that MATNet performs well in extracting spatial-spectral features of HSIs and understanding semantic degrees.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2023)

Article Engineering, Electrical & Electronic

Hyperspectral and LiDAR Representation With Spectral-Spatial Graph Network

Xingqian Du, Xiangtao Zheng, Xiaoqiang Lu, Xin Wang

Summary: The study proposes a spectral-spatial graph network to integrate HSI and LiDAR data, capturing local and global spectral-spatial associations. The experiments demonstrate that the network achieves comparable performance to state-of-the-art methods.

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (2023)

Article Geochemistry & Geophysics

Difference-Enhancement Triplet Network for Change Detection in Multispectral Images

Wuxia Zhang, Yuhang Zhang, Liangxu Su, Chao Mei, Xiaoqiang Lu

Summary: This study proposes an end-to-end deep neural network called DETNet for multispectral change detection. The network consists of a triplet feature extraction module and a difference feature learning module, which aim to detect subtle changes by mining the difference representations of learned features. Experimental results demonstrate the superiority of DETNet on four datasets.

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2023)

Article Geochemistry & Geophysics

Deep Feature Reconstruction Learning for Open-Set Classification of Remote-Sensing Imagery

Hao Sun, Qianqian Li, Jie Yu, Dongbo Zhou, Wenjing Chen, Xiangtao Zheng, Xiaoqiang Lu

Summary: In this letter, a deep feature reconstruction learning (DFRL) framework is proposed for the open-set classification of remote-sensing scene images (RSSIs). The proposed method combines discriminative feature learning and feature reconstruction, and effectively distinguishes known and unknown classes by feature-level reconstruction and sparse regularization.

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2023)

Article Geochemistry & Geophysics

Cross-Attention Spectral-Spatial Network for Hyperspectral Image Classification

Kai Yang, Hao Sun, Chunbo Zou, Xiaoqiang Lu

Summary: This paper introduces a cross-attention spectral-spatial network (CASSN) to address the rotation issue in hyperspectral image classification, by extracting spectral and spatial features to determine pixel categories.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2022)
