Article
Geochemistry & Geophysics
Zixiao Zhang, Licheng Jiao, Lingling Li, Xu Liu, Puhua Chen, Fang Liu, Yuxuan Li, Zhicheng Guo
Summary: In this article, a novel method called spatial hierarchical reasoning network (SHRNet) is proposed to address the limitations of current methods in remote sensing visual question answering (RSVQA). The method enhances visual-spatial reasoning capability and accounts for geospatial objects with large scale differences and position-sensitive properties. Modeling and reasoning about the relationships between entities are also explored for accurate answer prediction.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Geochemistry & Geophysics
Zhenghang Yuan, Lichao Mou, Zhitong Xiong, Xiao Xiang Zhu
Summary: The detection of changes on the Earth's surface is crucial for urban planning and sustainability. However, current change detection techniques are only accessible to experts. To address this, the study introduces a new task called change detection-based visual question answering (CDVQA) on multitemporal aerial images, enabling users to obtain change-based information easily. The study presents a CDVQA dataset and a baseline framework along with different strategies for improving the performance of the CDVQA task. The results offer valuable insights for future CDVQA research.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2022)
Article
Computer Science, Artificial Intelligence
Pengpeng Zeng, Haonan Zhang, Lianli Gao, Jingkuan Song, Heng Tao Shen
Summary: This paper addresses the challenges of utilizing prior knowledge and structured visual information in Video Question Answering (VideoQA). The proposed Prior Knowledge and Object-sensitive Learning (PKOL) approach effectively integrates prior knowledge and learns object-sensitive representations to enhance the VideoQA task. The experiments demonstrate consistent improvements and state-of-the-art performance on competitive benchmarks.
IEEE TRANSACTIONS ON IMAGE PROCESSING
(2022)
Article
Computer Science, Artificial Intelligence
Haonan Luo, Guosheng Lin, Yazhou Yao, Fayao Liu, Zichuan Liu, Zhenmin Tang
Summary: Embodied Question Answering (EQA) is a newly defined research area where an agent uses real-world exploration to answer user questions. Existing methods lack semantic information, robustness to ambiguity, and 3D spatial information, leading to poor answering and navigation accuracy. To address these issues, this study proposes a depth- and segmentation-based visual attention mechanism for EQA. The proposed method effectively improves the performance of the Visual Question Answering (VQA) module and navigation module, resulting in significant overall accuracy improvement on the House3D and Matterport3D datasets.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Computer Science, Artificial Intelligence
Liyang Zhang, Shuaicheng Liu, Donghao Liu, Pengpeng Zeng, Xiangpeng Li, Jingkuan Song, Lianli Gao
Summary: The new framework KAN utilizes object-related knowledge and a knowledge graph to assist in the reasoning process of VQA, with an attention module that adaptively balances the importance of external knowledge against detected objects. Extensive experiments demonstrate that KAN achieves state-of-the-art performance on challenging VQA datasets and provides benefits to VQA baselines.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2021)
Article
Geochemistry & Geophysics
Yakoub Bazi, Mohamad Mahmoud Al Rahhal, Mohamed Lamine Mekhalfi, Mansour Abdulaziz Al Zuair, Farid Melgani
Summary: This article proposes a visual question answering (VQA) approach for remote sensing images based on transformer models. The approach utilizes a contrastive language image pretraining (CLIP) network to embed image patches and question words into visual and textual representations. Attention mechanisms are used to capture dependencies within and between these representations, and the final answer is generated by combining the predictions of two classifiers. Experimental results demonstrate that the proposed approach achieves better performance than state-of-the-art methods with a reduced training size.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2022)
Article
Geochemistry & Geophysics
Meimei Zhang, Fang Chen, Bin Li
Summary: This study proposes an end-to-end multistep question-driven VQA system for remote sensing. By using a multiple-step attention mechanism and a question-driven module, the proposed model demonstrates superior performance in RSVQA compared to other models, and it shows a robust ability in understanding complex questions and image content.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Computer Science, Artificial Intelligence
Zhuo Zheng, Yanfei Zhong, Junjue Wang, Ailong Ma, Liangpei Zhang
Summary: In this paper, a foreground-aware relation network (FarSeg++) is proposed to address the issues of scale variation, large intra-class variance of background, and foreground-background imbalance in high spatial resolution remote sensing imagery. The network improves the discrimination of foreground features, achieves balanced optimization, and enhances objectness representation. Experimental results demonstrate that FarSeg++ outperforms state-of-the-art semantic segmentation methods and achieves a better trade-off between speed and accuracy.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
(2023)
Article
Geochemistry & Geophysics
Zhenghang Yuan, Lichao Mou, Qi Wang, Xiao Xiang Zhu
Summary: Visual question answering (VQA) for remote sensing scenes has great potential but is still in its infancy. The RSVQA task needs to account for the lack of object annotations and the varying difficulty levels of questions. This article proposes a multi-level visual feature learning method and a self-paced curriculum learning-based VQA model to address these issues. Experimental results demonstrate promising performance of the proposed framework.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2022)
Article
Computer Science, Artificial Intelligence
Dalu Guo, Chang Xu, Dacheng Tao
Summary: This article revisits bilinear attention networks (BANs) in the visual question answering task from a graph perspective. Classical BANs do not fully explore the relationships between words needed for complex reasoning. In contrast, the proposed bilinear graph networks model the context of the joint embeddings of words and objects, enabling multistep reasoning.
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
(2023)
Article
Geochemistry & Geophysics
Ke Zhang, Yulin Wu, Jingyu Wang, Yezi Wang, Qi Wang
Summary: This paper proposes a semantic context-aware network (SCANet) for multiscale object detection, which uses a receptive field-enhancement module (RFEM) and a semantic context fusion module (SCFM) to enhance performance. Experimental results show that SCANet achieves superior detection results on the DOTA-v1.5 dataset compared to state-of-the-art approaches.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2022)
Article
Computer Science, Information Systems
Aihua Mao, Zhi Yang, Ken Lin, Jun Xuan, Yong-Jin Liu
Summary: This paper introduces a novel positional attention guided Transformer-like architecture to address the challenge of utilizing positional information in visual question answering (VQA) tasks. Experimental results demonstrate that the proposed model outperforms state-of-the-art models and performs particularly well in handling object counting questions.
IEEE TRANSACTIONS ON MULTIMEDIA
(2023)
Article
Engineering, Electrical & Electronic
Jie Chen, Hao Wang, Ya Guo, Geng Sun, Yi Zhang, Min Deng
Summary: This article focuses on the current research status and challenges of semantic segmentation in remote sensing images, and proposes a novel DCNN-based semantic segmentation method. The method distinguishes and describes the details of geo-objects through the cascaded relation attention module and multiscale feature module.
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
(2021)
Article
Computer Science, Information Systems
Jun Hu, Shengsheng Qian, Quan Fang, Changsheng Xu
Summary: In this paper, a Social-aware Multi-modal Co-attention Convolutional Matching method (SMCACM) is proposed for community-based question answering systems to accurately match relevant answers to a given question. By extracting complementary information from questions and answers and exploiting visual content and social context, the method outperforms other state-of-the-art algorithms in experiments.
IEEE TRANSACTIONS ON MULTIMEDIA
(2021)
Article
Geochemistry & Geophysics
Ke Zhang, Yulin Wu, Jingyu Wang, Qi Wang
Summary: Compared with general optical images, remote sensing images (RSIs) capture large areas from high altitudes with a bird's eye view, which provides abundant scene information but also presents challenges for object detection. To address the poor context utilization in RSIs, a hierarchical context embedding network (HCENet) is proposed, which constructs a semantic feature pyramid and uses a scene-level context embedding module to improve object detection performance.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2022)
Article
Geochemistry & Geophysics
Wenjing Chen, Xiangtao Zheng, Xiaoqiang Lu
Summary: This letter introduces a semisupervised spectral degradation constrained network (SSDCN) to enhance the spectral resolution of multispectral images (MSIs), using an autoencoder-like network to estimate and reconstruct hyperspectral images (HSIs). A semisupervised training method is proposed to optimize SSDCN with both MSI/HSI pairs and MSIs without ground-truth HSIs. The effectiveness of SSDCN is demonstrated using simulated and real databases.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2022)
Article
Geochemistry & Geophysics
Yue Zhang, Xiangtao Zheng, Xiaoqiang Lu
Summary: This letter introduces a pairwise comparison network (PCNet) for remote-sensing scene classification, which first selects similar image pairs and then represents them with pairwise representations to capture subtle differences and improve performance.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2022)
Article
Computer Science, Information Systems
Xiumei Chen, Xiangtao Zheng, Xiaoqiang Lu
Summary: This article proposes an identity feature disentanglement method for the visible-infrared person re-identification (VI-ReID) task. It first processes images of different modalities to extract shared features and then disentangles the extracted feature of each image into a latent identity variable and an identity-irrelevant variable. Extensive experiments demonstrate the efficacy and superiority of the proposed method.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
(2023)
Article
Computer Science, Artificial Intelligence
Binqiang Wang, Gang Dong, Yaqian Zhao, Rengang Li, Qichun Cao, Kekun Hu, Dongdong Jiang
Summary: Accurate emotion recognition enables robots to understand human affective intentions and deliver emotional responses. This paper proposes a novel Hierarchically Stacked Graph Convolution Framework (HSGCF) that extracts emotionally discriminative features using a hierarchical structure. Experimental results show a 4.12% improvement in accuracy and a 4.80% improvement in F1 score compared to the baseline method.
KNOWLEDGE-BASED SYSTEMS
(2023)
Article
Geochemistry & Geophysics
Wuxia Zhang, Liangxu Su, Yuhang Zhang, Xiaoqiang Lu
Summary: This paper proposes an end-to-end change detection network called Spectrum-aware Transformer Network (SATNet) to improve change detection performance in hyperspectral imagery (HSI). SATNet consists of a SETrans feature extraction module, a transformer-based correlation representation module, and a detection module. Experimental results demonstrate that SATNet outperforms existing change detection methods.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Geochemistry & Geophysics
Xiangtao Zheng, Haowen Cui, Chujie Xu, Xiaoqiang Lu
Summary: This article proposes a dual-teacher framework to address the mutual interference between optical and SAR supervision in cross-domain ship detection. The framework decomposes the supervision tasks into two subtasks and learns them interactively in two individual teacher-student models. The effectiveness of the framework is demonstrated through experiments.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Engineering, Electrical & Electronic
Yichao Zhang, Xiangtao Zheng, Xiaoqiang Lu
Summary: This article proposes a novel deep attention hashing (DAH) method for remote sensing image retrieval. The method utilizes a channel-spatial joint attention mechanism for feature extraction and a balanced pairwise weighted loss function for hash code training. In the retrieval phase, a distance-adaptive ranking strategy with category-weighted Hamming distance is employed. Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
(2023)
Article
Geochemistry & Geophysics
Yaxiong Chen, Jinghao Huang, Shengwu Xiong, Xiaoqiang Lu
Summary: For the cross-modal remote sensing (RS) image-audio retrieval task, this article proposes a novel fine aligned discriminative hashing (FADH) approach, which captures discriminative information in RS images and simultaneously learns the fine-grained correspondence between RS images and audios. The approach includes a discriminative information learning module and a fine alignment module to improve retrieval performance. The designed objective function maintains the similarity of hash codes, preserves semantic information, and eliminates cross-modal differences.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Geochemistry & Geophysics
Jinghao Huang, Yaxiong Chen, Shengwu Xiong, Xiaoqiang Lu
Summary: This article presents a novel unsupervised cross-modal remote sensing image-audio (RSIA) retrieval approach called self-supervision interactive alignment (SSIA), which utilizes unlabeled samples to learn salient information, cross-modal alignment, and the similarity between remote sensing images (RSIs) and audios. SSIA outperforms the compared approaches in RSIA retrieval performance, as validated by extensive experiments on four widely used RSIA datasets.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Geochemistry & Geophysics
Bo Zhang, Yaxiong Chen, Yi Rong, Shengwu Xiong, Xiaoqiang Lu
Summary: Based on previous work, we propose a hyperspectral image (HSI) classification network called MATNet, which combines multi-attention and a transformer. The network uses spatial attention and channel attention to focus on the more informative parts of the input, then utilizes a tokenizer module for semantic-level representation and a transformer encoder module for deep semantic feature extraction. We also design a loss function called Lpoly to accommodate different datasets and tasks. Experimental results demonstrate that MATNet performs well in extracting spatial-spectral features of HSIs and understanding semantic degrees.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2023)
Article
Engineering, Electrical & Electronic
Xingqian Du, Xiangtao Zheng, Xiaoqiang Lu, Xin Wang
Summary: The study proposes a spectral-spatial graph network to integrate HSI and LiDAR data, capturing local and global spectral-spatial associations. The experiments demonstrate that the network achieves comparable performance to state-of-the-art methods.
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
(2023)
Article
Geochemistry & Geophysics
Wuxia Zhang, Yuhang Zhang, Liangxu Su, Chao Mei, Xiaoqiang Lu
Summary: This study proposes an end-to-end deep neural network called DETNet for multispectral change detection. The network consists of a triplet feature extraction module and a difference feature learning module, which aim to detect subtle changes by mining the difference representations of learned features. Experimental results demonstrate the superiority of DETNet on four datasets.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2023)
Article
Geochemistry & Geophysics
Hao Sun, Qianqian Li, Jie Yu, Dongbo Zhou, Wenjing Chen, Xiangtao Zheng, Xiaoqiang Lu
Summary: In this letter, a deep feature reconstruction learning (DFRL) framework is proposed for the open-set classification of remote-sensing scene images (RSSIs). The proposed method combines discriminative feature learning and feature reconstruction, and effectively distinguishes known and unknown classes by feature-level reconstruction and sparse regularization.
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS
(2023)
Article
Geochemistry & Geophysics
Kai Yang, Hao Sun, Chunbo Zou, Xiaoqiang Lu
Summary: This paper introduces a cross-attention spectral-spatial network (CASSN) to address the rotation issue in hyperspectral image classification, by extracting spectral and spatial features to determine pixel categories.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
(2022)