Article

Research on Visual Question Answering Based on GAT Relational Reasoning

Journal

NEURAL PROCESSING LETTERS
Volume 54, Issue 2, Pages 1435-1448

Publisher

SPRINGER
DOI: 10.1007/s11063-021-10689-2

Keywords

Visual question answering; Relational reasoning; Attention mechanism; Graph attention network; Multi-model feature fusion

Funding

  1. NSFC [62076200]
  2. Shaanxi Natural Science Foundation [2020JM-468]


In this paper, a Graph Attention Network Relational Reasoning (GAT2R) model is proposed to address the challenges posed by the diversity of questions in VQA. The model comprises scene graph generation, which extracts spatial features and predicts relations between objects, and scene graph answer prediction. Experiments show that the model achieves improved accuracy on the GQA and VQA2.0 datasets, validating its effectiveness and generalization.
The diversity of questions in VQA poses new challenges for the construction of VQA models. Existing VQA models focus on devising new attention mechanisms, which makes them increasingly complex. In addition, most concentrate on object recognition while neglecting spatial reasoning, semantic relations, and even scene understanding. Therefore, this paper proposes a Graph Attention Network Relational Reasoning (GAT2R) model, which mainly comprises scene graph generation and scene graph answer prediction. The scene graph generation module extracts the regional and spatial features of objects with an object detection model, and uses a relation decoder to predict the relations between object pairs. The scene graph answer prediction module dynamically updates node representations through a question-guided graph attention network, then performs multi-modal fusion with the question features to obtain the answer. The proposed model achieves an accuracy of 54.45% on the natural-scene dataset GQA, which is largely based on relational reasoning, and 68.04% on the widely used VQA2.0 dataset. Compared with the benchmark model, accuracy improves by 4.71% on GQA and 2.37% on VQA2.0, demonstrating the model's effectiveness and generalization.
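The question-guided graph attention step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's actual architecture: the projection weights, the LeakyReLU scoring, the mean-pooling, and the element-wise-product fusion are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def question_guided_gat_layer(H, A, q, Wn, Wq, a):
    """One question-guided graph-attention layer (illustrative sketch).

    H  : (N, d) node features from the scene graph
    A  : (N, N) adjacency mask (nonzero where a relation exists)
    q  : (d,)   question embedding
    Wn, Wq : (d, d) projection weights; a : (2*d,) attention vector
    """
    guided = H @ Wn + q @ Wq          # inject the question signal into every node
    N = H.shape[0]
    logits = np.full((N, N), -1e9)    # mask non-edges before softmax
    for i in range(N):
        for j in range(N):
            if A[i, j]:
                # GAT-style pairwise score a^T [h_i || h_j]
                logits[i, j] = leaky_relu(
                    np.concatenate([guided[i], guided[j]]) @ a)
    alpha = softmax(logits, axis=1)   # attention over each node's neighbours
    return alpha @ guided             # updated node representations

def fuse(node_repr, q):
    """Multi-modal fusion by element-wise product (one common choice)."""
    g = node_repr.mean(axis=0)        # pool the graph to a single vector
    return g * q                      # fused feature fed to the answer classifier
```

In practice a model like this would stack several such layers and learn the weights end to end; the sketch only shows how a question embedding can bias the graph attention before fusion.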

