Article

Sparse co-attention visual question answering networks based on thresholds

Journal

APPLIED INTELLIGENCE
Volume 53, Issue 1, Pages 586-600

Publisher

SPRINGER
DOI: 10.1007/s10489-022-03559-4

Keywords

Visual question answering; Sparse co-attention; Attention score; Threshold

Abstract
Most existing visual question answering (VQA) models learn the co-attention between an input image and an input question by modeling dense interactions between every image region and every question word. However, correctly answering a natural language question about an image usually requires understanding only a few key words of the question and capturing the visual information contained in only a few regions of the image. The noise generated by interactions involving image regions unrelated to the question, and question words unrelated to predicting the correct answer, distracts VQA models and degrades their performance. To solve this problem, we propose a Sparse Co-Attention Visual Question Answering Network (SCAVQAN) based on thresholds. SCAVQAN concentrates the model's attention by setting thresholds on attention scores, retaining only the image features and question features that are most helpful for predicting the correct answer, and thereby improves the overall performance of the model. Experimental results, ablation studies and attention visualization results on two benchmark VQA datasets demonstrate the effectiveness and interpretability of our models.
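The thresholding mechanism the abstract describes, computing attention scores and discarding those that fall below a threshold before attending, can be sketched as follows. This is a minimal PyTorch illustration under assumptions of our own (a single attention head, a softmax-then-threshold ordering, and a hypothetical `threshold` hyperparameter); it is not the authors' released SCAVQAN implementation.

```python
import torch
import torch.nn.functional as F

def thresholded_attention(query, key, value, threshold=0.1):
    """Scaled dot-product attention that suppresses weights below a
    fixed threshold before the weighted sum.

    Illustrative sketch only: `threshold` is a hypothetical
    hyperparameter standing in for the paper's attention-score
    thresholds, not a value taken from the paper.
    """
    d_k = query.size(-1)
    # Raw compatibility scores between every query and key position.
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    # Filtering step: weights under the threshold are zeroed so that
    # weakly relevant image regions / question words stop contributing
    # noise to the attended features.
    weights = torch.where(weights >= threshold,
                          weights, torch.zeros_like(weights))
    # Renormalize the surviving weights so each row sums to 1 again.
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    return torch.matmul(weights, value), weights

# Example: question words (queries) attending over image regions.
question = torch.randn(1, 14, 512)  # 14 word features
image = torch.randn(1, 36, 512)     # 36 region features
attended, w = thresholded_attention(question, image, image, threshold=0.05)
```

In a co-attention setting, the same routine would presumably be applied in both directions, question-to-image and image-to-question, so that only the key words and key regions survive the threshold.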
