Article

Sparse co-attention visual question answering networks based on thresholds

Journal

APPLIED INTELLIGENCE
Volume 53, Issue 1, Pages 586-600

Publisher

SPRINGER
DOI: 10.1007/s10489-022-03559-4

Keywords

Visual question answering; Sparse co-attention; Attention score; Threshold

Abstract
Most existing visual question answering (VQA) models learn the co-attention between an input image and an input question by modeling dense interactions between every image region and every question word. However, correctly answering a natural language question about an image usually requires understanding only a few key words of the question and capturing the visual information contained in only a few regions of the image. The noise generated by interactions involving image regions unrelated to the question, and question words unrelated to predicting the correct answer, distracts VQA models and degrades their performance. To solve this problem, we propose a Sparse Co-Attention Visual Question Answering Network (SCAVQAN) based on thresholds. SCAVQAN concentrates the model's attention by setting thresholds on attention scores, retaining only the image features and question features that are most helpful for predicting the correct answer, and thereby improves the overall performance of the model. Experimental results, ablation studies and attention visualization results on two benchmark VQA datasets demonstrate the effectiveness and interpretability of our models.
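The thresholding mechanism the abstract describes, computing attention scores and discarding those that fall below a threshold before attending, can be sketched as follows. This is a minimal PyTorch illustration under assumptions of our own (a single attention head, a softmax-then-threshold ordering, and a hypothetical `threshold` hyperparameter); it is not the authors' released SCAVQAN implementation.

```python
import torch
import torch.nn.functional as F

def thresholded_attention(query, key, value, threshold=0.1):
    """Scaled dot-product attention that suppresses weights below a
    fixed threshold before the weighted sum.

    Illustrative sketch only: `threshold` is a hypothetical
    hyperparameter standing in for the paper's attention-score
    thresholds, not a value taken from the paper.
    """
    d_k = query.size(-1)
    # Raw compatibility scores between every query and key position.
    scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    # Filtering step: weights under the threshold are zeroed so that
    # weakly relevant image regions / question words stop contributing
    # noise to the attended features.
    weights = torch.where(weights >= threshold,
                          weights, torch.zeros_like(weights))
    # Renormalize the surviving weights so each row sums to 1 again.
    weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    return torch.matmul(weights, value), weights

# Example: question words (queries) attending over image regions.
question = torch.randn(1, 14, 512)  # 14 word features
image = torch.randn(1, 36, 512)     # 36 region features
attended, w = thresholded_attention(question, image, image, threshold=0.05)
```

In a co-attention setting, the same routine would presumably be applied in both directions, question-to-image and image-to-question, so that only the key words and key regions survive the threshold.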
