DRAU: Dual Recurrent Attention Units for Visual Question Answering

COMPUTER VISION AND IMAGE UNDERSTANDING (2019)

Journal

COMPUTER VISION AND IMAGE UNDERSTANDING

Volume 185, Pages 24-30

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.cviu.2019.05.001

Keywords

Visual Question Answering; Attention Mechanisms; Multi-modal Learning; Machine Vision; Natural Language Processing

Funding

Fraunhofer Society, Germany through the MPI-FhG collaboration project Theory and Practice for Reduced Learning Machines
German Ministry for Education and Research [01IS14013A, 01IS180371]

Abstract

Visual Question Answering (VQA) requires AI models to comprehend data in two domains, vision and text. Current state-of-the-art models use learned attention mechanisms to extract relevant information from the input domains to answer a given question. Thus, robust attention mechanisms are essential for powerful VQA models. In this paper, we propose a recurrent attention mechanism and show its benefits compared to the traditional convolutional approach. We perform two ablation studies to evaluate recurrent attention. First, we introduce a baseline VQA model with visual attention and test the performance difference between convolutional and recurrent attention on the VQA 2.0 dataset. Second, we design an architecture for VQA which utilizes dual (textual and visual) Recurrent Attention Units (RAUs). Using this model, we show the effect of all possible combinations of recurrent and convolutional dual attention. Our single model outperforms the first-place winner of the VQA 2016 challenge and, to the best of our knowledge, is the second-best-performing single model on the VQA 1.0 dataset. Furthermore, our model noticeably improves upon the winner of the VQA 2017 challenge. Moreover, we experiment with replacing attention mechanisms in state-of-the-art models with our RAUs and show increased performance.
