Article

Multi-modal co-attention relation networks for visual question answering

Journal

VISUAL COMPUTER
Volume 39, Issue 11, Pages 5783-5795

Publisher

SPRINGER
DOI: 10.1007/s00371-022-02695-9

Keywords

Computer vision; Visual question answering; Co-attention; Visual object relation reasoning


This article proposes a Multi-Modal Co-Attention Relation Network (MCARN) to address a limitation of current visual question answering (VQA) models, which model only object-level visual representations and neglect the relationships between visual objects. MCARN models visual representations at both the object level and the relation level, and stacking its visual relation reasoning module improves accuracy on Number questions. Two further models, RGF-CA and Cos-Sin+CA, combine co-attention with the relative geometry features of visual objects; RGF-CA achieves excellent comprehensive performance, and Cos-Sin+CA achieves higher accuracy on Other questions.
Current mainstream visual question answering (VQA) models represent images only at the object level and ignore the relationships between visual objects. To solve this problem, we propose a Multi-Modal Co-Attention Relation Network (MCARN) that combines co-attention with visual object relation reasoning. MCARN models visual representations at both the object level and the relation level, and stacking its visual relation reasoning module further improves the model's accuracy on Number questions. Inspired by MCARN, we propose two models, RGF-CA and Cos-Sin+CA, which combine co-attention with the relative geometry features of visual objects and achieve, respectively, excellent comprehensive performance and higher accuracy on Other questions. Extensive experiments and ablation studies on the benchmark dataset VQA 2.0 demonstrate the effectiveness of our models and verify the synergy of co-attention and visual object relation reasoning in the VQA task.
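The abstract does not give the formula behind the "relative geometry features of visual objects". As an illustration only, the sketch below computes one commonly used four-dimensional, log-scaled encoding of the relative position and size of each pair of bounding boxes. The function name `relative_geometry_features`, the `[x_min, y_min, x_max, y_max]` box layout, and the `eps` clamp are assumptions made for this example and are not taken from the paper.

```python
import math


def relative_geometry_features(boxes, eps=1e-3):
    """Pairwise relative geometry for boxes given as [x_min, y_min, x_max, y_max].

    Returns a nested list feats[m][n] = [dx, dy, dw, dh] describing box n
    relative to box m. This encoding is an assumed, illustrative choice,
    not necessarily the one used by RGF-CA or Cos-Sin+CA.
    """
    feats = []
    for x0m, y0m, x1m, y1m in boxes:
        wm = max(x1m - x0m, eps)          # width of reference box m
        hm = max(y1m - y0m, eps)          # height of reference box m
        cxm = (x0m + x1m) / 2.0           # centre of box m
        cym = (y0m + y1m) / 2.0
        row = []
        for x0n, y0n, x1n, y1n in boxes:
            wn = max(x1n - x0n, eps)
            hn = max(y1n - y0n, eps)
            cxn = (x0n + x1n) / 2.0
            cyn = (y0n + y1n) / 2.0
            row.append([
                # Centre offsets, normalised by the reference box size and
                # clamped at eps so that log(0) never occurs.
                math.log(max(abs(cxm - cxn) / wm, eps)),
                math.log(max(abs(cym - cyn) / hm, eps)),
                # Log ratios of widths and heights (scale-invariant).
                math.log(wn / wm),
                math.log(hn / hm),
            ])
        feats.append(row)
    return feats
```

In a relation-aware attention layer, a vector like `feats[m][n]` would typically be embedded and used to weight how strongly object `m` attends to object `n`; how MCARN and its variants actually do this is described in the paper itself, not here.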

