Article

AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering

Journal

KNOWLEDGE-BASED SYSTEMS
Volume 255

Publisher

ELSEVIER
DOI: 10.1016/j.knosys.2022.109763

Keywords

Attention mechanism; Deep learning; Medical Visual Question Answering; Multimodal fusion; Medical images

Funding

  1. National Natural Science Foundation of China [62072135]
  2. International Exchange Program of Harbin Engineering University


This paper proposes an Attention-based Multimodal Alignment Model (AMAM) for medical Visual Question Answering (VQA), aiming to enrich textual features by aligning text-based and image-based attention. The AMAM incorporates attention mechanisms and a composite loss to obtain rich semantic textual information and accurate answers.
Medical Visual Question Answering (VQA) is a multimodal task of answering clinical questions about medical images. Existing methods achieve good performance, but most medical VQA models focus on visual content while overlooking the influence of textual content. To address this issue, this paper proposes an Attention-based Multimodal Alignment Model (AMAM) for medical VQA, which aligns text-based and image-based attention to enrich the textual features. First, we develop an Image-to-Question (I2Q) attention and a Word-to-Question (W2Q) attention to model the relations of visual and textual content to the question. Second, we design a composite loss consisting of a classification loss and an Image-Question Complementary (IQC) loss. The IQC loss aligns the question-word importance learned from visual and textual features, emphasizing meaningful words in the question and improving the quality of predicted answers. Benefiting from the attention mechanisms and the composite loss, AMAM obtains rich semantic textual information and accurate answers. Finally, because the VQA-RAD dataset contains data errors and missing labels, we further construct an enhanced dataset, VQA-RADPh, to raise data quality. Experimental results on public datasets show that AMAM outperforms state-of-the-art methods. Our source code is available at: https://github.com/shuning-ai/AMAM/tree/master. (c) 2022 Elsevier B.V. All rights reserved.
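
As a rough illustration of the composite loss described in the abstract, the sketch below (PyTorch) combines a standard answer-classification loss with an alignment term between attention distributions over question words produced by an image-conditioned (I2Q) branch and a word-conditioned (W2Q) branch. The function name, tensor shapes, the symmetric-KL form of the alignment term, and the weighting factor lam are assumptions made for illustration only; the authors' exact formulation is in the linked repository.

    # Hedged sketch: L_total = L_cls + lam * L_IQC, where L_IQC encourages the
    # I2Q and W2Q attention distributions over question words to agree.
    import torch
    import torch.nn.functional as F

    def composite_loss(answer_logits, answer_labels,
                       i2q_attention, w2q_attention, lam=1.0):
        """answer_logits: [batch, num_answers] answer classification scores.
        i2q_attention / w2q_attention: [batch, num_words] attention weights
        over question words from the image-to-question and word-to-question
        branches (each row sums to 1)."""
        # Classification loss over the candidate answer set.
        cls_loss = F.cross_entropy(answer_logits, answer_labels)

        # Image-Question Complementary (IQC) term: symmetric KL divergence
        # between the two attention distributions (an assumed choice).
        eps = 1e-8
        kl_iw = F.kl_div((i2q_attention + eps).log(), w2q_attention,
                         reduction="batchmean")
        kl_wi = F.kl_div((w2q_attention + eps).log(), i2q_attention,
                         reduction="batchmean")
        iqc_loss = 0.5 * (kl_iw + kl_wi)

        return cls_loss + lam * iqc_loss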
