A survey of methods, datasets and evaluation metrics for visual question answering

Journal

IMAGE AND VISION COMPUTING
Volume 116

Publisher

ELSEVIER
DOI: 10.1016/j.imavis.2021.104327

Keywords

Computer vision; Natural language processing; Deep neural networks; World knowledge; Attention

Visual Question Answering (VQA) is a challenging research problem that combines computer vision and natural language processing. To answer accurately, a VQA system may need to combine the information in the image with common sense reasoning and world knowledge. Beyond traditional models, new VQA models, datasets, and evaluation metrics continue to be developed.
Visual Question Answering (VQA) is a multi-disciplinary research problem that has captured the attention of both computer vision and natural language processing researchers. In Visual Question Answering, a system is given an image and a natural-language question about that image as input, and it is required to produce a natural-language answer as output. A VQA algorithm may require common sense reasoning over the information contained in the image, together with world knowledge, to produce the right answer. In this paper, we discuss some of the core concepts used in VQA systems and present a comprehensive survey of past efforts to address this problem. Apart from traditional VQA models, we also discuss visual question answering models that require reading text present in images and that are evaluated on recently developed datasets such as TextVQA, ST-VQA, and OCR-VQA. Apart from the standard datasets discussed in previous surveys, we also discuss some new datasets developed in 2019 and 2020, such as GQA, OK-VQA, TextVQA, ST-VQA, and OCR-VQA. New evaluation metrics such as BLEU, MPT, METEOR, Average Normalized Levenshtein Similarity (ANLS), Validity, Plausibility, Distribution, Consistency, Grounding, and F1-Score are explained together with the evaluation metrics discussed in previous surveys. We conclude our survey with a discussion of open issues in each phase of the VQA task and present some promising future directions. © 2021 Elsevier B.V. All rights reserved.
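
Of the metrics named in the abstract, ANLS is the one most tied to text-reading datasets such as ST-VQA, so a small illustration may help. The sketch below is not taken from the paper. It follows the commonly cited definition: for each question, take the best score over the ground-truth answers, where the score is 1 minus the normalized Levenshtein distance if that distance is below a threshold of 0.5, and 0 otherwise, then average over questions. The function names and the lowercasing and whitespace stripping are assumptions made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def anls(predictions: Sequence[str],
         references: Sequence[Sequence[str]],
         threshold: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions.

    predictions[i] is the predicted answer for question i, and
    references[i] is the list of acceptable answers for question i.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0

    total = 0.0
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()  # normalization is an assumption of this sketch
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            # Normalized distance in [0, 1]; two empty strings count as identical.
            nl = levenshtein(p, r) / longest if longest else 0.0
            # Answers that are too far from the reference score zero.
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

Under these assumptions, a prediction of "cola" against the single reference "coca" is one substitution away over four characters, so it scores 0.75, while a prediction that shares no characters with any reference scores 0. The threshold is what separates a near-miss reading of the text from an unrelated answer.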
