4.7 Article

Explaining VQA predictions using visual grounding and a knowledge base

Journal

IMAGE AND VISION COMPUTING
Volume 101, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.imavis.2020.103968

Keywords

Deep Learning; Attention; Supervision; Knowledge Base; Interpretability; Explainability

Funding

  1. Fondecyt Grant, Chile [1181739]
  2. Millennium Institute for Foundational Research on Data, Chile

In this work, we focus on the Visual Question Answering (VQA) task, where a model must answer a question based on an image, and the VQA-Explanations task, where an explanation is produced to support the answer. We introduce an interpretable model capable of pointing out and consuming information from a novel Knowledge Base (KB) composed of real-world relationships between objects, along with labels mined from available region descriptions and object annotations. Furthermore, this model provides visual and textual explanations to complement the KB visualization. The use of a KB brings two important benefits: enhanced predictions and improved interpretability. We achieve this by introducing a mechanism that can extract relevant information from this KB and point out the relations best suited for predicting the answer. A supervised attention map is generated over the KB to select the relevant relationships for each question-image pair. Moreover, we add image attention supervision to the explanations module to generate better visual and textual explanations. We quantitatively show that the predicted answers improve when using the KB; similarly, explanations improve when using the KB and when adding image attention supervision. We also qualitatively show that the KB attention helps to improve interpretability and enhance explanations. Overall, the results support the benefits of having multiple tasks to enhance the interpretability and performance of the model. (C) 2020 Elsevier B.V. All rights reserved.
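
The abstract does not give the model's equations, so the following is only a minimal sketch of what a supervised attention map over KB entries could look like in general, not the authors' implementation. It assumes dot-product scoring between a joint question-image embedding and each KB relation embedding, and a cross-entropy supervision term against annotated relevant entries. The names `kb_attention` and `attention_supervision_loss`, and all vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kb_attention(query, kb_entries):
    """Attention weights over KB entries (assumed dot-product scoring)."""
    scores = [sum(q * k for q, k in zip(query, entry)) for entry in kb_entries]
    return softmax(scores)


def attention_supervision_loss(attention, relevant):
    """Cross-entropy between the attention map and a uniform target
    placed on the indices annotated as relevant (assumed supervision form)."""
    target = 1.0 / len(relevant)
    return -sum(target * math.log(attention[i] + 1e-12) for i in relevant)


# Hypothetical joint question-image embedding and three KB relation embeddings.
query = [1.0, 0.0]
kb_entries = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]

attention = kb_attention(query, kb_entries)
loss = attention_supervision_loss(attention, relevant=[0])
```

In a multi-task setup like the one the abstract describes, a term of this kind would be added to the answer-prediction loss, so that the attention weights can also be inspected as an explanation of which relations were used.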

Recommended

Article Robotics

Socially and Contextually Aware Human Motion and Pose Forecasting

Vida Adeli, Ehsan Adeli, Ian Reid, Juan Carlos Niebles, Hamid Rezatofighi

IEEE ROBOTICS AND AUTOMATION LETTERS (2020)

Article Computer Science, Theory & Methods

A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images

Pablo Messina, Pablo Pino, Denis Parra, Alvaro Soto, Cecilia Besa, Sergio Uribe, Marcelo Andia, Cristian Tejos, Claudia Prieto, Daniel Capurro

Summary: Physicians face increasing demand for image-based diagnosis from patients every year, which can be addressed with the recent advancement in artificial intelligence. Survey of works on automatic report generation from medical images using deep neural networks shows progress in datasets, architecture design, explainability, and evaluation metrics, but challenges remain, especially in evaluating the accuracy of generated reports.

ACM COMPUTING SURVEYS (2022)

Article Radiology, Nuclear Medicine & Medical Imaging

High fidelity deep learning-based MRI reconstruction with instance-wise discriminative feature matching loss

Ke Wang, Jonathan Tamir, Alfredo De Goyeneche, Uri Wollner, Rafi Brada, Stella X. Yu, Michael Lustig

Summary: This study aims to improve the fidelity of fine structures and textures in deep learning-based reconstructions. A novel patch-based unsupervised feature loss method is proposed to preserve perceptual similarity and high-order statistics. Experimental results demonstrate that this method can produce more realistic reconstructions with finer textures, sharper edges, and improved overall image quality.

MAGNETIC RESONANCE IN MEDICINE (2022)

Article Plant Sciences

Differential gene expression analysis of the resprouting process in Pinus canariensis provides new insights into a rare trait in conifers

Victor Chano, Oliver Gailing, Carmen Collada, Alvaro Soto

Summary: Resprouting is a crucial trait in population dynamics, and Pinus canariensis is one of the few conifer species capable of resprouting. In this study, we analyzed gene expression during wound-induced resprouting in 5-year-old Canarian pines and identified key differentially expressed genes (DEGs) at different stages of resprouting. Our findings suggest similarities between lateral shoot development in gymnosperms and apical growth in flowering plants, indicating potential homologies between these processes.

PLANT GROWTH REGULATION (2023)

Article Computer Science, Artificial Intelligence

Learning Sentence-Level Representations with Predictive Coding

Vladimir Araujo, Marie-Francine Moens, Alvaro Soto

Summary: Learning sentence representations is important and challenging in deep learning and natural language processing. Previous methods focused on learning contextualized word representations, but failed to capture the structure and discourse relationships in contiguous sentences. This work improves pretrained models by applying predictive coding theory and shows consistent improvement on sentence representations for English and Spanish languages. It also demonstrates the models' ability to capture discourse and pragmatics knowledge through extensive experimentation and validation.

MACHINE LEARNING AND KNOWLEDGE EXTRACTION (2023)

Proceedings Paper Computer Science, Artificial Intelligence

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Le Xue, Mingfei Gao, Chen Xing, Roberto Martin-Martin, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Summary: The recognition capabilities of current state-of-the-art 3D models are limited by datasets with a small amount of annotated data and a pre-defined set of categories. This study introduces ULIP, a framework that utilizes multimodal information to improve the understanding of the 3D modality. ULIP is pre-trained with object triplets from image, text, and 3D point cloud and achieves state-of-the-art performance in 3D classification tasks.

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Bridging the Visual Semantic Gap in VLN via Semantically Richer Instructions

Joaquin Ossandon, Benjamin Earle, Alvaro Soto

Summary: The Visual-and-Language Navigation task involves navigating an indoor environment using only visual information based on a textual instruction. Existing AI models still struggle with this task, which is easy for humans. The authors propose that the poor utilization of visual information is the main reason for the low performance of current models. They support this hypothesis with experimental evidence, showing that state-of-the-art models are not significantly affected when given limited or no visual data, indicating overfitting to textual instructions. To address this issue, the authors introduce a new data augmentation method that incorporates more explicit visual information in the generation of textual instructions, leading to an 8% increase in performance for unseen environments.

COMPUTER VISION, ECCV 2022, PT XXXVII (2022)

Proceedings Paper Computer Science, Interdisciplinary Applications

Evaluation Benchmarks for Spanish Sentence Representations

Vladimir Araujo, Andres Carvallo, Souvik Kundu, Jose Canete, Marcelo Mendoza, Robert E. Mercer, Felipe Bravo-Marquez, Marie-Francine Moens, Alvaro Soto

Summary: With the success of pre-trained language models, there has been an emergence of versions in languages other than English. However, the evaluation methods for these models are limited for languages like Spanish. This paper aims to bridge this gap by introducing two evaluation benchmarks, Spanish SentEval and Spanish DiscoEval, for assessing stand-alone and discourse-aware sentence representations, respectively. The benchmarks include a variety of datasets from different domains, and the authors also evaluate and analyze the capabilities and limitations of recent pre-trained Spanish language models. The findings show that for discourse evaluation tasks, mBERT, a language model trained on multiple languages, generally outperforms models trained solely on Spanish documents. The contribution of this study is to motivate a fairer, more comparable, and less cumbersome approach to evaluating future Spanish language models.

LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (2022)

Proceedings Paper Computer Science, Artificial Intelligence

PrivHAR: Recognizing Human Actions from Privacy-Preserving Lens

Carlos Hinojosa, Miguel Marquez, Henry Arguello, Ehsan Adeli, Li Fei-Fei, Juan Carlos Niebles

Summary: This paper proposes an optimizing framework that degrades video quality to protect privacy attributes and maintain relevant features for activity recognition.

COMPUTER VISION - ECCV 2022, PT IV (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

Summary: Despite progress in object detection, most methods are limited to a specific set of object categories. This paper proposes a method to automatically generate pseudo bounding-box annotations from image-caption pairs, expanding the base classes and improving object detection performance.

COMPUTER VISION, ECCV 2022, PT X (2022)

Proceedings Paper Computer Science, Theory & Methods

Entropy-based Stability-Plasticity for Lifelong Learning

Vladimir Araujo, Julio Hurtado, Alvaro Soto, Marie-Francine Moens

Summary: The ability of deep learning models to continuously learn is limited compared to humans. To address this issue, we propose a novel method called Entropy-based Stability-Plasticity (ESP) that dynamically determines the modification level of each model layer through a plasticity factor, reducing interference and speeding up training.

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022 (2022)

Proceedings Paper Business

DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference

Cristobal Eyzaguirre, Felipe del Rio, Vladimir Araujo, Alvaro Soto

Summary: This paper proposes DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models, which controls the number of Transformer blocks that need to be executed at inference time. Experimental results demonstrate that the proposed approach performs well on a reduced computational regime and is competitive in other cases.

PROCEEDINGS OF THE FIRST WORKSHOP ON EFFICIENT BENCHMARKING IN NLP (NLP POWER 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Vladimir Araujo, Andres Villa, Marcelo Mendoza, Marie-Francine Moens, Alvaro Soto

Summary: The study proposes using ideas from predictive coding theory to augment language models and improve performance in discourse relationship detection by learning suitable discourse-level representations.

2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021) (2021)

Article Computer Science, Information Systems

Overcoming Catastrophic Forgetting Using Sparse Coding and Meta Learning

Julio Hurtado, Hans Lobel, Alvaro Soto

Summary: This study presents two strategies to address the task interference problem in deep learning, one using sparse coding technique to adaptively allocate model capacity to avoid interference, and the other using meta learning technique to encourage knowledge transfer among tasks.

IEEE ACCESS (2021)
