4.2 Article

Knowledge-driven understanding of images in comic books

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s10032-015-0243-1

Keywords

Document understanding; Comics analysis; Expert system

Funding

  1. European Doctorate scholarship of the University of La Rochelle
  2. European Regional Development Fund
  3. region Poitou-Charentes (France)
  4. General Council of Charente Maritime (France)
  5. municipality of La Rochelle (France)
  6. Spanish research projects [TIN2011-24631, RYC-2009-05031]

Document analysis is an active field of research that aims to attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to predefined domain knowledge. In this study, we propose a knowledge-driven system that can interact with bottom-up and top-down information to progressively understand the content of a document. We model knowledge of both the comic book domain and the image processing domain for information consistency analysis. In addition, different image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters and their semantic relations in an unsupervised way.

Recommended

Article Computer Science, Artificial Intelligence

Real-time Lexicon-free Scene Text Retrieval

Andres Mafla, Ruben Tito, Sounak Dey, Lluis Gomez, Marcal Rusinol, Ernest Valveny, Dimosthenis Karatzas

Summary: In this study, the task of scene text retrieval is addressed by proposing a single shot CNN architecture for predicting bounding boxes and building compact representations of spotted words. Experimental results demonstrate that the proposed model outperforms previous state-of-the-art while offering significant increase in processing speed and unmatched expressiveness.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Asking questions on handwritten document collections

Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. Jawahar

Summary: This work focuses on Question Answering on handwritten document collections, proposing an approach that does not require text recognition. By projecting textual words and word images into a common sub-space, the proposed method can retrieve document snippets potentially containing answers. Results suggest that this approach is suitable for handwritten documents and historical collections.

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Multimodal grid features and cell pointers for scene text visual question answering

Lluis Gomez, Ali Furkan Biten, Ruben Tito, Andres Mafla, Marcal Rusinol, Ernest Valveny, Dimosthenis Karatzas

Summary: The paper introduces a new model for scene text visual question answering that is based on a single attention mechanism and demonstrates competitive performance on two standard datasets. Experimental results show that the model is 5x faster than previous methods at inference time.

PATTERN RECOGNITION LETTERS (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Read While You Drive - Multilingual Text Tracking on the Road

Sergi Garcia-Bordils, George Tom, Sangeeth Reddy, Minesh Mathew, Marcal Rusinol, C. Jawahar, Dimosthenis Karatzas

Summary: This paper presents RoadText-3K, a large driving video dataset with fully annotated text, which is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions, and multiple languages and scripts. The article also offers a comprehensive analysis of the limitations of state-of-the-art text detection methods and proposes a new tracking model that achieves state-of-the-art results.

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Multilingual Approach to Scene Text Visual Question Answering

Josep Brugues i Pujolras, Lluis Gomez i Bigorda, Dimosthenis Karatzas

Summary: Scene Text Visual Question Answering (ST-VQA) is a hot research topic in Computer Vision. Current models have limited performance on multiple languages. This study explores the possibility of obtaining bilingual and multilingual VQA models and demonstrates the performance improvement by using multilingual word embeddings during training.

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Infographic VQA

Minesh Mathew, Viraj Bagal, Ruben Tito, Dimosthenis Karatzas, Ernest Valveny, C. Jawahar

Summary: This work explores the automatic understanding of infographic images using a Visual Question Answering technique, and presents a diverse dataset called InfographicVQA. The dataset requires methods to reason over document layout, textual content, graphical elements, and data visualizations. Two Transformer-based baselines are evaluated, but they do not perform as well as humans on the dataset. The study suggests that VQA on infographics can serve as a benchmark for evaluating machine understanding of complex document images.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornes, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Llados

Summary: This paper addresses the challenge of low-resource Handwritten Text Recognition (HTR) by proposing a data generation technique based on Bayesian Program Learning (BPL). Unlike traditional methods, which require a large amount of annotated images, our method can generate human-like handwriting using only one sample of each symbol in the alphabet. Synthetic lines are then created to train state-of-the-art HTR architectures in a segmentation-free fashion. Quantitative and qualitative analyses confirm the effectiveness of the proposed method.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Summary: The article discusses object bias (hallucination) in image captioning and presents three simple yet efficient training augmentation methods to reduce it without the need for new data or increased model size. Hallucination metrics show that the proposed methods significantly decrease object bias in the models, and experiments demonstrate that they reduce dependency on visual features. All code, configuration files, and model weights are available online.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

Summary: Existing datasets for the image-text matching task lack the ability to accurately measure semantic relevance. This study proposes two metrics to evaluate the semantic relevance of image-text pairs and introduces a new strategy to improve model performance. Experiments show significant improvements in scenarios with limited training data.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Document Collection Visual Question Answering

Ruben Tito, Dimosthenis Karatzas, Ernest Valveny

Summary: Current methods in Document Understanding focus on processing individual documents, while documents are typically organized in collections which provide valuable context for interpretation. To address this issue, DocCVQA introduces a new dataset and task where questions are posed over a whole collection of document images, aiming to provide answers to questions and retrieve the documents containing relevant information. Along with the dataset, a new evaluation metric and baselines are proposed to gain further insights into this new dataset and task.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ICDAR 2021 Competition on Document Visual Question Answering

Ruben Tito, Minesh Mathew, C. Jawahar, Ernest Valveny, Dimosthenis Karatzas

Summary: The report presents the results of the ICDAR 2021 edition of the Document Visual Question Answering challenge, including a new task on Infographics VQA alongside the previous tasks. The winning methods performed differently in each task, with the lowest score in the Infographics VQA task. The report also provides detailed descriptions of the datasets used, submitted methods, performance analysis, and progress made in Single Document VQA since 2020.

DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV (2021)

Proceedings Paper Computer Science, Artificial Intelligence

StacMR: Scene-Text Aware Cross-Modal Retrieval

Andres Mafla, Rafael S. Rezende, Lluis Gomez, Diane Larlus, Dimosthenis Karatzas

Summary: This paper introduces a new dataset for cross-modal retrieval involving scene-text instances, proposes approaches leveraging scene text, and conducts experiments to confirm the benefits of utilizing scene text.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Summary: By leveraging multi-modal content in the form of visual and textual cues, this study significantly improved the performance of fine-grained image classification and retrieval tasks. The model obtained relationship-enhanced features by learning a common semantic space between salient objects and text found in an image, outperforming previous state-of-the-art in two different tasks.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

DocVQA: A Dataset for VQA on Document Images

Minesh Mathew, Dimosthenis Karatzas, C. Jawahar

Summary: DocVQA is a new dataset for Visual Question Answering on document images, with 50,000 questions defined on 12,000+ images. Analysis shows that existing models perform reasonably well on certain question types, but there is still a large performance gap compared to human performance. Models need to improve on questions where understanding the structure of the document is crucial.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Text Recognition - Real World Data and Where to Find Them

Klara Janouskova, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas

Summary: The proposed method leverages weakly annotated images to enhance text extraction pipelines. It combines imprecise text transcriptions with weak annotations to generate nearly error-free instances of scene text for training, which yields consistent improvements in accuracy for state-of-the-art recognition models.

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) (2021)
