Article

Knowledge-driven understanding of images in comic books

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION (2015)

Journal

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION

Volume 18, Issue 3, Pages 199-221

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s10032-015-0243-1

Keywords

Document understanding; Comics analysis; Expert system

Category

Computer Science, Artificial Intelligence

Funding

European Doctorate scholarship of the University of La Rochelle
European Regional Development Fund
region Poitou-Charentes (France)
General Council of Charente Maritime (France)
municipality of La Rochelle (France)
Spanish research projects [TIN2011-24631, RYC-2009-05031]

Abstract

Document analysis is an active field of research that aims to attain a complete understanding of the semantics of a given document. One example of the document understanding process is enabling a computer to identify the key elements of a comic book story and arrange them according to predefined domain knowledge. In this study, we propose a knowledge-driven system in which bottom-up and top-down information interact to progressively understand the content of a document. We model knowledge of both the comic book domain and the image processing domain for information consistency analysis. In addition, several image processing methods are improved or developed to extract panels, balloons, tails, texts, comic characters, and their semantic relations in an unsupervised way.
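The abstract's central idea is that bottom-up detections are checked against modeled domain knowledge. The Python sketch below is a minimal illustration of that idea under our own assumptions, not the authors' system: detections are axis-aligned bounding boxes, and the domain knowledge is reduced to two hypothetical containment rules (a balloon lies inside a panel, a text lies inside a balloon). The names `Box`, `Region`, `PARENT_RULES`, `is_consistent`, and `validate` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this box (edges may touch).
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


@dataclass(frozen=True)
class Region:
    label: str  # hypothetical detector labels: "panel", "balloon", "text"
    box: Box


# Hypothetical domain knowledge: a region with the key label must lie inside
# at least one region with the value label. Panels are unconstrained.
PARENT_RULES = {"balloon": "panel", "text": "balloon"}


def is_consistent(region: Region, regions: list[Region]) -> bool:
    parent_label = PARENT_RULES.get(region.label)
    if parent_label is None:
        return True
    return any(r.label == parent_label and r.box.contains(region.box)
               for r in regions)


def validate(regions: list[Region]) -> list[Region]:
    """Drop detections that violate the rules, repeating until nothing changes.

    Rejecting a balloon can invalidate the texts it held, so one pass is not enough.
    """
    kept = list(regions)
    while True:
        survivors = [r for r in kept if is_consistent(r, kept)]
        if len(survivors) == len(kept):
            return survivors
        kept = survivors


if __name__ == "__main__":
    detections = [
        Region("panel", Box(0, 0, 100, 100)),
        Region("balloon", Box(10, 10, 50, 40)),
        Region("text", Box(15, 15, 45, 30)),
        Region("balloon", Box(150, 150, 190, 180)),  # outside every panel: false positive
        Region("text", Box(155, 155, 185, 170)),     # only supported by the rejected balloon
    ]
    print([r.label for r in validate(detections)])  # ['panel', 'balloon', 'text']
```

Repeating the filter until it reaches a fixed point mimics, in a very simplified way, the progressive interplay of bottom-up and top-down information the abstract describes: rejecting a stray balloon also withdraws support from the text inside it. A real system would also have to model tails, comic characters, and their semantic relations, which this sketch omits.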

Recommended

Article Computer Science, Information Systems

BINYAS: a complex document layout analysis system

Showmik Bhowmik, Soumyadeep Kundu, Ram Sarkar

Summary: Document layout analysis (DLA) is essential for developing a comprehensive document image processing system, aiming to segment document images and identify different regions. The proposed BINYAS system, based on connected components and pixel analysis, outperforms existing methods based on evaluations on four standard datasets.

MULTIMEDIA TOOLS AND APPLICATIONS (2021)

Article Medicine, Legal

Biasability and reliability of expert forensic document examiners

Itiel E. Dror, Kyle C. Scherr, Linton A. Mohammed, Carla L. MacLean, Lloyd Cunningham

Summary: This study explored the judgments of practicing forensic document experts and found that their judgments were not biased by the nature of the case but lacked consistency, possibly because document examiners do not primarily work within an organizational forensic laboratory culture.

FORENSIC SCIENCE INTERNATIONAL (2021)

Article Computer Science, Artificial Intelligence

Why is a document relevant? Understanding the relevance scores in cross-lingual document retrieval

Erik Novak, Luka Bizjak, Dunja Mladenic, Marko Grobelnik

Summary: This paper proposes a novel learning-to-rank model named LM-EMD that utilizes a multilingual BERT language model and Earth Mover's Distance (EMD) to measure the relevancy between a document and an input query. The model provides interpretable insights by analyzing the distances and identifying the contributing document tokens to the relevancy.

KNOWLEDGE-BASED SYSTEMS (2022)

Article Communication

Expert-Informed Topic Models for Document Set Discovery

Eike Mark Rinke, Timo Dobbrick, Charlotte Loeb, Cacilia Zirn, Hartmut Wessler

Summary: In text-as-data studies, expert-informed topic modeling (EITM) is proposed as a flexible and efficient approach to help researchers identify and select subsets of documents addressing specific topics within large text corpora by combining external domain knowledge and probabilistic topic models.

COMMUNICATION METHODS AND MEASURES (2022)

Article Automation & Control Systems

LayoutQT-Layout Quadrant Tags to embed visual features for document analysis

Patricia Medyna Lauritzen de Lucena Drumond, Lindeberg Pessoa Leite, Teofilo E. de Campos, Fabricio Ataides Braz

Summary: The relative position of text blocks is crucial in document understanding; however, embedding layout information in a page instance representation is not easy. We introduce a new method called Layout Quadrant Tags (LayoutQT) to encode layout information in textual embedding, enhancing NLP pipelines without expensive multimodal fusion. Our experiments with the AWD-LSTM neural network on the Tobacco800 and RVL-CDIP datasets show significant improvement in page stream segmentation and document classification, achieving higher F1 scores.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2023)

Article Engineering, Industrial

Expert judgment-based reliability analysis of the Dutch flood defense system

G. Rongen, O. Morales-Napoles, M. Kok

Summary: This study aims to assess the failure probabilities of Dutch dikes and compare them to model results through expert estimation. The research demonstrates that structured expert judgments can be successfully used for estimating the reliability of Dutch flood defenses, despite the presence of uncertainties and overestimated failure probabilities.

RELIABILITY ENGINEERING & SYSTEM SAFETY (2022)

Article Mathematical & Computational Biology

Engineering Education Understanding Expert Decision System Research and Application

Huajie Ye, Cuifeng Li

Summary: Engineering education is based on technical science and aims to train engineers who can utilize science and technology for productive purposes. In recent years, the emergence of new technological advancements has brought about new challenges to engineering education. To meet these challenges, a shift in educational philosophy is necessary, along with a proper understanding and management of various aspects of engineering education. This study introduces the concept of engineering education certification in the context of new infrastructure and explores reform from different perspectives. Experimental analysis shows positive results for the proposed method.

COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE (2022)

Article Endocrinology & Metabolism

Management of RANKL-mediated Disorders With Denosumab in Children and Adolescents: A Global Expert Guidance Document

Joel A. Vanderniet, Vivian Szymczuk, Wolfgang Hogler, Signe S. Beck-Nielsen, Suma Uday, Nadia Merchant, Janet L. Crane, Leanne M. Ward, Alison M. Boyce, Craig F. Munns

Summary: Denosumab is an effective treatment for RANKL-mediated disorders in children and adolescents, although it is not curative and may be used in combination with surgical or other medical treatments. Multidisciplinary planning and expert oversight are necessary to manage the risk of mineral abnormalities. More research is needed to determine optimal treatment regimens and minimize risks.

JOURNAL OF CLINICAL ENDOCRINOLOGY & METABOLISM (2023)

Article Environmental Studies

Understanding cultural ecosystem services related to farmlands: Expert survey in Europe

Agnes Balazsi, Juliana Danhardt, Sue Collins, Oliver Schweiger, Josef Settele, Tibor Hartel

Summary: Cultural ecosystem services (CES) are nonmaterial benefits obtained from ecosystems, covering a wide range of domains. European agricultural landscapes are complex social-ecological systems where synergies and trade-offs between production and conservation determine CES values. Experts believe that interdisciplinary approaches and integrative science-policy methodologies are promising to improve CES approach for policy and management, but practical implementation in policies targeting agricultural landscapes still lags behind.

LAND USE POLICY (2021)

Article Geosciences, Multidisciplinary

Understanding heat vulnerability in the subtropics: Insights from expert judgements

Wan-Yu Shih, Leslie Mabon

Summary: The risk to health from extreme heat has gained attention in scholarship and policy, with demographic and socio-economic factors influencing an individual's susceptibility to extreme heat. Many countries still rely on expert judgments for heat vulnerability assessment, which may not always be evidence-informed and can be influenced by the experts involved.

INTERNATIONAL JOURNAL OF DISASTER RISK REDUCTION (2021)

Article Chemistry, Analytical

An Expert System for Rotating Machine Fault Detection Using Vibration Signal Analysis

Ayaz Kafeel, Sumair Aziz, Muhammad Awais, Muhammad Attique Khan, Kamran Afaq, Sahar Ahmed Idris, Hammam Alshazly, Samih M. Mostafa

Summary: Accurate and early detection of machine faults is crucial for industrial preventive maintenance to avoid unexpected downtime and ensure equipment reliability and human safety. This study presents a fault detection system for rotating machines using vibration signal analysis, achieving high accuracy with a hybrid combination of time and spectral features classified by support vector machines.

SENSORS (2021)

Article Sport Sciences

Understanding the gut instinct of expert coaches during talent identification

Alexandra H. Roberts, Daniel Greenwood, Mandy Stanley, Clare Humberstone, Fiona Iredale, Annette Raynor

Summary: Coaches primarily rely on intuition in talent identification in sports, which is formed through years of experience, time spent with athletes, and decision context. When selecting athletes, coaches may be more inclined to consider their own ability to improve certain athletes.

JOURNAL OF SPORTS SCIENCES (2021)

Review Cardiac & Cardiovascular Systems

Joint EAPCI/ACVC expert consensus document on percutaneous ventricular assist devices

Alaide Chieffo, Dariusz Dudek, Christian Hassager, Alain Combes, Mario Gramegna, Sigrun Halvorsen, Kurt Huber, Vijay Kunadian, Jiri Maly, Jacob Eifer Moller, Federico Pappalardo, Giuseppe Tarantini, Guido Tavazzi, Holger Thiele, Christophe Vandenbriele, Nicolas van Mieghem, Pascal Vranckx, Nikos Werner, Susanna Price

Summary: This consensus document summarizes the expert panel's views on the use of short-term percutaneous ventricular assist devices (pVADs) in various clinical settings. pVADs differ in their hemodynamic effects, management, and indications, requiring guidance based on existing evidence and best current practice.

EUROPEAN HEART JOURNAL-ACUTE CARDIOVASCULAR CARE (2021)

Article Computer Science, Hardware & Architecture

Design and implementation of an academic expert system through big data analysis

Dojin Choi, Hyeonbyeong Lee, Kyoungsoo Bok, Jaesoo Yoo

Summary: Researchers establish research directions in new fields through expert advice or papers, but expert search services are lacking. This paper presents an expert search system based on published papers, calculating expert scores to support researchers' activities.

JOURNAL OF SUPERCOMPUTING (2021)

Article Computer Science, Information Systems

Toward Understanding Most of the Context in Document-Level Neural Machine Translation

Gyu-Hyeon Choi, Jong-Hun Shin, Yo-Han Lee, Young-Kil Kim

Summary: Considerable research has been conducted to improve translation performance by capturing contextual correlation at the document level. The proposed method shows improved translation performance in various translation tasks and benchmark machine translation tasks compared to the state-of-the-art baseline.

ELECTRONICS (2022)

Article Computer Science, Artificial Intelligence

Real-time Lexicon-free Scene Text Retrieval

Andres Mafla, Ruben Tito, Sounak Dey, Lluis Gomez, Marcal Rusinol, Ernest Valveny, Dimosthenis Karatzas

Summary: In this study, the task of scene text retrieval is addressed by proposing a single-shot CNN architecture that predicts bounding boxes and builds compact representations of spotted words. Experimental results demonstrate that the proposed model outperforms the previous state of the art while offering a significant increase in processing speed and unmatched expressiveness.

PATTERN RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Asking questions on handwritten document collections

Minesh Mathew, Lluis Gomez, Dimosthenis Karatzas, C. Jawahar

Summary: This work focuses on Question Answering on handwritten document collections, proposing an approach that does not require text recognition. By projecting textual words and word images into a common sub-space, the proposed method can retrieve document snippets potentially containing answers. Results suggest that this approach is suitable for handwritten documents and historical collections.

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION (2021)

Article Computer Science, Artificial Intelligence

Multimodal grid features and cell pointers for scene text visual question answering

Lluis Gomez, Ali Furkan Biten, Ruben Tito, Andres Mafla, Marcal Rusinol, Ernest Valveny, Dimosthenis Karatzas

Summary: The paper introduces a new model for scene text visual question answering that is based on a single attention mechanism and demonstrates competitive performance on two standard datasets. Experimental results show that the model is five times faster than previous methods at inference time.

PATTERN RECOGNITION LETTERS (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Read While You Drive - Multilingual Text Tracking on the Road

Sergi Garcia-Bordils, George Tom, Sangeeth Reddy, Minesh Mathew, Marcal Rusinol, C. Jawahar, Dimosthenis Karatzas

Summary: This paper presents RoadText-3K, a large driving video dataset with fully annotated text, which is three times bigger than its predecessor and contains data from varied geographical locations, unconstrained driving conditions, and multiple languages and scripts. The article also offers a comprehensive analysis of the limitations of state-of-the-art text detection methods and proposes a new tracking model that achieves state-of-the-art results.

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

A Multilingual Approach to Scene Text Visual Question Answering

Josep Brugues i Pujolras, Lluis Gomez i Bigorda, Dimosthenis Karatzas

Summary: Scene Text Visual Question Answering (ST-VQA) is a hot research topic in Computer Vision. Current models have limited performance on multiple languages. This study explores the possibility of obtaining bilingual and multilingual VQA models and demonstrates the performance improvement by using multilingual word embeddings during training.

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Infographic VQA

Minesh Mathew, Viraj Bagal, Ruben Tito, Dimosthenis Karatzas, Ernest Valveny, C. Jawahar

Summary: This work explores the automatic understanding of infographic images using a Visual Question Answering technique, and presents a diverse dataset called InfographicVQA. The dataset requires methods to reason over document layout, textual content, graphical elements, and data visualizations. Two Transformer-based baselines are evaluated, but they do not perform as well as humans on the dataset. The study suggests that VQA on infographics can serve as a benchmark for evaluating machine understanding of complex document images.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alicia Fornes, Yousri Kessentini, Lluis Gomez, Dimosthenis Karatzas, Josep Llados

Summary: This paper addresses the challenge of low-resource Handwritten Text Recognition (HTR) by proposing a data generation technique based on Bayesian Program Learning (BPL). Unlike traditional methods, which require a large amount of annotated images, our method can generate human-like handwriting using only one sample of each symbol in the alphabet. Synthetic lines are then created to train state-of-the-art HTR architectures in a segmentation-free fashion. Quantitative and qualitative analyses confirm the effectiveness of the proposed method.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Summary: The article discusses object bias (hallucination) in image captioning and presents three simple yet efficient training augmentation methods to reduce it without the need for new data or increased model size. Experiments show that the proposed methods significantly decrease object bias according to hallucination metrics and reduce dependency on visual features. All code, configuration files, and model weights are available online.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Ali Furkan Biten, Andres Mafla, Lluis Gomez, Dimosthenis Karatzas

Summary: The existing datasets for image-text matching task lack the ability to accurately measure semantic relevance. This study proposes two metrics to evaluate the semantic relevance of image-text pairs and introduces a new strategy to improve model performance. Experiments show significant improvements in scenarios with limited training data.

2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) (2022)

Proceedings Paper Computer Science, Information Systems

Document Collection Visual Question Answering

Ruben Tito, Dimosthenis Karatzas, Ernest Valveny

Summary: Current methods in Document Understanding focus on processing individual documents, while documents are typically organized in collections which provide valuable context for interpretation. To address this issue, DocCVQA introduces a new dataset and task where questions are posed over a whole collection of document images, aiming to provide answers to questions and retrieve the documents containing relevant information. Along with the dataset, a new evaluation metric and baselines are proposed to gain further insights into this new dataset and task.

DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ICDAR 2021 Competition on Document Visual Question Answering

Ruben Tito, Minesh Mathew, C. Jawahar, Ernest Valveny, Dimosthenis Karatzas

Summary: The report presents the results of the ICDAR 2021 edition of the Document Visual Question Answering Challenges, including the new Infographics VQA task and the tasks from previous editions. The winning methods' scores varied across tasks, with the lowest in the Infographics VQA task. The report also provides detailed descriptions of the datasets used, the submitted methods, a performance analysis, and the progress made in Single Document VQA since 2020.

DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV (2021)

Proceedings Paper Computer Science, Artificial Intelligence

StacMR: Scene-Text Aware Cross-Modal Retrieval

Andres Mafla, Rafael S. Rezende, Lluis Gomez, Diane Larlus, Dimosthenis Karatzas

Summary: This paper introduces a new dataset for cross-modal retrieval involving scene-text instances, proposes approaches leveraging scene text, and conducts experiments to confirm the benefits of utilizing scene text.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

Summary: By leveraging multi-modal content in the form of visual and textual cues, this study significantly improved the performance of fine-grained image classification and retrieval tasks. The model obtained relationship-enhanced features by learning a common semantic space between salient objects and text found in an image, outperforming previous state-of-the-art in two different tasks.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

DocVQA: A Dataset for VQA on Document Images

Minesh Mathew, Dimosthenis Karatzas, C. Jawahar

Summary: DocVQA is a new dataset for Visual Question Answering on document images, with 50,000 questions defined on 12,000+ images. Analysis shows that existing models perform reasonably well on certain question types, but there is still a large performance gap compared to human performance. Models need to improve on questions where understanding the structure of the document is crucial.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

Text Recognition - Real World Data and Where to Find Them

Klara Janouskova, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas

Summary: The proposed method leverages weakly annotated images to enhance text extraction pipelines, combining imprecise text transcriptions with weak annotations to generate nearly error-free instances of scene text for training. This results in consistent accuracy improvements for state-of-the-art recognition models.

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) (2021)
