Article

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition

Journal

BMC BIOINFORMATICS
Volume 16

Publisher

BMC
DOI: 10.1186/s12859-015-0564-6

Keywords

BIOASQ Competition; Hierarchical Text Classification; Semantic indexing; Information retrieval; Passage retrieval; Question answering; Multi-document text summarization

Funding

  1. European Commission's Seventh Framework Programme (FP7) [318652]

Background: This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies.

Results: The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a, participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, submitting a total of 46 system runs, and one of the teams performed consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe; participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available.

Conclusions: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, and relevant articles and snippets from PUBMED Central; produce exact and paragraph-sized ideal answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a, one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b, the systems received high scores in the manual evaluation of the ideal answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.

Recommended

Article Computer Science, Artificial Intelligence

Deception detection in text and its relation to the cultural dimension of individualism/collectivism

Katerina Papantoniou, Panagiotis Papadakos, Theodore Patkos, George Flouris, Ion Androutsopoulos, Dimitris Plexousakis

Summary: This research focuses on automatic deception detection in cross-cultural text, examining the impact of cultural differences on linguistic cues of deception from the individualism/collectivism dimension. The results indicate that the task is complex and demanding.

NATURAL LANGUAGE ENGINEERING (2022)

Article Genetics & Heredity

Identification of 4 novel human ocular coloboma genes ANK3, BMPR1B, PDGFRA, and CDH4 through evolutionary conserved vertebrate gene analysis

Nicholas Owen, Maria Toms, Rodrigo M. Young, Jonathan Eintracht, Hajrah Sarkar, Brian P. Brooks, Mariya Moosajee

Summary: In this study, new potential causative genes for ocular coloboma were identified using cross-species comparative meta-analysis. Through in silico analysis, in situ hybridization, gene knockdown, and rescue experiments, several differentially expressed genes were confirmed to be involved in the development of the optic fissure. Furthermore, novel pathogenic variants in four genes were identified in coloboma families. The findings demonstrate the utility of cross-species meta-analysis and provide insights into the genetic basis of ocular coloboma.

GENETICS IN MEDICINE (2022)

Article Multidisciplinary Sciences

Genotype-phenotype correlations for COL4A3-COL4A5 variants resulting in Gly substitutions in Alport syndrome

Joel T. Gibson, Mary Huang, Marina Shenelli Croos Dabrera, Krushnam Shukla, Hansjorg Rothe, Pascale Hilbert, Constantinos Deltas, Helen Storey, Beata S. Lipska-Zietkiewicz, Melanie M. Y. Chan, Omid Sadeghi-Alavijeh, Daniel P. Gale, Agne Cerkauskaite, Judy Savige

Summary: This study examined the molecular characteristics of Gly substitutions in Alport syndrome and found that the stability of the substitution and its interaction with nearby residues determine the risk of haematuria, early onset kidney failure, and hearing loss in this inherited kidney disease.

SCIENTIFIC REPORTS (2022)

Article Multidisciplinary Sciences

Restoring and attributing ancient texts using deep neural networks

Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, Nando de Freitas

Summary: The study introduces Ithaca, a deep neural network for restoring, attributing, and dating ancient Greek inscriptions. The use of Ithaca significantly improves the accuracy of text restoration and attribution compared to historians working alone, contributing to the study of ancient history.

NATURE (2022)

Article Computer Science, Artificial Intelligence

Diagnostic captioning: a survey

John Pavlopoulos, Vasiliki Kougia, Ion Androutsopoulos, Dimitris Papamichail

Summary: Diagnostic captioning (DC) is the automatic generation of diagnostic text from medical images, assisting physicians in reducing errors and improving efficiency. With the advancements in deep learning, DC has gained attention and resulted in the development of various systems and datasets. This article provides an extensive overview of DC, including relevant datasets, evaluation measures, up-to-date systems, and proposed future directions.

KNOWLEDGE AND INFORMATION SYSTEMS (2022)

Article Computer Science, Artificial Intelligence

Distance from Unimodality for the Assessment of Opinion Polarization

John Pavlopoulos, Aristidis Likas

Summary: Commonsense knowledge is often approximated by the fraction of annotators who classified an item as positive, which overlooks the polarization of opinions. We propose a novel measure, DFU, that estimates the extent of polarization and correlates well with human judgment. Applying DFU to pandemic-related tweets and toxic posts, we find that polarization occurs on different days for different states and is more likely among annotators from different countries. Furthermore, DFU can be used as an objective function to predict the potential for polarized opinions.

COGNITIVE COMPUTATION (2023)

Article Computer Science, Information Systems

NELLIE: Never-Ending Linking for Linked Open Data

Abdullah Fathi Ahmed, Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo

Summary: This paper presents NELLIE, a pipeline architecture to build a chain of modules that address data augmentation challenges and ultimately build a fused knowledge graph out of Linked Open Data. NELLIE uses a two-phase linking approach to fuse each pair of knowledge graphs and improves the link prediction task's Hit@1 score by up to 94.44% compared to a naive approach.

IEEE ACCESS (2023)

Proceedings Paper Computer Science, Artificial Intelligence

Learning Permutation-Invariant Embeddings for Description Logic Concepts

Caglar Demir, Axel-Cyrille Ngonga Ngomo

Summary: Concept learning is the process of learning description logic concepts from background knowledge and input examples. This study proposes a solution to the problem by formulating it as a multi-label classification problem and introducing a neural embedding model (NERO) that predicts F1 scores of selected concepts. By ranking the concepts based on predicted scores, a possible goal concept can be detected without excessive exploration.

ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023 (2023)

Article Management

Machine learning in bank merger prediction: A text-based approach

Apostolos G. Katsafados, George N. Leledakis, Emmanouil G. Pyrgiotakis, Ion Androutsopoulos, Manos Fergadiotis

Summary: This paper investigates the role of textual information in a U.S. bank merger prediction task and finds that using textual information along with financial variables significantly improves the performance of the models, especially in predicting future bidders. These findings highlight the importance of textual information in a bank merger prediction task.

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH (2024)

Proceedings Paper Computer Science, Artificial Intelligence

REBench: Microbenchmarking Framework for Relation Extraction Systems

Manzoor Ali, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo

Summary: In recent years, the development of relation extraction (RE) models has led to the proposal of several benchmark datasets for evaluating these models. However, these datasets do not allow for customized microbenchmarking according to user-specified criteria. This article presents the REBench framework, which enables the selection of customized relation samples from existing datasets in different domains for microbenchmarking. Evaluation of state-of-the-art RE systems using different benchmarking samples demonstrates the importance of specialized microbenchmarking in identifying limitations of RE models and their components.

SEMANTIC WEB - ISWC 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Hashing the Hypertrie: Space- and Time-Efficient Indexing for SPARQL in Tensors

Alexander Biger, Lixi Conrads, Charlotte Behning, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo

Summary: This paper introduces a method to reduce the memory footprint of hypertries, a tensor-based triple store indexing structure, in order to further improve query processing speed in RDF storage solutions. By eliminating duplicate nodes, compressing non-branching paths, and storing single-entry leaf nodes in their parent nodes, significant reductions in memory usage can be achieved.

SEMANTIC WEB - ISWC 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs

Umair Qudus, Michael Roeder, Muhammad Saleem, Axel-Cyrille Ngonga Ngomo

Summary: This paper investigates fact-checking approaches for knowledge graphs and introduces five main categories of approaches. Current methods have limitations such as manual feature engineering and exclusive use of knowledge graphs. To improve prediction performance, a hybrid approach is proposed that leverages the diversity of existing approaches within an ensemble learning setting.

SEMANTIC WEB - ISWC 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

MULTPAX: Keyphrase Extraction Using Language Models and Knowledge Graphs

Hamada M. Zahera, Daniel Vollmers, Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo

Summary: This paper proposes a multitask framework, MULTPAX, for keyphrase extraction using pre-trained language models and knowledge graphs. The experiments show that MULTPAX outperforms state-of-the-art baselines significantly.

SEMANTIC WEB - ISWC 2022 (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Learning Concept Lengths Accelerates Concept Learning in ALC

N'Dah Jean Kouagou, Stefan Heindorf, Caglar Demir, Axel-Cyrille Ngonga Ngomo

Summary: This paper investigates concept learning approaches based on refinement operators to address the efficiency issue in exploring solution spaces for complex learning problems. By predicting the length of target concepts, the search space can be pruned during concept learning. Experimental results suggest that recurrent neural network architectures perform the best in predicting concept length. The proposed CLIP algorithm, an extension of the CELOE algorithm, achieves significant improvements in both speed and the F-measure of concept learning.

SEMANTIC WEB, ESWC 2022 (2022)

Proceedings Paper Computer Science, Cybernetics

EvoLearner: Learning Description Logics with Evolutionary Algorithms

Stefan Heindorf, Lukas Blubaum, Nick Dusterhus, Till Werner, Varun Nandkumar Golani, Caglar Demir, Axel-Cyrille Ngonga Ngomo

Summary: This paper proposes an evolutionary approach for learning concepts in knowledge graphs, which improves the initialization of the population and the support for data properties. The approach significantly outperforms existing techniques in structured machine learning tasks.

PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22) (2022)
