Article

BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data

Journal

JOURNAL OF BIOMEDICAL SEMANTICS
Volume 5

Publisher

BMC
DOI: 10.1186/2041-1480-5-32

Funding

  1. National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST)
  2. Swiss Federal Government through the State Secretariat for Education, Research and Innovation (SERI)
  3. National Institutes of Health (NIH) [4U41HG006104-04]

Background: Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether native RDF stores can meet the needs of a biological database provider. Prior evaluations have used synthetic data with limited database sizes. For example, the largest Berlin SPARQL Benchmark (BSBM) configuration uses 1 billion synthetic e-commerce RDF triples on a single node. However, real-world biological data differ considerably from such simple synthetic data, so it is difficult to determine whether synthetic e-commerce data are representative of biological databases. Therefore, for this evaluation, we used five real data sets from biological databases.

Results: We evaluated five triple stores (4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE) with five biological data sets (Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ) ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data onto our single node and prepared the database for use in a classical data warehouse scenario. We then ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query responses.

Conclusions: Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements for loading and querying biological data sets of up to roughly 8 billion triples on a single node, with 64 clients accessing the endpoint simultaneously. OWLIM-SE performs best for databases of approximately 11 million triples. For the data sets of 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, with neither showing an overwhelming advantage over the other. For data sets of more than 4 billion triples, Virtuoso works best. 4store performs well on small data sets with limited features (fewer than 100 million triples), but our tests show that its scalability is poor. Bigdata demonstrates average performance and is a good open-source triple store for medium-sized data sets (around 500 million triples). Mulgara shows some fragility.
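The query phase described above (a series of SPARQL queries timed against each endpoint while 64 clients access it simultaneously) can be sketched roughly as follows. This is a hypothetical harness, not the authors' benchmark code: it assumes each simulated client runs the same query mix in order and that latency is measured on the client side. `make_http_runner` and `run_concurrent_benchmark` are illustrative names, and the HTTP runner assumes the endpoint accepts SPARQL 1.1 Protocol GET requests with a `query` parameter.

```python
import statistics
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def make_http_runner(endpoint_url: str, timeout_s: float = 600.0) -> Callable[[str], bytes]:
    """Return a query function that sends a SPARQL 1.1 Protocol GET request.

    Hypothetical helper: assumes `endpoint_url` has no query string of its own;
    the timeout value is an arbitrary assumption.
    """

    def run_query(query: str) -> bytes:
        url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
        request = urllib.request.Request(
            url, headers={"Accept": "application/sparql-results+json"}
        )
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            # Read the whole body so that result transfer time is part of the latency.
            return response.read()

    return run_query


def run_concurrent_benchmark(
    run_query: Callable[[str], object],
    queries: List[str],
    n_clients: int = 64,
) -> Dict[str, float]:
    """Simulate n_clients concurrent clients, each running `queries` in order."""

    def timed(query: str) -> float:
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    def client(_client_id: int) -> List[float]:
        # One simulated client issues its queries sequentially.
        return [timed(q) for q in queries]

    # One worker thread per client, so all clients can run concurrently.
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        per_client = list(pool.map(client, range(n_clients)))

    latencies = [t for client_times in per_client for t in client_times]
    return {
        "requests": float(len(latencies)),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }
```

For example, `run_concurrent_benchmark(make_http_runner("http://localhost:8890/sparql"), ["SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"])` would time one count query from 64 concurrent clients; the URL is only a placeholder. Threads suit this I/O-bound workload, and the sketch does not check the accuracy of the responses, which the study also recorded.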

Recommended

Article Computer Science, Information Systems

OMDP: An ontology-based model for diagnosis and treatment of diabetes patients in remote healthcare systems

Li Chen, Dongxin Lu, Menghao Zhu, Muhammad Muzammal, Oluwarotimi Williams Samuel, Guixin Huang, Weinan Li, Hongyan Wu

INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS (2019)

Article Multidisciplinary Sciences

Split4Blank: Maintaining consistency while improving efficiency of loading RDF data with blank nodes

Atsuko Yamaguchi, Yasunori Yamamoto

PLOS ONE (2019)

Article Biochemical Research Methods

Enzyme annotation in UniProtKB using Rhea

Anne Morgat, Thierry Lombardot, Elisabeth Coudert, Kristian Axelsen, Teresa Batista Neto, Sebastien Gehant, Parit Bansal, Jerven Bolleman, Elisabeth Gasteiger, Edouard de Castro, Delphine Baratin, Monica Pozzato, Ioannis Xenarios, Sylvain Poux, Nicole Redaschi, Alan Bridge

BIOINFORMATICS (2020)

Article Biology

HAMAP as SPARQL rules-A portable annotation pipeline for genomes and proteomes

Jerven Bolleman, Edouard de Castro, Delphine Baratin, Sebastien Gehant, Beatrice A. Cuche, Andrea H. Auchincloss, Elisabeth Coudert, Chantal Hulo, Patrick Masson, Ivo Pedruzzi, Catherine Rivoire, Ioannis Xenarios, Nicole Redaschi, Alan Bridge

GIGASCIENCE (2020)

Article Computer Science, Artificial Intelligence

Cascade architecture with rhetoric long short-term memory for complex sentence sentiment analysis

Chaojie Ji, Hongyan Wu

NEUROCOMPUTING (2020)

Article Biotechnology & Applied Microbiology

Heterogeneous Graph Convolutional Networks and Matrix Completion for miRNA-Disease Association Prediction

Rongxiang Zhu, Chaojie Ji, Yingying Wang, Yunpeng Cai, Hongyan Wu

FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY (2020)

Article Biochemistry & Molecular Biology

Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB

Marc Feuermann, Emmanuel Boutet, Anne Morgat, Kristian B. Axelsen, Parit Bansal, Jerven Bolleman, Edouard de Castro, Elisabeth Coudert, Elisabeth Gasteiger, Sebastien Gehant, Damien Lieberherr, Thierry Lombardot, Teresa B. Neto, Ivo Pedruzzi, Sylvain Poux, Monica Pozzato, Nicole Redaschi, Alan Bridge

Summary: UniProtKB is a comprehensive and freely accessible resource that covers natural products from various plants and microorganisms. Users can search protein knowledge relevant to natural products through interactive or programmatic queries, and can enrich other natural product datasets and databases by mining UniProtKB data.

METABOLITES (2021)

Article Genetics & Heredity

Advances in the development of PubCaseFinder, including the new application programming interface and matching algorithm

Toyofumi Fujiwara, Jae-Moon Shin, Atsuko Yamaguchi

Summary: This paper describes notable updates regarding PubCaseFinder, the GeneYenta matching algorithm, and the PubCaseFinder API. The updated PubCaseFinder and new API empower patient repositories and medical professionals to actively use HPO-based resources.

HUMAN MUTATION (2022)

Article Computer Science, Artificial Intelligence

Perturb more, trap more: Understanding behaviors of graph neural networks

Chaojie Ji, Ruxin Wang, Hongyan Wu

Summary: This paper proposes a novel post hoc framework called TraP2, which is based on local fidelity and can generate high-fidelity explanations for any trained GNN. By incorporating translation, perturbation, and paraphrase layers, TraP2 can effectively highlight the relevant graph structure and important features inside each node, leading to highly faithful explanations.

NEUROCOMPUTING (2022)

Article Biochemical Research Methods

AdaPPI: identification of novel protein functional modules via adaptive graph convolution networks in a protein-protein interaction network

Hongwei Chen, Yunpeng Cai, Chaojie Ji, Gurudeeban Selvaraj, Dongqing Wei, Hongyan Wu

Summary: We propose an adaptive convolution graph network, AdaPPI, to predict protein functional modules in protein-protein interaction networks. By integrating protein gene ontology attributes and network topology, our framework outperforms existing methods in finding functional modules.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Computer Science, Artificial Intelligence

Graph Polish: A Novel Graph Generation Paradigm for Molecular Optimization

Chaojie Ji, Yijia Zheng, Ruxin Wang, Yunpeng Cai, Hongyan Wu

Summary: In this study, a novel molecular optimization paradigm called Graph Polish is proposed. It predicts the optimization center and optimizes the surrounding regions to achieve molecular optimization. An effective learning framework called Teacher and Student Polish captures the dependencies in the optimization steps. Experimental results show that the proposed approach outperforms state-of-the-art methods in multiple optimization tasks and has good explainability and time savings.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2023)

Article Computer Science, Artificial Intelligence

Predicting unseen antibodies' neutralizability via adaptive graph neural networks

Jie Zhang, Yishan Du, Pengfei Zhou, Jinru Ding, Shuai Xia, Qian Wang, Feiyang Chen, Mu Zhou, Xuemei Zhang, Weifeng Wang, Hongyan Wu, Lu Lu, Shaoting Zhang

Summary: The study proposes a graph-based method, DeepAAI, for predicting neutralization activity of antibodies and applies it to recommend probable antibodies for human immunodeficiency virus, severe acute respiratory syndrome coronavirus 2, influenza, and dengue. DeepAAI learns dynamic representations and relation graphs to optimize downstream tasks such as neutralization prediction and concentration estimation. The method demonstrates good performance and rich interpretability, suggesting potential broad-spectrum antibodies against new viral variants.

NATURE MACHINE INTELLIGENCE (2022)

Article Mathematical & Computational Biology

SwissBioPics - an interactive library of cell images for the visualization of subcellular location data

Philippe Le Mercier, Jerven Bolleman, Edouard de Castro, Elisabeth Gasteiger, Parit Bansal, Andrea H. Auchincloss, Emmanuel Boutet, Lionel Breuza, Cristina Casals-Casas, Anne Estreicher, Marc Feuermann, Damien Lieberherr, Catherine Rivoire, Ivo Pedruzzi, Nicole Redaschi, Alan Bridge

Summary: SwissBioPics is a freely accessible resource that provides interactive, high-resolution cell images for visualizing subcellular location data. The images cover various cell types from different kingdoms of life and are tagged with unique identifiers from the controlled vocabulary of UniProt. Users can search and explore the cell images through the website and embed them in their own websites using the provided web component. SwissBioPics is also used by UniProt to visualize the subcellular locations and organelles where proteins function.

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2022)

Proceedings Paper Computer Science, Artificial Intelligence

Investigating Schema Definitions Using RDFS and OWL 2 for RDF Databases in Life Sciences

Atsuko Yamaguchi, Tatsuya Kushida, Yasunori Yamamoto, Kouji Kozaki

SEMANTIC TECHNOLOGY, JIST 2019 (2020)

Proceedings Paper Computer Science, Artificial Intelligence

Fully Convolutional Network based on Contrast Information Integration for Dermoscopic Image Segmentation

Shuyuan Chen, Chaojie Ji, Ruxin Wang, Hongyan Wu

2020 5TH INTERNATIONAL CONFERENCE ON MATHEMATICS AND ARTIFICIAL INTELLIGENCE (ICMAI 2020) (2020)
