Exposing the cancer genome atlas as a SPARQL endpoint

JOURNAL OF BIOMEDICAL INFORMATICS (2010)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 43, Issue 6, Pages 998-1008

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2010.09.004

Keywords

TCGA; SPARQL; RDF; Linked Data; Data integration

Funding

Center for Clinical and Translational Sciences [SFRH/BD/45963/2008, IUL1RR024148]
National Heart, Lung and Blood Institute
National Cancer Institute of the US National Institutes of Health [N01-HV-28181, P50 CA70907]
Fundação para a Ciência e a Tecnologia [SFRH/BD/45963/2008]

Abstract

The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating their results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source. © 2010 Elsevier Inc. All rights reserved.
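
The idea of assembling a SPARQL query by navigating domain descriptors can be illustrated with a minimal sketch. This is not the authors' interface or the S3DB API: the `Descriptor` class, the `assemble_query` function, and the `http://example.org/tcga/` IRIs are all hypothetical names introduced for illustration. It only shows how a set of selected descriptors (relationships between kinds of entity) might be turned into triple patterns, with the data elements left as variables for the endpoint to fill in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """A hypothetical domain descriptor: a relationship between two kinds of entity."""
    subject: str    # kind of entity the relationship starts from, e.g. "Patient"
    predicate: str  # full IRI of the relationship (placeholder, not a real identifier)
    obj: str        # kind of entity or value it points to, e.g. "Gender"


def assemble_query(selected, limit=10):
    """Build a SPARQL SELECT query with one triple pattern per selected descriptor."""
    variables = []
    patterns = []
    for d in selected:
        s_var = "?" + d.subject.lower()
        o_var = "?" + d.obj.lower()
        # Keep each variable once, in first-seen order, for the SELECT clause.
        for v in (s_var, o_var):
            if v not in variables:
                variables.append(v)
        patterns.append(f"  {s_var} <{d.predicate}> {o_var} .")
    lines = ["SELECT " + " ".join(variables), "WHERE {", *patterns, "}", f"LIMIT {limit}"]
    return "\n".join(lines)


# Placeholder IRIs: assumptions for this sketch, not TCGA or S3DB identifiers.
BASE = "http://example.org/tcga/"
selected = [
    Descriptor("Patient", BASE + "hasGender", "Gender"),
    Descriptor("Patient", BASE + "hasSample", "Sample"),
]
query = assemble_query(selected)
print(query)
```

Because both descriptors share the `Patient` subject, the two triple patterns reuse the same `?patient` variable, which is what joins them into a single query. The resulting string could then be sent to any SPARQL endpoint over HTTP.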
