Journal
JOURNAL OF BIOMEDICAL INFORMATICS
Volume 45, Issue 2, Pages 363-371
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jbi.2011.11.017
Keywords
Ontology terms; Similarity measure; Disease similarity; Semantic similarity; Gene Ontology; Ontology perturbation; Ontology based disease similarity
Genomics has contributed to a growing collection of gene-function and gene-disease annotations that can be exploited by informatics to study similarity between diseases. This can yield insight into disease etiology, reveal common pathophysiology and/or suggest treatment that can be appropriated from one disease to another. Estimating disease similarity solely on the basis of shared genes can be misleading as variable combinations of genes may be associated with similar diseases, especially for complex diseases. This deficiency can be potentially overcome by looking for common biological processes rather than only explicit gene matches between diseases. The use of semantic similarity between biological processes to estimate disease similarity could enhance the identification and characterization of disease similarity. We present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from the ontology, based on both co-occurrence and information content. The similarity measure is shown to outperform other measures used to detect similarity. A manually curated dataset with known disease similarities was used as a benchmark to compare the estimation of disease similarity based on gene-based and Gene Ontology (GO) process-based comparisons. The detection of disease similarity based on semantic similarity between GO Processes (Recall = 55%, Precision = 60%) performed better than using exact matches between GO Processes (Recall = 29%, Precision = 58%) or gene overlap (Recall = 88%, Precision = 16%). The GO-Process based disease similarity scores on an external test set show statistically significant Pearson correlation (0.73) with numeric scores provided by medical residents. GO-Processes associated with similar diseases were found to be significantly regulated in gene expression microarray datasets of related diseases. © 2011 Elsevier Inc. All rights reserved.
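The abstract does not give the paper's similarity functions, so the following is only a minimal sketch of the general idea of information-content-based semantic similarity. It uses a Resnik-style term measure and a best-match average between annotated entities, both swapped in here for illustration. The toy ontology, the disease names, and every function name are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "process": [],
    "metabolism": ["process"],
    "signaling": ["process"],
    "lipid_metabolism": ["metabolism"],
    "glucose_metabolism": ["metabolism"],
}

# Hypothetical annotations: each entity (disease) maps to a set of terms.
ANNOTATIONS = {
    "disease_A": {"lipid_metabolism"},
    "disease_B": {"glucose_metabolism"},
    "disease_C": {"signaling"},
    "disease_D": {"lipid_metabolism", "signaling"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts the entities annotated
    with t or with any descendant of t, and N is the number of entities."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


IC = information_content()


def term_similarity(a, b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def entity_similarity(x, y):
    """Best-match average over the two entities' annotation sets."""
    tx, ty = ANNOTATIONS[x], ANNOTATIONS[y]
    best_x = [max(term_similarity(a, b) for b in ty) for a in tx]
    best_y = [max(term_similarity(a, b) for b in tx) for a in ty]
    return (sum(best_x) + sum(best_y)) / (len(best_x) + len(best_y))
```

In this toy setup `disease_A` and `disease_B` share no annotation term, so an exact-match comparison would score them as unrelated, while `entity_similarity("disease_A", "disease_B")` is positive because both terms fall under `metabolism`. That is the kind of difference between exact and semantic matching that the abstract reports on.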