Journal
BRIEFINGS IN BIOINFORMATICS
Volume 15, Issue 6, Pages 906-918
Publisher
OXFORD UNIV PRESS
DOI: 10.1093/bib/bbt051
Keywords
correlation; dependence; network; data analysis
Funding
- FAPESP [11/07762-8, 11/50761-2, 13/03447-6, CNPq 306319/2010-1, 12/25417-9]
- Pew Latin America fellowship
- JSPS KAKENHI [25830111]
- Grants-in-Aid for Scientific Research [25830111] Funding Source: KAKEN
- Fundacao de Amparo a Pesquisa do Estado de Sao Paulo (FAPESP) [11/07762-8, 13/03447-6, 12/25417-9, 11/50761-2] Funding Source: FAPESP
One major task in molecular biology is to understand the dependency among genes in order to model gene regulatory networks. Pearson's correlation is the most common method used to measure dependence between gene expression signals, but it works well only when the data are linearly associated. For other types of association, such as non-linear or non-functional relationships, methods based on rank correlation and information-theoretic measures are more adequate than Pearson's correlation, but they are less used in applications, most probably because of a lack of clear guidelines for their use. This work summarizes the main methods (Pearson's, Spearman's and Kendall's correlations; distance correlation; Hoeffding's D measure; the Heller-Heller-Gorfine measure; mutual information and the maximal information coefficient) used to identify dependency between random variables, especially gene expression data, and evaluates the strengths and limitations of each. Systematic Monte Carlo simulations covering varying sample sizes, local dependence, and linear, non-linear and non-functional relationships are presented. Moreover, comparisons on actual gene expression data are carried out. Finally, we provide a suggested list of methods that can be used for each type of data set.
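The abstract's central claim is that Pearson's correlation captures only linear association, while other measures can detect other dependence structures. A small simulation in the spirit of the Monte Carlo analyses it mentions can illustrate this. The sketch below is not the authors' code. It is a minimal pure-Python illustration under stated assumptions: x ~ Uniform(-1, 1), n = 200, and three hypothetical relationships (linear with Gaussian noise, a noiseless monotone exponential, and a noisy parabola). It implements Pearson's r, Spearman's rho (Pearson's r on average ranks), Kendall's tau-a (which assumes no ties) and the V-statistic form of the sample distance correlation built from double-centred distance matrices. Hoeffding's D, HHG, mutual information and MIC are omitted for brevity.

```python
import math
import random


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """Ranks 1..n; tied values receive the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (assumes no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


def _double_centered(v):
    """Pairwise |v_i - v_j| matrix with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form); 0 means no detected dependence."""
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)
    cells = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in cells) / n**2
    dvarx = sum(a[i][j] ** 2 for i, j in cells) / n**2
    dvary = sum(b[i][j] ** 2 for i, j in cells) / n**2
    if dvarx * dvary == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvarx * dvary))


def simulate(n=200, seed=0):
    """Hypothetical scenarios: linear, monotone non-linear, non-monotone."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    scenarios = {
        "linear": [xi + rng.gauss(0, 0.2) for xi in x],
        "monotone": [math.exp(5 * xi) for xi in x],
        "parabola": [xi**2 + rng.gauss(0, 0.05) for xi in x],
    }
    return {
        name: {
            "pearson": pearson(x, y),
            "spearman": spearman(x, y),
            "kendall": kendall_tau_a(x, y),
            "dcor": distance_correlation(x, y),
        }
        for name, y in scenarios.items()
    }


if __name__ == "__main__":
    for name, res in simulate().items():
        print(name, {k: round(v, 3) for k, v in res.items()})
```

In the monotone scenario, Spearman's and Kendall's coefficients equal 1 while Pearson's falls well below 1. In the parabola scenario, all three correlation coefficients are close to 0 while the distance correlation remains clearly positive. The O(n²) loops are chosen for readability and are adequate only for a few hundred samples.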