4.7 Article

A Trade-Off between Sample Complexity and Computational Complexity in Learning Boolean Networks from Time-Series Data

出版社

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2008.38

关键词

Machine learning; parameter learning; time-series analysis; feature extraction or construction; clustering; classification; association rules

资金

  1. Natural Sciences and Engineering Research Council of Canada
  2. US National Science Foundation

向作者/读者索取更多资源

A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据