Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

PLOS ONE (2012)

Journal

PLOS ONE

Volume 7, Issue 10, Pages -

Publisher

Public Library of Science

DOI: 10.1371/journal.pone.0046771

Categories

Multidisciplinary Sciences

Funding

National Library of Medicine at the United States National Institutes of Health [1K99LM010822-01]

Abstract

Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets.

Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects.

Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets.
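To make the abstract's two central ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes a K2-style Bayesian score (a Dirichlet-multinomial marginal likelihood with a uniform prior), which is one common way to compute "the probability of the data given the pattern"; the paper's exact prior and its bound are not given in the abstract and are not reproduced here. The function name `log_bayesian_score`, the toy dataset, and the figure of 500,000 SNPs are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import comb, lgamma


def log_bayesian_score(snp_columns, disease, alpha=1.0):
    """Log marginal likelihood of disease status given a set of parent SNPs.

    Assumption: uniform Dirichlet prior with pseudo-count `alpha` per cell
    (K2-style). An empty `snp_columns` list scores the null pattern.
    """
    n = len(disease)
    # Each distinct joint genotype of the parent SNPs is one parent configuration.
    configs = [tuple(col[i] for col in snp_columns) for i in range(n)]
    counts = Counter(zip(configs, disease))
    states = sorted(set(disease))
    prior_total = alpha * len(states)

    score = 0.0
    for config in set(configs):
        cell = [counts[(config, s)] for s in states]
        score += lgamma(prior_total) - lgamma(prior_total + sum(cell))
        score += sum(lgamma(alpha + c) - lgamma(alpha) for c in cell)
    return score


# Toy strict epistasis (hypothetical data): disease is the XOR of two binary
# genotypes, so neither SNP alone changes the disease frequency.
snp_a, snp_b, disease = [], [], []
for a in (0, 1):
    for b in (0, 1):
        for _ in range(25):
            snp_a.append(a)
            snp_b.append(b)
            disease.append(a ^ b)

score_null = log_bayesian_score([], disease)
score_a = log_bayesian_score([snp_a], disease)
score_b = log_bayesian_score([snp_b], disease)
score_pair = log_bayesian_score([snp_a, snp_b], disease)

# Size of an exhaustive search over SNP pairs and triples for a hypothetical
# panel of 500,000 SNPs.
n_snps = 500_000
n_pairs = comb(n_snps, 2)
n_triples = comb(n_snps, 3)

print("null pattern:", score_null)
print("1-SNP patterns:", score_a, score_b)
print("2-SNP pattern:", score_pair)
print("pairs to score:", n_pairs, "triples to score:", n_triples)
```

In this toy case each 1-SNP pattern scores no better than the null pattern, while the 2-SNP pattern scores far higher. That is why a search guided only by single-SNP scores would miss the interaction, and why an upper bound on the score of any expansion of a pattern, as the abstract describes, could help decide which 1-SNP patterns are worth expanding.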

Overlap in observational studies with high-dimensional covariates

Alexander D'Amour, Peng Ding, Avi Feller, Lihua Lei, Jasjeet Sekhon

Summary: This paper discusses the key assumptions for estimating causal effects under exogeneity, including unconfoundedness and overlap. Researchers often argue that unconfoundedness is more plausible when more covariates are included in the analysis, while less discussed is the difficulty of satisfying covariate overlap. By exploiting results from information theory, the authors derive explicit bounds on the average imbalance in covariate means under strict overlap, showing that these bounds become more restrictive as the dimension grows large.

JOURNAL OF ECONOMETRICS (2021)