Article

Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies

Journal

EUROPEAN JOURNAL OF HUMAN GENETICS
Volume 22, Issue 2, Pages 254-259

Publisher

SPRINGERNATURE
DOI: 10.1038/ejhg.2013.109

Keywords

genome-wide association; statistical learning; random forest; genetic algorithms; epistasis; interactions

Funding

  1. NIH [HL091028, HL071782, DA012854, DA027995]

Genome-wide association studies (GWAS) have brought methodological challenges in handling massive, high-dimensional data, as well as real opportunities to study the joint effect of many risk factors acting in concert as an organic group. The random forest (RF) methodology is widely recognized for its potential to examine interaction effects in large data sets. However, RF is not designed to handle GWAS data directly, because such data typically have hundreds of thousands of single-nucleotide polymorphisms (SNPs) as predictor variables. We propose and evaluate a novel extension of RF, called random forest fishing (RFF), for GWAS analysis. Using a novel search algorithm based on genetic programming and simulated annealing, RFF repeatedly updates a relatively small set of predictors obtained by RF tests to find globally important groups that are predictive of the disease phenotype. A key improvement of RFF comes from guidance that incorporates empirical test results for genome-wide pairwise interactions. Evaluated on simulated and real GWAS data sets, RFF is shown to be effective in identifying important predictors, particularly when both marginal effects and interactions exist, and it is applicable to very large GWAS data sets.
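
To make the simulated-annealing component of the search more concrete, here is a minimal sketch of annealing over fixed-size predictor subsets. It is not the authors' RFF algorithm. It omits the RF tests, the genetic-programming operators and the pairwise-interaction guidance. It only shows the generic idea: propose a neighbouring subset, always accept improvements, and accept worse moves with a probability that shrinks as the temperature cools. The names `anneal_subset`, `toy_score` and `CAUSAL`, the cooling schedule and the toy scoring rule are all hypothetical. In a real analysis, `score` might be, for example, the out-of-bag accuracy of a random forest fitted on the chosen SNPs. That choice is an assumption, not something the abstract specifies.

```python
import math
import random


def anneal_subset(n_features, subset_size, score, n_iter=2000,
                  t0=1.0, cooling=0.995, seed=0):
    """Hypothetical simulated-annealing search for a feature subset.

    `score` takes a frozenset of feature indices and returns a float
    (higher is better). Requires subset_size < n_features.
    """
    rng = random.Random(seed)
    current = set(rng.sample(range(n_features), subset_size))
    current_score = score(frozenset(current))
    best, best_score = set(current), current_score
    temp = t0
    for _ in range(n_iter):
        # Neighbour move: swap one selected feature for an unselected one.
        out_feat = rng.choice(sorted(current))
        in_feat = rng.randrange(n_features)
        while in_feat in current:
            in_feat = rng.randrange(n_features)
        candidate = (current - {out_feat}) | {in_feat}
        cand_score = score(frozenset(candidate))
        delta = cand_score - current_score
        # Accept improvements always; accept worse moves with prob exp(delta/T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        temp *= cooling  # geometric cooling schedule (assumed)
    return sorted(best), best_score


# Toy example (hypothetical): features 3, 7 and 11 are "causal",
# and 3 and 7 carry an extra interaction bonus when selected together.
CAUSAL = {3, 7, 11}


def toy_score(subset):
    s = len(subset & CAUSAL)
    if {3, 7} <= subset:
        s += 2  # hypothetical interaction effect
    return s


best, best_score = anneal_subset(n_features=50, subset_size=3, score=toy_score)
print(best, best_score)
```

Accepting occasional downhill moves early on lets the search escape subsets that look good only through marginal effects. This matters when the strongest signal appears only for a combination of predictors, as with the interaction bonus in the toy score.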
