Review

Gene-gene interaction: the curse of dimensionality

Journal

ANNALS OF TRANSLATIONAL MEDICINE
Volume 7, Issue 24, Pages: -

Publisher

AME PUBLISHING COMPANY
DOI: 10.21037/atm.2019.12.87

Keywords

Gene-gene interaction; parallel computing; PySpark; deep-learning (DL); machine-learning (ML); multifactor dimensionality reduction (MDR)

Funding

  1. Ministry of Science and Technology, Taiwan [MOST-106-2314-B-002-134-MY2, MOST-104-2314-B-002-107-MY2]

Abstract

Genetic variants identified in genome-wide association studies (GWAS) frequently show only modest effects on disease risk, leading to the missing-heritability problem. One avenue for accounting for part of this missing heritability is to evaluate gene-gene interactions (epistasis), thereby elucidating their effect on complex diseases; this can potentially help identify gene functions, pathways, and drug targets. However, exhaustively evaluating all possible genetic interactions among millions of single-nucleotide polymorphisms (SNPs) raises several issues, collectively known as the curse of dimensionality. The number of candidate interactions grows combinatorially with the number of SNPs, which diminishes the usefulness of traditional parametric statistical methods. The immense popularity of multifactor dimensionality reduction (MDR), a non-parametric method proposed in 2001 that collapses multi-dimensional genotypes into a one-dimensional binary attribute, led to the emergence of a fast-growing collection of MDR-based methods. Moreover, machine-learning (ML) methods such as random forests and neural networks (NNs), deep-learning (DL) approaches, and hybrid approaches have also been applied extensively in recent years to tackle the dimensionality issue in whole-genome gene-gene interaction studies. However, exhaustive searching in MDR-based approaches and variable selection in ML methods still risk missing relevant SNPs, and interpretability issues are a major hindrance for DL methods. To minimize this loss of information, Python-based tools such as PySpark can take advantage of distributed computing resources in the cloud to bring back smaller subsets of the data for further local analysis. Parallel computing is a powerful resource for fighting this curse: PySpark supports standard Python libraries and C extensions, making it convenient to write code that delivers dramatic improvements in processing speed on extraordinarily large datasets.
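
To make the combinatorial growth concrete (an illustrative back-of-the-envelope calculation, not a figure from the paper): the number of k-way interactions among m SNPs is a binomial coefficient, so even pairwise testing across a million SNPs is already enormous:

```latex
\binom{m}{k} = \frac{m!}{k!\,(m-k)!}, \qquad
\binom{10^{6}}{2} \approx 5 \times 10^{11}\ \text{pairs}, \qquad
\binom{10^{6}}{3} \approx 1.7 \times 10^{17}\ \text{triples}
```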
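
The core MDR step, collapsing a multi-locus genotype table into a single binary high-risk/low-risk attribute, can be sketched in a few lines of Python. This is a minimal illustration only: genotypes are assumed coded 0/1/2, the phenotype is binary case/control, and the function name and threshold are hypothetical rather than taken from any published MDR implementation.

```python
import numpy as np

def mdr_binary_attribute(genotypes, status, snp_pair, threshold=1.0):
    """Collapse one SNP pair into MDR's one-dimensional binary attribute:
    a multilocus genotype cell is labelled high-risk (1) when its
    case:control ratio exceeds `threshold`, and low-risk (0) otherwise."""
    g = genotypes[:, snp_pair]                 # n_samples x 2 genotype slice
    labels = np.zeros(len(status), dtype=int)
    for cell in {tuple(row) for row in g}:     # each distinct genotype cell
        in_cell = np.all(g == cell, axis=1)
        cases = int(np.sum(status[in_cell] == 1))
        controls = int(np.sum(status[in_cell] == 0))
        if controls == 0 or cases / controls > threshold:
            labels[in_cell] = 1                # high-risk cell
    return labels

# e.g., attribute = mdr_binary_attribute(G, y, [3, 17]) for SNPs 3 and 17
```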
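
Finally, a rough sketch of how PySpark's RDD API could distribute such a pairwise scan across a cluster and bring back only the top-scoring pairs for local follow-up. Everything here is illustrative: the toy random data, the balanced-accuracy score, and the pipeline shape are assumptions, not a pipeline prescribed by the paper.

```python
from itertools import combinations
import numpy as np
from pyspark import SparkContext

# Toy stand-in for a real genotype matrix: 200 samples x 50 SNPs, coded 0/1/2
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 50))
status = rng.integers(0, 2, size=200)       # binary case/control phenotype

sc = SparkContext(appName="pairwise-epistasis-sketch")
geno_bc = sc.broadcast(genotypes)           # ship the data to each worker once
status_bc = sc.broadcast(status)

def balanced_accuracy(pair):
    """Score one SNP pair by the balanced accuracy of an MDR-style
    high-risk/low-risk labelling (an illustrative statistic)."""
    g, y = geno_bc.value[:, list(pair)], status_bc.value
    labels = np.zeros(len(y), dtype=int)
    for cell in {tuple(row) for row in g}:
        m = np.all(g == cell, axis=1)
        labels[m] = int(2 * y[m].sum() > m.sum())  # more cases than controls
    sensitivity = np.mean(labels[y == 1] == 1)
    specificity = np.mean(labels[y == 0] == 0)
    return (sensitivity + specificity) / 2

pairs = sc.parallelize(list(combinations(range(genotypes.shape[1]), 2)), 64)
best = pairs.map(lambda p: (balanced_accuracy(p), p)).top(10)
print(best)                                 # ten highest-scoring SNP pairs
```

On a real cluster the genotype matrix would come from distributed storage rather than a broadcast variable, but the pattern, scoring all pairs in parallel and returning a small ranked subset for local analysis, is the one the abstract describes.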
