Article

SAIGEgds: an efficient statistical tool for large-scale PheWAS with mixed models


PheWAS are powerful tools for discovering and replicating genetic associations, and methods such as SAIGE reduce their computational burden in large cohorts. However, analyzing thousands of phenotypes against whole-genome imputed data remains computationally intensive. The new SAIGEgds R package offers a faster alternative, especially when used with high-performance computing clusters.
Phenome-wide association studies (PheWAS) are a powerful tool for the discovery and replication of genetic associations. To reduce the computational burden of PheWAS in large cohorts such as the UK Biobank, the SAIGE method was proposed to control for case-control imbalance and sample relatedness in a tractable manner. However, SAIGE remains computationally intensive when used to analyze the associations of thousands of ICD10-coded phenotypes with whole-genome imputed genotype data. Here, we present a new high-performance statistical R package, SAIGEgds, for large-scale PheWAS using generalized linear mixed models. The package implements the SAIGE method in optimized C++ code, taking advantage of sparse genotype dosages and integrating the efficient Genomic Data Structure (GDS) file format. Benchmarks using the UK Biobank White British genotype data (N ≈ 430,000) with coronary heart disease and simulated cases show that the implementation in SAIGEgds is 5-6 times faster than the SAIGE R package. When used in conjunction with high-performance computing clusters, SAIGEgds provides an efficient analysis pipeline for biobank-scale PheWAS.
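To make the remark about sparse genotype dosages concrete, here is a minimal Python sketch of the general idea. It is not the SAIGEgds C++ implementation. It assumes a score-type statistic of the form sum_i g_i (y_i - mu_i), where g_i is the dosage for sample i and mu_i is a fitted null-model probability. The function names and the toy data are hypothetical, chosen only for illustration. For a rare variant most dosages are zero, so storing only the non-zero entries lets the sum skip most samples.

```python
# Illustrative sketch only; names and data are hypothetical, not SAIGEgds code.

def to_sparse(dosages):
    """Keep only the non-zero dosages as (sample_index, dosage) pairs."""
    return [(i, d) for i, d in enumerate(dosages) if d != 0]


def score_dense(dosages, residuals):
    """Score-type statistic computed over every sample."""
    return sum(d * r for d, r in zip(dosages, residuals))


def score_sparse(sparse_dosages, residuals):
    """Same statistic, touching only samples with a non-zero dosage."""
    return sum(d * residuals[i] for i, d in sparse_dosages)


# Toy data: 10 samples, 3 of which carry the alternate allele.
dosages = [0, 0, 1, 0, 0, 2, 0, 0, 0, 1]
# Residuals y_i - mu_i, assuming mu_i = 0.1 for all samples and two cases.
residuals = [-0.1, -0.1, 0.9, -0.1, -0.1, 0.9, -0.1, -0.1, -0.1, -0.1]

sparse = to_sparse(dosages)
print(len(sparse), "non-zero dosages out of", len(dosages))
print(score_sparse(sparse, residuals), score_dense(dosages, residuals))
```

Both functions return the same value, but the sparse version does work proportional to the number of carriers rather than to the cohort size, which matters when N is in the hundreds of thousands and most variants are rare.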
