Journal
BIOINFORMATICS
Volume 37, Issue 5, Pages 728-730
Publisher
OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa731
Funding
- AbbVie
Phenome-wide association studies (PheWASs) are powerful tools for discovering and replicating genetic associations. Methods such as SAIGE make PheWAS in large cohorts more tractable, but analyzing thousands of phenotypes with whole-genome data remains computationally intensive. The new SAIGEgds package offers a faster implementation, especially when used with high-performance computing clusters.
Phenome-wide association studies (PheWASs) are known to be a powerful tool for the discovery and replication of genetic associations. To reduce the computational burden of PheWAS in large cohorts such as the UK Biobank, the SAIGE method has been proposed to control for case-control imbalance and sample relatedness in a tractable manner. However, SAIGE is still computationally intensive when used to analyze the associations of thousands of ICD10-coded phenotypes with whole-genome imputed genotype data. Here, we present a new high-performance statistical R package (SAIGEgds) for large-scale PheWAS using generalized linear mixed models. The package implements the SAIGE method in optimized C++ code, taking advantage of sparse genotype dosages and integrating the efficient Genomic Data Structure (GDS) file format. Benchmarks using the UK Biobank White British genotype data (N ≈ 430 K) with coronary heart disease and simulated cases show that SAIGEgds is 5-6 times faster than the SAIGE R package. When used in conjunction with high-performance computing clusters, SAIGEgds provides an efficient analysis pipeline for biobank-scale PheWAS.
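As a rough illustration of why sparse genotype dosages can save work: in a score test of the kind SAIGE uses, each variant's statistic includes a sum of dosage times null-model residual over all samples, and for a rare variant most dosages are zero, so only the nonzero entries need to be visited. The minimal Python sketch below assumes a null model has already been fitted and supplies residuals r = y - mu; it computes only this score numerator (no variance or saddlepoint adjustment), and the function names are hypothetical, not part of the SAIGEgds API or its C++ code.

```python
import numpy as np


def to_sparse(dosage):
    """Hypothetical helper: keep only samples with a nonzero dosage."""
    idx = np.flatnonzero(dosage)
    return idx, dosage[idx]


def score_numerator_dense(dosage, residual):
    """U = sum_i g_i * r_i, visiting every sample."""
    return float(np.dot(dosage, residual))


def score_numerator_sparse(nz_index, nz_dosage, residual):
    """Same U, visiting only samples whose dosage is nonzero."""
    return float(np.dot(nz_dosage, residual[nz_index]))


if __name__ == "__main__":
    # Toy data: binary phenotype y and assumed fitted null-model probabilities mu.
    dosage = np.array([0.0, 0.0, 1.0, 0.0, 2.0, 0.0])
    y = np.array([0, 1, 1, 0, 1, 0], dtype=float)
    mu = np.full(6, 0.5)
    residual = y - mu

    idx, nz = to_sparse(dosage)
    print(score_numerator_dense(dosage, residual))
    print(score_numerator_sparse(idx, nz, residual))
```

Both functions return the same value; the sparse version's cost scales with the number of carriers rather than with the full sample size, which matters when N is in the hundreds of thousands and most variants are rare.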