Article

BGData - A Suite of R Packages for Genomic Analysis with Big Data

Journal

G3-GENES GENOMES GENETICS
Volume 9, Issue 5, Pages 1377-1383

Publisher

GENETICS SOCIETY AMERICA
DOI: 10.1534/g3.119.400018

Keywords

big data; parallel computing; distributed computing; genetic analyses; biobank

Funding

  1. NIH [R01GM101219]
  2. Michigan State University

We created a suite of packages to enable analysis of extremely large genomic data sets (potentially millions of individuals and millions of molecular markers) within the R environment. The suite offers: a matrix-like interface for .bed files (PLINK's binary format for genotype data); a novel class of linked arrays that links data stored in multiple files into a single array accessible from the R computing environment; parallel computing methods that carry out computations on very large data sets without loading the entire data set into memory; and a basic set of methods for statistical genetic analyses. The packages are available through CRAN and GitHub. In this note, we describe the classes and methods implemented in each of the packages that make up the suite and illustrate their use with data from the UK Biobank.
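
The idea of a matrix-like interface over a .bed file can be illustrated with a small sketch. The Python code below is not taken from BGData or its companion packages and does not reproduce their API; the `BedReader` class and the `pack_variant` and `column_means` helpers are hypothetical names introduced only for illustration. It assumes the variant-major PLINK 1 .bed layout (three magic bytes, then two bits per genotype, four samples per byte) and, as a further assumption, reports genotypes as counts of the first allele. The file is memory-mapped, so only the bytes that are indexed get read.

```python
import mmap
import os
import tempfile

# A variant-major PLINK 1 .bed file starts with these three bytes.
BED_MAGIC = b"\x6c\x1b\x01"
# Two-bit genotype codes -> count of the first allele; 0b01 marks a missing call.
CODE_TO_DOSAGE = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
DOSAGE_TO_CODE = {dosage: code for code, dosage in CODE_TO_DOSAGE.items()}


class BedReader:
    """Hypothetical read-only, matrix-like view (samples x variants) of a .bed file."""

    def __init__(self, path, n_samples, n_variants):
        self.shape = (n_samples, n_variants)
        self._stride = (n_samples + 3) // 4  # bytes per variant
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        if self._map[:3] != BED_MAGIC:
            raise ValueError("not a variant-major PLINK .bed file")

    def __getitem__(self, index):
        i, j = index  # sample row, variant column
        byte = self._map[3 + j * self._stride + i // 4]
        return CODE_TO_DOSAGE[(byte >> (2 * (i % 4))) & 0b11]

    def close(self):
        self._map.close()
        self._file.close()


def pack_variant(dosages):
    """Encode one variant's genotypes into .bed bytes (used to build toy data)."""
    out = bytearray((len(dosages) + 3) // 4)
    for i, dosage in enumerate(dosages):
        out[i // 4] |= DOSAGE_TO_CODE[dosage] << (2 * (i % 4))
    return bytes(out)


def column_means(bed):
    """Mean dosage per variant, one variant at a time, skipping missing calls."""
    n_samples, n_variants = bed.shape
    means = []
    for j in range(n_variants):
        observed = [g for g in (bed[i, j] for i in range(n_samples)) if g is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


# Toy data: 5 samples, 3 variants (None is a missing genotype).
toy_variants = [[2, 1, 0, None, 1], [0, 0, 0, 0, 0], [2, 2, 1, 1, None]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.bed")
    with open(path, "wb") as handle:
        handle.write(BED_MAGIC)
        for variant in toy_variants:
            handle.write(pack_variant(variant))

    bed = BedReader(path, n_samples=5, n_variants=3)
    means = column_means(bed)
    missing_call = bed[3, 0]
    bed.close()

print(means)  # [1.0, 0.0, 1.5]
```

A real .bed file comes with .bim and .fam files that supply the variant and sample counts; here they are passed in by hand to keep the sketch short.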
