4.5 Article

EMINIM: An Adaptive and Memory-Efficient Algorithm for Genotype Imputation

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY
Volume 17, Issue 3, Pages 547-560

Publisher

MARY ANN LIEBERT, INC
DOI: 10.1089/cmb.2009.0199

Keywords

genetic variation; genetics; genomics; statistics

Funding

  1. National Science Foundation [0513612, 0731455, 0729049]
  2. National Institutes of Health [1K25HL080079]
  3. Microsoft Research Fellowship
  4. Samsung Scholarship
  5. National Toxicology Program/National Institute of Environmental Health Sciences [N01-ES-45530]
  6. Directorate for Computer & Information Science & Engineering
  7. Division of Computing and Communication Foundations [0729049] (funding source: National Science Foundation)
  8. Division of Information & Intelligent Systems
  9. Directorate for Computer & Information Science & Engineering [0513612, 0731455] (funding source: National Science Foundation)


Genome-wide association studies have proven to be a highly successful method for identifying genetic loci for complex phenotypes in both humans and model organisms. These large-scale studies rely on the collection of hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome. Standard high-throughput genotyping technologies capture only a fraction of the total genetic variation. Recent efforts have shown that the genotypes of SNPs not collected in a study can be imputed with high accuracy, provided that they are present in a reference data set that contains both the SNPs collected in the study and other SNPs. Here we introduce a novel HMM-based technique for the imputation problem that addresses several shortcomings of existing methods. First, our method is adaptive: it estimates population genetic parameters from the data, so it can be applied to model organisms that have very different evolutionary histories. Compared to previous methods, our method is up to ten times more accurate on model organisms such as mouse. Second, the memory usage of our algorithm scales with the number of collected markers rather than the number of known SNPs. This matters because of the size of the reference data sets currently being generated. We compare our method to existing methods on mouse and human data sets and show that it has comparable or better performance and much lower memory usage. The method is available for download at http://genetics.cs.ucla.edu/eminim.
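
For readers unfamiliar with HMM-based imputation, the general idea can be sketched as below. This is not the EMINIM algorithm: it is a generic haplotype-copying HMM run with forward-backward, shown only to illustrate how untyped SNPs can be filled in from a reference panel. The function name `impute_haplotype` and the parameters `rho` (switch probability) and `eps` (mismatch probability) are hypothetical and fixed by hand, whereas the abstract describes estimating such parameters from the data. The sketch also stores values for every SNP in the panel, so it does not have the memory behavior described in the abstract.

```python
def impute_haplotype(ref, observed, rho=0.05, eps=0.01):
    """Illustrative haplotype-copying HMM (assumed model, not EMINIM).

    ref:      list of K reference haplotypes, each a list of M alleles (0 or 1).
    observed: dict {snp_index: allele} for the SNPs typed in the target.
    Returns a list of M probabilities that the target carries allele 1.
    """
    K, M = len(ref), len(ref[0])

    def emission(k, m):
        # Untyped SNPs carry no information, so every state gets weight 1.
        if m not in observed:
            return 1.0
        return 1.0 - eps if ref[k][m] == observed[m] else eps

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward pass. Transition: stay with prob (1 - rho), otherwise jump
    # uniformly to any of the K reference haplotypes.
    fwd = [normalize([emission(k, 0) / K for k in range(K)])]
    for m in range(1, M):
        prev = fwd[-1]  # sums to 1, so the jump term is rho / K
        fwd.append(normalize([
            ((1 - rho) * prev[k] + rho / K) * emission(k, m)
            for k in range(K)
        ]))

    # Backward pass, rescaled at each SNP to avoid underflow.
    bwd = [[1.0] * K for _ in range(M)]
    for m in range(M - 2, -1, -1):
        w = [bwd[m + 1][k] * emission(k, m + 1) for k in range(K)]
        total = sum(w)
        bwd[m] = normalize([
            (1 - rho) * w[k] + rho * total / K for k in range(K)
        ])

    # Posterior over copied haplotypes, then the probability of allele 1.
    probs = []
    for m in range(M):
        post = normalize([fwd[m][k] * bwd[m][k] for k in range(K)])
        probs.append(sum(post[k] for k in range(K) if ref[k][m] == 1))
    return probs


# Toy usage: two reference haplotypes, target typed only at SNPs 0 and 4.
panel = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]]
allele1_probs = impute_haplotype(panel, {0: 1, 4: 1})
```

In the toy call, the SNPs at indices 1 to 3 are untyped in the target, and their allele probabilities come entirely from which reference haplotype the typed flanking SNPs make most plausible.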
