4.7 Article

The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes

Journal

BMC GENOMICS
Volume 16, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12864-015-1333-7

Keywords

INDEL; 1000 Genomes Project; Distribution; Mutagenesis

Funding

  1. National Human Genome Research Institute [1U01HG005211, 5U54HG003273, 1R01HG008115, R01HG004719, U01HG006513]

Background: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, leveraging the strengths of each individual pipeline. Using this approach, we generated a consensus exome INDEL call set from a large dataset produced by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.

Results: This consensus exome INDEL call set comprises 7,210 INDELs from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.

Conclusions: We further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
