Article

Blacklisting variants common in private cohorts but not in public databases optimizes human exome analysis

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.1808403116

Keywords

exome; variant; blacklist; WES analysis; WES annotation

Funding

  1. Qiagen, Inc.
  2. National Institutes of Health [P01AI061093, U24AI086037, R18AI048693, T32GM007280, R01AI088364, R01AI095983, R01AI127564]
  3. French National Research Agency [ANR 14-CE15-0009-01]
  4. Jeffrey Modell Foundation
  5. David S. Gottesman Immunology Chair
  6. Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai
  7. Agence Nationale de la Recherche (ANR) [ANR-14-CE15-0009]

Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient's exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11-65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis.
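The filtering idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ReFiNE's implementation: variants are keyed by (chromosome, position, ref, alt); "cohort frequency" is taken to be the fraction of exomes carrying a variant; and the function names and the thresholds `min_cohort_freq` and `max_public_af` are hypothetical choices made for this example.

```python
from collections import Counter
from statistics import median
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def build_blacklist(
    cohort: Dict[str, Set[Variant]],
    public_af: Dict[Variant, float],
    min_cohort_freq: float = 0.05,  # hypothetical: minimum fraction of exomes carrying the variant
    max_public_af: float = 0.01,    # hypothetical: maximum public-database allele frequency
) -> Set[Variant]:
    """Return variants frequent in the in-house cohort but absent or rare in public data."""
    n_exomes = len(cohort)
    if n_exomes == 0:
        return set()
    # Each exome contributes each distinct variant once, so counts are carrier counts.
    carriers = Counter(v for variants in cohort.values() for v in variants)
    # Variants missing from the public table are treated as frequency 0 (absent).
    return {
        v
        for v, count in carriers.items()
        if count / n_exomes >= min_cohort_freq
        and public_af.get(v, 0.0) < max_public_af
    }


def apply_blacklist(variants: Set[Variant], blacklist: Set[Variant]) -> Set[Variant]:
    """Drop blacklisted variants from one exome's candidate list."""
    return variants - blacklist


def median_reduction(cohort: Dict[str, Set[Variant]], blacklist: Set[Variant]) -> float:
    """Median fraction of each exome's variants removed by the blacklist."""
    fractions = [
        len(variants & blacklist) / len(variants)
        for variants in cohort.values()
        if variants
    ]
    return median(fractions) if fractions else 0.0
```

The sketch keeps the two filters separate: public-frequency filtering handles variants already reported as common, while the cohort-derived blacklist catches recurrent in-house calls that public databases do not capture. A real pipeline would also need to consider factors such as zygosity, sequencing kit, and ancestry, which this example ignores.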
