Article

Exome variant discrepancies due to reference-genome differences

Journal

AMERICAN JOURNAL OF HUMAN GENETICS
Volume 108, Issue 7, Pages 1239-1250

Publisher

CELL PRESS
DOI: 10.1016/j.ajhg.2021.05.011

Funding

  1. National Human Genome Research Institute (NHGRI)/National Heart, Lung, and Blood Institute (NHLBI) [UM1 HG006542]
  2. NHGRI [U54HG003273]
  3. U.S. National Institute of Neurological Disorders and Stroke (NINDS) [R35NS105078]
  4. 2020 XiaGibbs Society Research Grant
  5. [K08 HG008986]

This study evaluated the impact of using different reference assemblies on the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. The results showed that using different references led to approximately 1.5% discordance in SNVs and 2.0% discordance in indels. The discordant variants were mainly clustered within discrete discordant reference patches enriched for specific genomic elements.
Despite release of the GRCh38 human reference genome more than seven years ago, GRCh37 remains more widely used by most research and clinical laboratories. To date, no study has quantified the impact of utilizing different reference assemblies for the identification of variants associated with rare and common diseases from large-scale exome-sequencing data. By calling variants on both the GRCh37 and GRCh38 references, we identified single-nucleotide variants (SNVs) and insertion-deletions (indels) in 1,572 exomes from participants with Mendelian diseases and their family members. We found that a total of 1.5% of SNVs and 2.0% of indels were discordant when different references were used. Notably, 76.6% of the discordant variants were clustered within discrete discordant reference patches (DISCREPs) comprising only 0.9% of loci targeted by exome sequencing. These DISCREPs were enriched for genomic elements including segmental duplications, fix patch sequences, and loci known to contain alternate haplotypes. We identified 206 genes significantly enriched for discordant variants, most of which were in DISCREPs and caused by multi-mapped reads on the reference assembly that lacked the variant call. Among these 206 genes, eight are implicated in known Mendelian diseases and 53 are associated with common phenotypes from genome-wide association studies. In addition, variant interpretations could also be influenced by the reference after lifting-over variant loci to another assembly. Overall, we identified genes and genomic loci affected by reference assembly choice, including genes associated with Mendelian disorders and complex human diseases that require careful evaluation in both research and clinical applications.
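
The discordance figures above can be illustrated with a minimal sketch. It assumes that both call sets have already been brought into one coordinate system (for example by lifting over one set), and that discordance is measured as the share of variants in the union of the two sets that appear in only one of them. The abstract does not state the exact definition used in the study, so this definition, the `Variant` type, and the `discordance` function are hypothetical, and the variants are toy data.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, position, alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def is_indel(v: Variant) -> bool:
    # Treat any call whose alleles differ in length as an indel.
    return len(v.ref) != len(v.alt)


def discordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> Dict[str, float]:
    """Fraction of variants seen in only one call set, split by SNV and indel.

    Assumption: both sets use the same coordinate system, so identical
    tuples denote the same variant.
    """
    union = calls_a | calls_b
    only_one = calls_a ^ calls_b  # symmetric difference
    rates: Dict[str, float] = {}
    for label, want_indel in (("snv", False), ("indel", True)):
        total = sum(1 for v in union if is_indel(v) == want_indel)
        differing = sum(1 for v in only_one if is_indel(v) == want_indel)
        rates[label] = differing / total if total else 0.0
    return rates


# Toy call sets standing in for calls made against each reference.
calls_grch37 = {
    Variant("1", 100, "A", "G"),
    Variant("1", 200, "C", "T"),
    Variant("2", 50, "AT", "A"),
}
calls_grch38 = {
    Variant("1", 100, "A", "G"),
    Variant("2", 50, "AT", "A"),
    Variant("2", 80, "G", "GA"),
}

rates = discordance(calls_grch37, calls_grch38)
print(rates)
```

Keying each variant on chromosome, position, and alleles keeps the comparison to plain set operations. A real pipeline would also need to normalize indel representation and handle loci that fail to lift over, which this sketch leaves out.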
