4.7 Article Data Paper

A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0)

Journal

GIGASCIENCE
Volume 6, Issue 11, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/gix098

Keywords

chimpanzee reference genome; assembly; genomics

Funding

  1. RED-BIO project of the Spanish National Bioinformatics Institute (INB) [PT13/0001/0044]
  2. Spanish National Health Institute Carlos III (ISCIII)
  3. Spanish Ministry of Economy and Competitiveness (MINECO)
  4. FPI fellowship [BFU2014-55090-P]
  5. Swedish Foundation for Strategic Research [F06-0045]
  6. Swedish Research Council
  7. US National Institutes of Health (NIH) [DA033660, HG006696, HD073731, MH097018]
  8. March of Dimes [6-FY13-92]
  9. NIH [R01HG002385, U24HG009081, HG007990, HG007234]
  10. MINECO [BFU2014-55090-P, BFU2015-7116-ERC, BFU2015-6215-ERC]
  11. Fundacio Zoo Barcelona and Secretaria d'Universitats i Recerca del Departament d'Economia i Coneixement de la Generalitat de Catalunya

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration (Pan_tro 2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. We show substantial improvements over the current release of the chimpanzee genome (Pan_tro 2.1.4) by several metrics, such as contiguity increased by > 750% and 300% for contigs and scaffolds, respectively, and closure of 77% of the gaps in the Pan_tro 2.1.4 assembly, spanning > 850 Kbp of novel coding sequence based on RNA-Seq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro 2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
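
The contig N50 quoted in the abstract is a standard contiguity metric: the length L such that contigs of length at least L together contain at least half of all assembled bases. The following is a minimal sketch of that calculation, not code from the paper. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the Pan_tro assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (made-up lengths, not real chimpanzee data).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70: 80 + 70 = 150 is half of the 300 total
```

The same function applies to scaffold lengths, which is how contig-level and scaffold-level contiguity can be compared between two assembly releases.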
