4.7 Article

Gaps and complex structurally variant loci in phased genome assemblies

Journal

GENOME RESEARCH
Volume 33, Issue 4, Pages 496-510

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.277334.122


There has been significant progress in phased genome assembly by combining long-read data with parental information or linked-read data. However, the typical phased genome assembly still has over 140 gaps. A detailed analysis of 182 haploid assemblies reveals that the majority of assembly gaps cluster near large and identical repeats, resulting in disrupted protein-coding genes. Misorientations and alignment discontinuities are also identified, highlighting the need for algorithmic development and pangenome representation.
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.


Recommended

Article Biotechnology & Applied Microbiology

Functional analysis of structural variants in single cells using Strand-seq

Hyobin Jeong, Karen Grimes, Kerstin K. Rauwolf, Peter-Martin Bruch, Tobias Rausch, Patrick Hasenfeld, Eva Benito, Tobias Roider, Radhakrishnan Sabarinathan, David Porubsky, Sophie A. Herbst, Busra Erarslan-Uysal, Johann-Christoph Jann, Tobias Marschall, Daniel Nowak, Jean-Pierre Bourquin, Andreas E. Kulozik, Sascha Dietrich, Beat Bornhauser, Ashley D. Sanders, Jan O. Korbel

Summary: This study introduces a computational method called scNOVA, which utilizes Strand-seq to analyze structural variations in single cells and infer gene expression. The research reveals the impact of structural variations on gene regulation and signaling pathways, and successfully applies the method to the study of chronic lymphocytic leukemia and T cell acute lymphoblastic leukemia.

NATURE BIOTECHNOLOGY (2023)

Article Biotechnology & Applied Microbiology

Pangenome graph construction from genome alignments with Minigraph-Cactus

Glenn Hickey, Jean Monlong, Jana Ebler, Adam M. Novak, Jordan M. Eizenga, Yan Gao, Tobias Marschall, Heng Li, Benedict Paten, Haley J. Abel, Lucinda L. Antonacci-Fulton, Mobin Asri, Gunjan Baid, Carl A. Baker, Anastasiya Belyaeva, Konstantinos Billis, Guillaume Bourque, Silvia Buonaiuto, Andrew Carroll, Mark J. P. Chaisson, Pi-Chuan Chang, Xian H. Chang, Haoyu Cheng, Justin Chu, Sarah Cody, Vincenza Colonna, Daniel E. Cook, Robert M. Cook-Deegan, Omar E. Cornejo, Mark Diekhans, Daniel Doerr, Peter Ebert, Jana Ebler, Evan E. Eichler, Jordan M. Eizenga, Susan Fairley, Olivier Fedrigo, Adam L. Felsenfeld, Xiaowen Feng, Christian Fischer, Paul Flicek, Giulio Formenti, Adam Frankish, Robert S. Fulton, Yan Gao, Shilpa Garg, Erik Garrison, Nanibaa' A. Garrison, Carlos Garcia Giron, Richard E. Green, Cristian Groza, Andrea Guarracino, Leanne Haggerty, Ira M. Hall, William T. Harvey, Marina Haukness, David Haussler, Simon Heumos, Glenn Hickey, Kendra Hoekzema, Thibaut Hourlier, Kerstin Howe, Miten Jain, Erich D. Jarvis, Hanlee P. Ji, Eimear E. Kenny, Barbara A. Koenig, Alexey Kolesnikov, Jan O. Korbel, Jennifer Kordosky, Sergey Koren, HoJoon Lee, Alexandra P. Lewis, Wen-Wei Liao, Shuangjia Lu, Tsung-Yu Lu, Julian K. Lucas, Hugo Magalhaes, Santiago Marco-Sola, Pierre Marijon, Charles Markello, Tobias Marschall, Fergal J. Martin, Ann McCartney, Jennifer McDaniel, Karen H. Miga, Matthew W. Mitchell, Jean Monlong, Jacquelyn Mountcastle, Katherine M. Munson, Moses Njagi Mwaniki, Maria Nattestad, Adam M. Novak, Sergey Nurk, Hugh E. Olsen, Nathan D. Olson, Trevor Pesout, Adam M. Phillippy, Alice B. Popejoy, David Porubsky, Pjotr Prins, Daniela Puiu, Mikko Rautiainen, Allison A. Regier, Arang Rhie, Samuel Sacco, Ashley D. Sanders, Valerie A. Schneider, Baergen Schultz, Kishwar Shafin, Jonas A. Sibbesen, Jouni Siren, Michael W. Smith, Heidi J. Sofia, Ahmad N. Abou Tayoun, Francoise Thibaud-Nissen, Chad Tomlinson, Francesca Floriana Tricomi, Flavia Villani, Mitchell R. Vollger, Justin Wagner, Brian Walenz, Ting Wang, Jonathan M. D. Wood, Aleksey Zimin, Justin M. Zook

Summary: Genome assemblies are used to directly construct genome graphs, which can represent various forms of genetic variation and improve analysis accuracy by overcoming single-reference bias.

NATURE BIOTECHNOLOGY (2023)

Article Biotechnology & Applied Microbiology

Telomere-to-telomere assembly of diploid chromosomes with Verkko

Mikko Rautiainen, Sergey Nurk, Brian P. Walenz, Glennis A. Logsdon, David Porubsky, Arang Rhie, Evan E. Eichler, Adam M. Phillippy, Sergey Koren

Summary: The Telomere-to-Telomere consortium has achieved the first complete sequence of a human genome. They used a combination of long Nanopore sequencing reads and a high-resolution assembly graph to resolve repeat sequences, and this process is automated in the Verkko pipeline. The result is a phased, diploid assembly with many chromosomes assembled from end to end. This advance is crucial for constructing comprehensive pangenome databases and for chromosome-scale comparative genomics.

NATURE BIOTECHNOLOGY (2023)

Article Biochemistry & Molecular Biology

Structural Variation Evolution at the 15q11-q13 Disease-Associated Locus

Annalisa Paparella, Alberto L'Abbate, Donato Palmisano, Gerardina Chirico, David Porubsky, Claudia R. Catacchio, Mario Ventura, Evan E. Eichler, Flavia A. M. Maggiolini, Francesca Antonacci

Summary: The impact of segmental duplications on human evolution and disease is significant and requires further research. Comparative analysis of duplication structures in human and nonhuman primates revealed potential genomic drivers for human-specific gene expansions.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2023)

Article Multidisciplinary Sciences

Assembly of 43 human Y chromosomes reveals extensive complexity and variation

Pille Hallast, Peter Ebert, Mark Loftus, Feyza Yilmaz, Peter A. Audano, Glennis A. Logsdon, Marc Jan Bonder, Weichen Zhou, Wolfram Hoeps, Kwondo Kim, Chong Li, Savannah J. Hoyt, Philip C. Dishuck, David Porubsky, Fotios Tsetsos, Jee Young Kwon, Qihui Zhu, Katherine M. Munson, Patrick Hasenfeld, William T. Harvey, Alexandra P. Lewis, Jennifer Kordosky, Kendra Hoekzema, Rachel J. O'Neill, Jan O. Korbel, Chris Tyler-Smith, Evan E. Eichler, Xinghua Shi, Christine R. Beck, Tobias Marschall, Miriam K. Konkel, Charles Lee

Summary: De novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution reveal considerable diversity in the size and structure of the human Y chromosome. The male-specific euchromatic region is subject to large inversions with a higher recurrence rate compared to other chromosomes. Additionally, the study found extensive variation in the repeat arrays of the Y chromosome.

NATURE (2023)

Article Biotechnology & Applied Microbiology

Inversion polymorphism in a complete human genome assembly

David N. Porubsky, William Harvey, Allison Rozanski, Jana Ebler, Wolfram Hoeps, Hufsah Ashraf, Patrick Hasenfeld, Benedict Paten, Ashley D. Sanders, Tobias Marschall, Jan O. Korbel, Evan E. Eichler

Summary: The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. Remapping data from 41 genomes against the T2T reference genome showed a 21% increase in sensitivity for mapping inversions compared to the GRCh38 reference. The T2T reference was also more likely to represent the correct orientation of major human alleles, as shown by the identification of 26 misorientations in GRCh38.

GENOME BIOLOGY (2023)

Article Hematology

Focal structural variants revealed by whole genome sequencing disrupt the histone demethylase KDM4C in B-cell lymphomas

Cristina Lopez, Nikolai Schleussner, Stephan H. Bernhart, Kortine Kleinheinz, Stephanie Sungalee, Henrike L. Sczakiel, Helene Kretzmer, Umut H. Toprak, Selina Glaser, Rabea Wagener, Ole Ammerpohl, Susanne Bens, Maciej Giefing, Juan C. Gonzalez Sanchez, Gordana Apic, Daniel Huebschmann, Martin Janz, Markus Kreuz, Anja Mottok, Judith M. Mueller, Julian Seufert, Steve Hoffmann, Jan O. Korbel, Robert B. Russell, Roland Schuele, Lorenz Truemper, Wolfram Klapper, Bernhard Radlwimmer, Peter Lichter, Ralf Kueppers, Matthias Schlesner, Stephan Mathas, Reiner Siebert

Summary: Histone methylation modifiers, including EZH2 and KMT2D, are frequently altered in B-cell lymphomas. In this study, we examined the whole genome and transcriptome data of 186 cases and identified recurrent alterations in KDM4C, a histone demethylase-encoding gene on chromosome 9p24. We demonstrated that these structural variants in KDM4C result in loss of function and provide evidence that KDM4C can act as a tumor suppressor. Thus, our findings expand the mutational landscape of lymphomas and highlight the importance of KDM4C in B-cell-derived lymphomas.

HAEMATOLOGICA (2023)
