4.6 Article

The genome polishing tool POLCA makes fast and accurate corrections in genome assemblies

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 16, Issue 6, Pages -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pcbi.1007981

Funding

  1. USDA National Institute of Food and Agriculture [2018-67015-28199]
  2. National Science Foundation [IOS-1744309]
  3. National Institutes of Health [R01-HG006677, R35-GM130151]

The introduction of third-generation DNA sequencing technologies in recent years has allowed scientists to generate dramatically longer sequence reads, which when used in whole-genome sequencing projects have yielded better repeat resolution and far more contiguous genome assemblies. While the promise of better contiguity has held true, the relatively high error rate of long reads, averaging 8-15%, has made it challenging to generate a highly accurate final sequence. Current long-read sequencing technologies display a tendency toward systematic errors, in particular in homopolymer regions, which present additional challenges. A cost-effective strategy to generate highly contiguous assemblies with a very low overall error rate is to combine long reads with low-cost short-read data, which currently have an error rate below 0.5%. This hybrid strategy can be pursued either by incorporating the short-read data into the early phase of assembly, during the read correction step, or by using short reads to polish the consensus built from long reads. In this report, we present the assembly polishing tool POLCA (POLishing by Calling Alternatives) and compare its performance with two other popular polishing programs, Pilon and Racon. We show that on simulated data POLCA is more accurate than Pilon, and comparable in accuracy to Racon. On real data, all three programs show similar performance, but POLCA is consistently much faster than either of the other polishing programs.
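The idea of using accurate short reads to polish a long-read consensus can be illustrated with a toy example. The sketch below is not POLCA's implementation: it handles substitutions only, assumes the short reads are already aligned as hypothetical `(start, sequence)` pairs, and uses illustrative thresholds (`min_alt`, `min_ratio`) that are assumptions rather than values taken from the paper.

```python
from collections import Counter


def polish(consensus, aligned_reads, min_alt=2, min_ratio=2.0):
    """Toy substitution-only polisher (illustrative, not POLCA itself).

    aligned_reads: list of (start, read_seq) pairs, each read assumed to be
    aligned without gaps at 0-based position `start` on the consensus.
    A consensus base is replaced when an alternative base is seen in at
    least `min_alt` reads and at least `min_ratio` times as often as the
    consensus base (both thresholds are assumptions for this sketch).
    """
    # Count the bases that the short reads place at every consensus position.
    pileup = [Counter() for _ in consensus]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(consensus):
                pileup[pos][base] += 1

    polished = list(consensus)
    fixes = []
    for pos, counts in enumerate(pileup):
        ref = consensus[pos]
        # Best-supported base that differs from the consensus, if any.
        alt, alt_n = max(
            ((b, n) for b, n in counts.items() if b != ref),
            key=lambda item: item[1],
            default=(None, 0),
        )
        ref_n = counts[ref]
        if alt is not None and alt_n >= min_alt and alt_n >= min_ratio * ref_n:
            polished[pos] = alt
            fixes.append((pos, ref, alt))
    return "".join(polished), fixes


if __name__ == "__main__":
    reads = [(0, "ACCTAC"), (2, "CTACGT"), (1, "CCTACG")]
    print(polish("ACGTACGT", reads))
```

With the three reads shown, every read supports `C` at position 2 and none supports the consensus `G`, so the sketch reports a single correction there. A real polisher would also need to handle insertions and deletions, which matter most in the homopolymer regions the abstract mentions.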

Recommended

Article Biochemical Research Methods

Liftoff: accurate mapping of gene annotations

Alaina Shumate, Steven L. Salzberg

Summary: Advancements in DNA sequencing and computational methods have led to a significant increase in high-quality genome assemblies for many species. To annotate gene features in these genomes, a common strategy is to map genes from a previously annotated reference genome to new or improved assemblies. The tool Liftoff can accurately map genes between the same or closely related species, ensuring high sequence identity and preserving gene structure.

BIOINFORMATICS (2021)

Article Biochemistry & Molecular Biology

Effects of transcriptional noise on estimates of gene and transcript expression in RNA sequencing experiments

Ales Varabyou, Steven L. Salzberg, Mihaela Pertea

Summary: RNA sequencing is commonly used to study gene expression, but simulations typically do not consider the impact of transcriptional noise. This study found that noise leads to systematic errors in computational methods, resulting in underestimation of transcript abundance and increased false-positive genes. Alignment-free methods may also struggle to detect transcripts expressed at low levels.

GENOME RESEARCH (2021)

Article Genetics & Heredity

Dissecting the Polygenic Basis of Cold Adaptation Using Genome-Wide Association of Traits and Environmental Data in Douglas-fir

Amanda R. De La Torre, Benjamin Wilhite, Daniela Puiu, John Bradley St. Clair, Marc W. Crepeau, Steven L. Salzberg, Charles H. Langley, Brian Allen, David B. Neale

Summary: Understanding the genomic and environmental basis of cold adaptation in Douglas-fir is crucial, with results indicating a complex genetic architecture involving both polygenic traits and large/small effect genes. Newly discovered associations for cold adaptation involve genes related to various biological functions, highlighting the interplay between genetics and environmental factors in cold-associated trait variation.

Article Biochemical Research Methods

Balrog: A universal protein model for prokaryotic gene prediction

Markus J. Sommer, Steven L. Salzberg

Summary: Researchers have developed a universal model of prokaryotic genes based on amino acid sequences from a large and diverse set of microbial genomes, which has been incorporated into a gene finding system called Balrog. This system does not require genome-specific training and performs as well as or better than other state-of-the-art gene finding tools.

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Multidisciplinary Sciences

Chromosome Xq23 is associated with lower atherogenic lipid concentrations and favorable cardiometabolic indices

Pradeep Natarajan, Akhil Pampana, Sarah E. Graham, Sanni E. Ruotsalainen, James A. Perry, Paul S. de Vries, Jai G. Broome, James P. Pirruccello, Michael C. Honigbere, Krishna Aragam, Brooke Wolford, Jennifer A. Brody, Lucinda Antonacci-Fulton, Moscati Arden, Stella Aslibekyan, Themistocles L. Assimes, Christie M. Ballantyne, Lawrence F. Bielak, Joshua C. Bisl, Brian E. Cade, Ron Do, Harsha Doddapaneni, Leslie S. Emery, Yi-Jen Hung, Marguerite R. Irvin, Alyna T. Khan, Leslie Lange, Jiwon Lee, Rozenn N. Lemaitre, Lisa W. Martin, Ginger Metcalf, May E. Montasser, Jee-Young Moon, Donna Muzny, Jeffrey R. O. Connell, Nicholette D. Palmer, Juan M. Peralta, Patricia A. Peyser, Adrienne M. Stilp, Michael Tsai, Fei Fei Wang, Daniel E. Weeks, Lisa R. Yanek, James G. Wilson, Goncalo Abecasis, Donna K. Arnett, Lewis C. Becker, John Blangercy, Eric Boerwinkle, Donald W. Bowden, Yi-Cheng Chang, Yii-Der Chen, Won Jung Choi, Adolfo Correa, Joanne E. Curran, Mark J. Daly, Susan K. DutcherE, Patrick T. Ellinor, Myriam Fornage, Barry Freedman, Stacey Gabriel, Soren Germer, Richard A. Gibbs, Jiang He, Kristian Hveem, Gail P. Jarvik, Robert C. Kaplan, Sharon L. R. Kardia, Eimear Kennyn, Ryan W. Kim, Charles Kooperberg, Cathy C. Laurie, Seonwook Lee, Don M. Lloyd-Jones, Ruth J. F. Loos, Steven A. Lubitz, Rasika A. Mathias, Karine A. Viaud Martinez, Stephen T. McGarvey, Braxton D. Mitche, Deborah A. Nickerson, Kari E. North, Aarno Palotie, Cheol Joo Park, Bruce M. Y. Psat, D. C. Rao, Susan Redline, Alexander P. Reiner, Daekwan Seo, Jeong-Sun Seo, Albert Smith, Russell P. Tracy, Sekar Kathiresan, L. Adrienne Cupples, Jerome Rotten, Alanna C. Morrison, Stephen S. Rich, Samuli Ripatti, Cristen Wilier, Gina M. Peloso, Ramachandran S. Vasan, Namiko Abe, Christine Albert, Laura Almasy, Alvaro Alonso, Seth Ament, Peter Anderson, Deborah Applebaum-Bowden, Dan Arking, Allison Ashley-Koch, Paul Auer, Dimitrios Avramopoulos, John Barnard, Kathleen Barnes, R. 
Graham Barr, Emily Barron-Casella, Terri Beaty, Diane Becker, Rebecca Beer, Ferdouse Begum, Amber Beitelshees, Emelia Benjamin, Marcos Bezerra, Larry Bielak, Thomas Blackwel, Russell Bowler, Ulrich Broecke, Karen Bunting, Esteban Burchard, Erin Buth, Jonathan Cardwel, Cara Carty, Richard Casaburi, James Casella, Mark Chaffin, Christy Chang, Daniel Chasman, Sameer Chavan, Bo-Juen Chen, Wei-Min Chen, Michael Chol, Seung Hoan Choi, Lee-Ming Chuang, Mina Chung, Matthew P. Conomos, Elaine Cornell, James Crapo, Jeffrey Curtis, Brian Custer, Coleen Damcott, Dawood Darbar, Sayantan Das, Sean David, Colleen Davis, Michelle Daya, Mariza de Andrade, Michael DeBaunuo, Qing Duan, Ranjan Deka Dawn DeMeo Scott Devine, Qing Ravi Duggirala, Jon Peter Durda, Susan Dutcher, Charles Eaton, Lynette Ekunwe, Charles Farber, Leanna Farnaml, Tasha Fingerlin, Matthew Flickinger, Nora Franceschini, Mao Fu, Stephanie M. Fullerton, Lucinda Fulton, Weiniu Gan, Yan Gao, Margery Gass, Bruce Ge, Xiaoqi Priscilla Geng, Soren Germer, Chris Gignoux, Mark Gladwin, David Glahn, Stephanie Gogarten, Da-Wei Gong, Harald Goring, C. Charles Gu, Yue Guan, Xiuqing Guo, Jeff Haessler, Michael Hall, Daniel Harris, Nicola Y. Hawle, Ben Heavner, Susan Heckbert, Ryan Hernandez, David Herrington, Craig Hersh, Bertha Hidalgo, James Hixson, John Hokanson, Elliott Hong, Karin Hoth, Chao Agnes Hsiung, Haley Huston, Chii Min Hwu, Rebecca Jackson, Deepti Jain, Cashell Jaquish, Min A. Jhun, Jill Johnsen, Andrew Johnson, Craig Johnson, Rich Johnston, Kimberly Jones, Hyun Min Kang, Laura Kaufman, Shannon Y. 
Kell, Michael Kessler, Greg Kinney, Barbara Konkle, Holly Kramer, Stephanie Krauter, Christoph Lange, Ethan Lange, Cecelia Laurie, Meryl LeBoff, Seunggeun Shawn Lee, Wen-Jane Lee, Jonathon LeFaive, David Levine, Dan Levy, Joshua Lewis, Yun Li, Honghuang Lin, Keng Han Lin, Xihong Lin, Simin Liu, Yongmei Liu, Kathryn Lunetta, James Luo, Michael Mahaney, Barry Make, Ani Manichaikul, JoAnn Mansonl, Lauren Margolin, Susan Mathai, Patrick McArdle, Merry-Lynn Mcdonald, Sean McFarland, Caitlin McHugh, Hao Mei, Deborah A. Meyers, Julie Mikulla, Nancy Min, Mollie Minear, Ryan L. Minster, Solomon Musani, Stanford Mwasongwe, Josyf C. Mychaleckyj, Girish Nadkarni, Rakhi Naik, Take Naseri, Sergei Nekhai, Sarah C. Nelson, Deborah Nickerson, Jeff O. Connell, Tim O. Connor, Heather Ochs-Balcom, James Pankow, George Papanicolaou, Margaret Parkerl, Afshin Parsa, Sara Penchey, Marco Perez, Ulrike Peters, Lawrence S. Phillips, Sam Phillips, Toni Pollin, Wendy Post, Julia Powers Becker, Meher Preethi Boorgula, Michael Preuss, Dmitry Prokopenko, Pankaj Qasba, Dandi Qiao, Nicholas Rafaels, Laura Raffield, Laura Rasmussen-Torvik, Aakrosh Ratan, Robert Reed, Elizabeth Reganl, Muagututi Sefuiva Reupena, Ken Rice, Dan Roden, Carolina Roselli, Ingo Ruczinski, Pamela Russel, Sarah Ruuska, Kathleen Ryan, Ester Cerdeira Sabino, Phuwanat Sakornsakolpatl, Steven Salzberg, Kevin Sandow, Vijay G. Sankaran, Christopher Scheller, Ellen Schmidt, Karen Schwander, David Schwartz, Frank Sciurba, Christine Seidman, Jonathan Seidman, Vivien Sheehan, Amol Shetty, Aniket Shetty, Wayne Hui-Heng Sheu, M. Benjamin Shoemaker, Brian Silver, Edwin Silvermanl, Jennifer Smith, Josh Smith, Nicholas Smith, Tanja Smith, Sylvia Smoller, Beverly Snively, Tamar Soferlm, Elizabeth Streeten, Jessica Lasky Su, Yun Ju Sung, Jody Sylvia, Carole Sztalryd, Daniel Taliun, Hua Tang, Margaret Taub, Kent D. Taylor, Simeon Taylor, Marilyn Telen, Timothy A. 
Thornton, Lesley Tinker, David Tirschwel, Hemant Tiwari, Dhananjay Vaidya, Peter VandeHaar, Scott Vrieze, Tarik Walker, Robert Wallace, Avram Waits, Emily Wan, Heming Wang, Karol Watson, Bruce Weir, Scott Weiss, Lu-Chen Weng, Kayleen Williams, L. Keoki Williams, Carla Wilson, Quenna Wong, Huichun Xu, Ivana Yang, Rongze Yang, Norann Zaghlou, Maryam Zekavat, Yingze Zhang, Snow Xueyan Zhao, Wei Zhao, Degui Zni, Xiang Zhou, Xiaofeng Zhu, Michael Zody, Sebastian Zoellner, Aarno Palotie, Mark Daly, Howard Jacob, Athena Matakidou, Heiko Runz, Sally John, Robert Plenge, Mark McCarthy, Julie Hunkapiller, Meg Ehm, Dawn Waterworth, Caroline Fox, Anders Malarstig, Kathy Klinger, Kathy Call, Tomi Mkel, Jaakko Kaprio, Petri Virolainen, Kari Pulkki, Terhi Kilpi, Markus Perola, Jukka Partanen, Anne Pitkranta, Riitta Kaarteenaho, Seppo Vainio, Kimmo Savinainen, Veli-Matti Kosma, Urho Kujala, Outi Tuovila, Minna Hendolin, Raimo Pakkanen, Jeff Waring, Bridget Riley-Gillis, Jimmy Liu, Shameek Biswas, Dorothee Diogo, Catherine Marshall, Xinli Hu, Matthias Gossel, Samuli Ripatti, Johanna Schleutker, Mikko Arvas, Reetta Hinttala, Johannes Kettunen, Reijo Laaksonen, Arto Mannermaa, Juha Paloneva, Hilkka Soininen, Valtteri Julkunen, Anne Remes, Reetta Klviinen, Mikko Hiltunen, Jukka Peltola, Pentti Tienari, Juha Rinne, Adam Ziemann, Jeffrey Waring, Sahar Esmaeeli, Nizar Smaoui, Anne Lehtonen, Susan Eaton, Sanni Landenper, John Michon, Geoff Kerchner, Natalie Bowers, Edmond Teng, John Eicher, Vinay Mehta, Padhraig Y. 
Gormle, Kari Linden, Christopher Whelan, Fanli Xu, David Pulford, Martti Frkkil, Sampsa Pikkarainen, Airi Jussila, Timo Blomster, Mikko Kiviniemi, Markku Voutilainen, Bob Georgantas, Graham Heap, Fedik Rahimov, Keith Usiskin, Joseph Maranville, Tim Lu, Danny Oh, Kirsi Kalpala, Melissa Miller, Linda McCarthy, Kari Eklund, Antti Palomki, Pia Isomki, Laura Piri, Oili Kaipiainen-Seppnen, Apinya Lertratanaku, David Close Marla Hochfeld Nan Bing, Jorge Esparza Gordillo, Nina Mars, Tarja Laitinen, Margit Pelkonen, Paula Kauppi, Hannu Kankaanranta, Terttu Harju, Steven Greenberg, Hubert Chen, Jo Betts, Soumitra Ghosh, Veikko Salomaa, Teemu Niiranen, Markus Juonala, Kaj Metsrinne, Mika Khnen, Juhani Junttila, Markku Laakso, Jussi Pihlajamki, Juha Sinisalo, Marja-Riitta Taskinen, Tiinamaija Tuomi, Jari Laukkanen, Ben Challis, Andrew Peterson, Audrey Chu, Jaakko Parkkinen, Anthony Muslin, Heikki Joensuu, Tuomo Meretoja, Lauri Aaltonen, Annika Auranen, Peeter Karihtala, Saila Kauppila, Pivi Auvinen, Klaus Elenius, Relja Popovic, Jennifer Schutzman, Andrey Loboda, Aparna Chhibber, Heli Lehtonen, Stefan McDonough, Marika Crohns, Diptee Kulkarni, Kai Kaarniranta, Joni Turunen, Terhi Ollila, Sanna Seitsonen, Hannu Uusitalo, Vesa Aaltonen, Hannele Uusitalo-Jrvinen, Marja Luodonp, Nina Hautala, Erich Strauss, Hao Chen, Anna Podgornaia, Joshua Hoffman, Kaisa Tasanen, Laura Huilaja, Katariina Hannula-Jouppi, Teea Salmi, Sirkku Peltonen, Leena Koulu, Ilkka Harvima, Ying Wu, David Choy, Anu Jalanko, Risto Kajanne, Ulrike Lyhs, Mari Kaunisto, Justin Wade Davis, Danjuma Quarless, Slav Petrovski, Chia-Yen Chen, Paola Bronson, Robert Yang, Diana Chang, Tushar Bhangale, Emily Holzinger, Xulong Wang, Xing Chen, Kirsi Auro, Clarence Wang, Ethan Xu, Franck Auge, Clement Chatelain, Mitja Kurki, Juha Karjalainen, Aki Havulinna, Kimmo Palin, Priit Palta, Pietro Della Briotta Parolo, Wei Zhou, Susanna Lemmel, Manuel Rivas, Jarmo Harju, Arto Lehisto, Andrea Ganna, Vincent Llorens, Antti Karlsson, 
Kati Kristiansson, Kati Hyvrinen, Jarmo Ritari, Tiina Wahlfors, Miika Koskinen, Katri Pylks, Marita Kalaoja, Minna Karjalainen, Tuomo Mantere, Eeva Kangasniemi, Sami Heikkinen, Eija Laakkonen, Juha Kononen, Anu Loukola, Pivi Laiho, Tuuli Sistonen, Essi Kaiharju, Markku Laukkanen, Elina Jrvensivu, Sini Lhteenmki, Lotta Mnnikk, Regis Wong, Hannele Mattsson, Tero Hiekkalinna, Manuel Gonzlez Jimnez, Kati Donner, KaIle Prn, Javier Nunez-Fontarnau, Elina Kilpelinen, Timo P. Sipi, Georg Brein, Alexander Dada, Ghazal Awaisa, Anastasia Shcherban, Tuomas Sipil, Hannele Laivuori, Tuomo Kiiskinen, Harri Siirtola, Javier Gracia Tabuenca, Lila Kallio, Sirpa Soini, Kimmo Pitknen, Teijo Kuopio

Summary: The study analyzed X chromosome sequencing data in over 65,000 multi-ancestry individuals, identifying associations of the Xq23 locus with lipid changes and reduced risk of CHD and type 2 diabetes.

NATURE COMMUNICATIONS (2021)

Article Genetics & Heredity

Rapid detection of inter-clade recombination in SARS-CoV-2 with Bolotie

Ales Varabyou, Christopher Pockrandt, Steven L. Salzberg, Mihaela Pertea

Summary: A novel method identified potential recombinant SARS-CoV-2 genomes, aiding in the rapid analysis of novel isolates.

GENETICS (2021)

Article Biochemical Research Methods

PhyloCSF++: a fast and user-friendly implementation of PhyloCSF with annotation tools

Christopher Pockrandt, Martin Steinegger, Steven L. Salzberg

Summary: PhyloCSF++ is an efficient and parallelized C++ implementation of the PhyloCSF method, which uses multiple sequence alignments to distinguish protein-coding and non-coding regions in a genome. It can score alignments, generate browser tracks, and annotate coding sequences in various file formats.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

The SAMBA tool uses long reads to improve the contiguity of genome assemblies

Aleksey V. Zimin, Steven L. Salzberg

Summary: Third-generation sequencing technologies generate long reads, which are valuable for resolving complex repeats. An upgrade strategy for existing assemblies is to use long-read data to fill gaps and improve contiguity.

PLOS COMPUTATIONAL BIOLOGY (2022)

Article Genetics & Heredity

A reference-quality, fully annotated genome from a Puerto Rican individual

Aleksey Zimin, Alaina Shumate, Ida Shinder, Jakob Heinz, Daniela Puiu, Mihaela Pertea, Steven L. Salzberg

Summary: Until 2019, there was only one fully annotated version of the human genome. In 2019, a second individual genome was successfully assembled and annotated, which was from an individual of African descent. The new genome is more complete and contiguous than previous genomes.

GENETICS (2022)

Article Biochemical Research Methods

Metagenome analysis using the Kraken software suite

Jennifer Lu, Natalia Rincon, Derrick E. Wood, Florian P. Breitwieser, Christopher Pockrandt, Ben Langmead, Steven L. Salzberg, Martin Steinegger

Summary: This article introduces a step-by-step protocol for the computational analysis of high-throughput DNA sequencing data using the Kraken suite. The protocol includes classification, quantification, and visualization, and can be used for quantifying species in a microbial community and identifying pathogens in clinical samples.

NATURE PROTOCOLS (2022)

Editorial Material Medicine, General & Internal

The Human Contaminome and Understanding Infectious Disease

Patricia J. Simner, Steven L. Salzberg

NEW ENGLAND JOURNAL OF MEDICINE (2022)

Article Biology

Structure-guided isoform identification for the human transcriptome

Markus J. Sommer, Sooyoung Cha, Ales Varabyou, Natalia Rincon, Sukhwan Park, Ilia Minkin, Mihaela Pertea, Martin Steinegger, Steven L. Salzberg

Summary: The development of three-dimensional protein structure prediction methods has provided new opportunities for research on genomes and proteomes. By utilizing computational predictions of protein structures, it is possible to identify the functional protein product among multiple gene isoforms. In this study, we evaluated over 230,000 isoforms of human protein-coding genes using protein structure predictions, and identified several isoforms with more confidently predicted structures and potentially superior function compared to the canonical isoforms in the latest human gene database. We demonstrated the potential of protein structure prediction as a genome annotation tool and provided a resource of protein structures for better understanding the function of human genes and their isoforms.

Article Computer Science, Interdisciplinary Applications

Investigating open reading frames in known and novel transcripts using ORFanage

Ales Varabyou, Beril Erdogdu, Steven L. Salzberg, Mihaela Pertea

Summary: ORFanage is a system that analyzes RNA-seq data to discover novel protein variants and enhance gene annotations. It is fast, scalable, and effectively filters out noise to improve the quality of transcriptome assemblies.

NATURE COMPUTATIONAL SCIENCE (2023)

Article Medicine, Research & Experimental

Next-generation sequencing: insights to advance clinical investigations of the microbiome

Caroline R. Wensel, Jennifer L. Pluznick, Steven L. Salzberg, Cynthia L. Sears

Summary: This Review discusses the advancements in NGS technology for studying the human microbiome, including the pros and cons of different NGS methodologies and important concepts in data variability and study design. Examples of NGS studies on the human microbiome in diverse clinical contexts are provided, as well as insights into the future integration and advancement of NGS in microbiome research and clinical care.

JOURNAL OF CLINICAL INVESTIGATION (2022)

Article Clinical Neurology

Guillain-Barré Syndrome Outbreak in Peru 2019 Associated With Campylobacter jejuni Infection

Ana P. Ramos, Sonja E. Leonhard, Susan K. Halstead, Mireya A. Cuba, Carlos C. Castaneda, Jose A. Dioses, Martin A. Tipismana, Jesus T. Abanto, Alejandro Llanos, Dawn Gourlay, Max Grogl, Mariana Ramos, Jesus D. Rojas, Rina Meza, Daniela Puiu, Rachel M. Sherman, Steven L. Salzberg, Patricia J. Simner, Hugh J. Willison, Bart C. Jacobs, David R. Cornblath, Hugo F. Umeres, Carlos A. Pardo

Summary: The study found that the 2019 Peruvian GBS outbreak was associated with Campylobacter jejuni infection, with the related strains circulating widely worldwide.

NEUROLOGY-NEUROIMMUNOLOGY & NEUROINFLAMMATION (2021)
