Article

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries

Journal

BMC BIOINFORMATICS
Volume 19

Publisher

BMC
DOI: 10.1186/s12859-018-2293-0

Keywords

Resequencing; Target enrichment; Long-range PCR; Sequence error; Divide and conquer; PacBio amplicon analysis

Funding

  1. U.S. NSF Plant Genome Research Program [IOS-1127076]
  2. Delaware INBRE grant NIH/NIGMS [GM103446]

Background: Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method.

Results: A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. SMRT sequence data consist of multiple low-quality subreads from which higher-quality circular consensus sequences are formed. By uniquely leveraging this structure to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single-sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing.

Conclusions: C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.
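
The divide and conquer idea in the abstract can be illustrated with a toy sketch. This is not the C3S-LAA implementation; it only shows the general pattern of using each molecule's higher-quality circular consensus sequence (CCS) to decide which amplicon its raw subreads belong to, and then building one consensus per cluster. Everything named here is hypothetical: the function names, the exact-match primer test, the example sequences, and the column-wise majority vote, which stands in for a real polishing tool and assumes equal-length, pre-aligned reads.

```python
from collections import Counter, defaultdict


def assign_amplicon(ccs_seq, primers):
    """Return the amplicon whose primer pair flanks the CCS read, else None.

    Assumption: exact primer matches at both ends (real data would need
    tolerant matching and handling of both strand orientations).
    """
    for name, (fwd, rev) in primers.items():
        if ccs_seq.startswith(fwd) and ccs_seq.endswith(rev):
            return name
    return None


def cluster_subreads(ccs_reads, subreads, primers):
    """Divide step: group raw subreads by the amplicon of their molecule's CCS."""
    clusters = defaultdict(list)
    for molecule_id, seq in subreads:
        ccs = ccs_reads.get(molecule_id)
        amplicon = assign_amplicon(ccs, primers) if ccs else None
        if amplicon is not None:  # subreads without a usable CCS are dropped
            clusters[amplicon].append(seq)
    return dict(clusters)


def majority_consensus(seqs):
    """Conquer step: column-wise majority vote over equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Hypothetical toy data: two amplicons, four molecules (m4 has no CCS read).
primers = {"ampA": ("ACG", "TTA"), "ampB": ("GGC", "CAT")}
ccs_reads = {"m1": "ACGTTGATTA", "m2": "ACGTTGATTA", "m3": "GGCAAAACAT"}
subreads = [
    ("m1", "ACGTTGATTA"),
    ("m1", "ACGTAGATTA"),  # subread with one substitution error
    ("m2", "ACGTTGATTA"),
    ("m3", "GGCAAAACAT"),
    ("m3", "GGCATAACAT"),  # subread with one substitution error
    ("m3", "GGCAAAACAT"),
    ("m4", "TTTTTTTTTT"),  # no CCS available, so it is not clustered
]

clusters = cluster_subreads(ccs_reads, subreads, primers)
consensus = {name: majority_consensus(seqs) for name, seqs in clusters.items()}
print(consensus)
```

Processing each cluster separately keeps reads from different amplicons out of the same consensus calculation, which is the general benefit a divide and conquer layout offers for pooled amplicon libraries.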
