Article

Benchmark datasets for phylogenomic pipeline validation, applications for foodborne pathogen surveillance

Journal

PEERJ
Volume 5, Issue -, Pages -

Publisher

PEERJ INC
DOI: 10.7717/peerj.3893

Keywords

Benchmark datasets; Phylogenomics; Food safety; Foodborne outbreak; Salmonella; Listeria; E. coli; Validation; WGS

Funding

  1. Center for Food Safety and Applied Nutrition at the Food and Drug Administration
  2. Advanced Molecular Detection (AMD) Initiative at Centers for Disease Control and Prevention
  3. National Institutes of Health, National Library of Medicine
  4. USDA-FSIS

Background. As next-generation sequencing technology has advanced, there have been parallel advances in genome-scale analysis programs that determine evolutionary relationships as proxies for epidemiological relationships in public health. Most new programs skip the traditional steps of ortholog determination and multi-gene alignment; instead, they identify variants across a set of genomes and then summarize the results in a matrix of single-nucleotide polymorphisms (SNPs) or alleles for standard phylogenetic analysis. However, public health authorities need to document the performance of these methods with appropriate and comprehensive datasets so that the methods can be validated for specific purposes, e.g., outbreak surveillance. Here we propose a set of benchmark datasets to be used for comparison and validation of phylogenomic pipelines.

Methods. We identified four well-documented foodborne pathogen events in which the epidemiology was concordant with routine phylogenomic analyses (reference-based SNP and wgMLST approaches). These are ideal benchmark datasets, because for each one the trees, WGS data, and epidemiological data all agree. We have placed these sequence data, sample metadata, and "known" phylogenetic trees in publicly accessible databases and developed a standard descriptive spreadsheet format describing each dataset. To make these benchmarks easy to download, we developed an automated script that uses the standard descriptive spreadsheet format.

Results. Our "outbreak" benchmark datasets represent the four major foodborne bacterial pathogens (Listeria monocytogenes, Salmonella enterica, Escherichia coli, and Campylobacter jejuni); we also provide one simulated dataset for which the "known" tree can accurately be called the "true" tree. The downloading script and associated table files are available on GitHub: https://github.com/WGS-standards-and-analysis/datasets.

Discussion. These five benchmark datasets will help standardize comparison of current and future phylogenomic pipelines and facilitate important cross-institutional collaborations. Our work is part of a global effort to provide collaborative infrastructure for sequence data and analytic tools. We welcome additional benchmark datasets in our recommended format and, if relevant, will add them to our GitHub site. Together, these datasets, the dataset format, and the underlying GitHub infrastructure present a recommended path toward worldwide standardization of phylogenomic pipelines.
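The abstract describes pipelines that summarize variants as a SNP matrix. As a minimal sketch (not the method of any particular pipeline benchmarked in the paper), pairwise SNP distances, which outbreak analyses commonly compare against epidemiological links, can be counted from such a matrix. The dict-of-strings input, the treatment of `N` and `-` as missing calls, and the sample names are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_snp_distances(snp_matrix):
    """Count differing variant sites for every pair of samples.

    snp_matrix maps sample name -> string of base calls, one character per
    variant site, with every string the same length. A site is skipped for a
    given pair when either sample has a missing call ('N' or '-'); this is an
    assumed convention, not a documented pipeline rule.
    """
    lengths = {len(calls) for calls in snp_matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must have the same number of sites")
    missing = {"N", "-"}
    distances = {}
    for a, b in combinations(sorted(snp_matrix), 2):
        d = sum(
            1
            for x, y in zip(snp_matrix[a], snp_matrix[b])
            if x not in missing and y not in missing and x != y
        )
        # Store both orderings so lookups do not depend on argument order.
        distances[(a, b)] = d
        distances[(b, a)] = d
    return distances


if __name__ == "__main__":
    # Toy matrix with made-up sample names and calls, for illustration only.
    matrix = {
        "isolate_A": "ACGTA",
        "isolate_B": "ACGTT",
        "isolate_C": "TCGNT",
    }
    distances = pairwise_snp_distances(matrix)
    for (a, b), d in sorted(distances.items()):
        if a < b:  # each unordered pair is stored twice; print it once
            print(f"{a} vs {b}: {d}")
    # isolate_A vs isolate_B: 1
    # isolate_A vs isolate_C: 2
    # isolate_B vs isolate_C: 1
```

Skipping missing calls pair by pair keeps more sites than dropping every column that contains an `N` anywhere; pipelines differ on this choice, which is one reason a shared benchmark is useful.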
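The Methods mention a standard descriptive spreadsheet that drives an automated download script. The record does not give the spreadsheet's actual layout, so the sketch below assumes a hypothetical one: tab-separated key/value rows describing the dataset, a blank line, then a tab-separated sample table. The keys (`Organism`, `Outbreak`), the column names (`strain`, `SRArun_acc`), and the placeholder accessions are illustrative assumptions; the repository's own documentation defines the real format.

```python
import csv
import io


def parse_dataset_sheet(text):
    """Split a descriptive sheet into dataset-level metadata and sample rows.

    Hypothetical layout assumed here: tab-separated key/value rows that
    describe the whole dataset, one blank line, then a tab-separated sample
    table whose first row holds the column names.
    """
    lines = text.splitlines()
    try:
        blank = next(i for i, line in enumerate(lines) if not line.strip())
    except StopIteration:
        raise ValueError("expected a blank line between metadata and samples") from None

    metadata = {}
    for line in lines[:blank]:
        key, _, value = line.partition("\t")
        metadata[key.strip()] = value.strip()

    # csv.DictReader keys each sample row by the header row's column names.
    table = io.StringIO("\n".join(lines[blank + 1:]))
    samples = list(csv.DictReader(table, delimiter="\t"))
    return metadata, samples


def run_accessions(samples, column="SRArun_acc"):
    """Return the non-empty values of a (hypothetical) run-accession column."""
    return [row[column] for row in samples if row.get(column)]


if __name__ == "__main__":
    # Made-up example sheet; keys, columns, and accessions are placeholders.
    sheet = (
        "Organism\tListeria monocytogenes\n"
        "Outbreak\texample_outbreak\n"
        "\n"
        "strain\tSRArun_acc\n"
        "strain_A\tRUN_0001\n"
        "strain_B\t\n"
    )
    metadata, samples = parse_dataset_sheet(sheet)
    print(metadata["Organism"])     # Listeria monocytogenes
    print(run_accessions(samples))  # ['RUN_0001']
```

Keeping dataset-level metadata separate from per-sample rows lets a download tool fetch each listed accession while carrying the dataset description (organism, outbreak name) alongside it.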

Recommended

Article Public, Environmental & Occupational Health

Shigella sonnei Outbreak Investigation During a Municipal Water Crisis-Genesee and Saginaw Counties, Michigan, 2016

R. Paul McClung, Mateusz Karwowski, Caroline Castillo, Jevon McFadden, Sarah Collier, Jim Collins, Marty Soehnlen, Stephen Dietrich, Eija Trees, Grete Wilt, Christina Harrington, Ashley Miller, Elizabeth Adam, Hannah Reses, Jennifer Cope, Katie Fullerton, Vincent Hill, Jonathan Yoder

AMERICAN JOURNAL OF PUBLIC HEALTH (2020)

Article Multidisciplinary Sciences

Gen-FS coordinated proficiency test data for genomic foodborne pathogen surveillance, 2017 and 2018 exercises

Ruth E. Timme, Patricia C. Lafon, Maria Balkey, Jennifer K. Adams, Darlene Wagner, Heather Carleton, Errol Strain, Maria Hoffmann, Ashley Sabol, Hugh Rand, Rebecca Lindsey, Deborah Sheehan, Joseph D. Baugher, Eija Trees

SCIENTIFIC DATA (2020)

Article Genetics & Heredity

Phylogeny of Salmonella enterica subspecies arizonae by whole-genome sequencing reveals high incidence of polyphyly and low phase 1 H antigen variability

Nikki W. Shariat, Ruth E. Timme, Abigail T. Walters

Summary: Salmonella enterica subspecies arizonae displays unique evolutionary patterns revealed through whole-genome sequencing data and core genome phylogenetic analysis, including polyphyly, high conservation of antigens, presence of prophages and plasmids, and specific pathogenicity islands. These characteristics make subspecies arizonae a distinct lineage within the highly diverse species of Salmonella.

MICROBIAL GENOMICS (2021)

Correction Biotechnology & Applied Microbiology

Phylogenetic and Biogeographic Patterns of Vibrio parahaemolyticus Strains from North America Inferred from Whole-Genome Sequence Data (vol 87, e01403-20, 2021)

John J. Miller, Bart C. Weimer, Ruth Timme, Catharina H. M. Ludeke, James B. Pettengill, D. J. Darwin Bandoy, Allison M. Weis, James Kaufman, B. Carol Huang, Justin Payne, Errol Strain, Jessica L. Jones

APPLIED AND ENVIRONMENTAL MICROBIOLOGY (2021)

Article Biochemical Research Methods

SAUTE: sequence assembly using target enrichment

Alexandre Souvorov, Richa Agarwala

Summary: SAUTE and SAUTE_PROT are assemblers proposed to assist with the assembly of repeat regions and to report multiple well-supported variants when target sequences are provided. These assemblers utilize de Bruijn graphs on reads, with targets ranging from RNA-seq to genomic reads.
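The SAUTE summary mentions de Bruijn graphs built from reads. As a generic illustration only, and not SAUTE's implementation (which additionally uses target sequences to guide assembly), the sketch below builds a weighted de Bruijn graph; it ignores reverse complements and sequencing errors, and the function name and example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a weighted de Bruijn graph from reads.

    Nodes are (k-1)-mers. Every k-mer occurring in a read adds one to the
    weight of the edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
    Reads shorter than k contribute no edges.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


if __name__ == "__main__":
    graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
    for (prefix, suffix), weight in sorted(graph.items()):
        print(f"{prefix} -> {suffix} (seen {weight}x)")
    # AC -> CG (seen 1x)
    # CG -> GT (seen 2x)
    # GT -> TA (seen 1x)
```

The edge shared by both reads (CG -> GT) has weight 2; edge counts like this are one simple way to express how strongly the reads support a path through the graph.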

BMC BIOINFORMATICS (2021)

Article Biology

Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package

Emma J. Griffiths, Ruth E. Timme, Catarina Ines Mendes, Andrew J. Page, Nabil-Fareed Alikhan, Dan Fornika, Finlay Maguire, Josefina Campos, Daniel Park, Idowu B. Olawoye, Paul E. Oluniyi, Dominique Anderson, Alan Christoffels, Anders Goncalves da Silva, Rhiannon Cameron, Damion Dooley, Lee S. Katz, Allison Black, Ilene Karsch-Mizrachi, Tanya Barrett, Anjanette Johnston, Thomas R. Connor, Samuel M. Nicholls, Adam A. Witney, Gregory H. Tyson, Simon H. Tausch, Amogelang R. Raphenya, Brian Alcock, David M. Aanensen, Emma Hodcroft, William W. L. Hsiao, Ana Tereza R. Vasconcelos, Duncan R. MacCannell

Summary: PHA4GE is a global coalition working to improve openness, interoperability, and consistency in public health microbial bioinformatics. It has developed a SARS-CoV-2 contextual data specification package to support data collection and harmonization in public biorepositories.

GIGASCIENCE (2022)

Article Biotechnology & Applied Microbiology

Use of Whole Genome Sequencing by the Federal Interagency Collaboration for Genomics for Food and Feed Safety in the United States

Eric L. Stevens, Heather A. Carleton, Jennifer Beal, Glenn E. Tillman, Rebecca L. Lindsey, A. C. Lauer, Arthur Pightling, Karen G. Jarvis, Andrea Ottesen, Padmini Ramachandran, Leslie Hintz, Lee S. Katz, Jason P. Folster, Jean M. Whichard, Eija Trees, Ruth E. Timme, Patrick McDermott, Beverly Wolpert, Michael Bazaco, Shaohua Zhao, Sabina Lindley, Beau B. Bruce, Patricia M. Griffin, Eric Brown, Marc Allard, Sandra Tallent, Kari Irvin, Maria Hoffmann, Matt Wise, Robert Tauxe, Peter Gerner-Smidt, Mustafa Simmons, Bonnie Kissler, Stephanie Defibaugh-Chavez, William Klimke, Richa Agarwala, James Lindsay, Kimberly Cook, Suelee Robbe Austerman, David Goldman, Sherri McGarry, Kis Robertson Hale, Uday Dessai, Steven M. Musser, Chris Braden

Summary: This report provides an overview of the use of whole genome sequencing (WGS) technology for detecting and characterizing foodborne pathogens and identifying their sources. It highlights the collaborative efforts among federal agencies in food safety and describes the methods used in genetic analysis networks. The report emphasizes the application of WGS in pathogen characterization and source attribution, as well as the impact of culture-independent diagnostic tests on food safety analysis.

JOURNAL OF FOOD PROTECTION (2022)

Article Food Science & Technology

Sequencing of Enteric Bacteria: Library Preparation Procedure Matters for Accurate Identification and Characterization

Angela Poates, Jenny Truong, Rebecca Lindsey, Taylor Griswold, Amanda J. Williams-Newkirk, Heather Carleton, Eija Trees

Summary: This study aims to optimize and validate the Illumina DNA Prep kit for sequencing enteric pathogens and compare its performance against the Nextera XT kit. The Prep libraries outperformed the XT libraries, especially for Escherichia, and showed better accuracy in predicting O groups and detecting related genes.

FOODBORNE PATHOGENS AND DISEASE (2022)

Article Microbiology

A Schema for Digitized Surface Swab Site Metadata in Open-Source DNA Sequence Databases

Jingzhang Feng, Devin Daeschel, Damion Dooley, Emma Griffiths, Marc Allard, Ruth Timme, Yi Chen, Abigail B. B. Snyder

Summary: The regular analysis of whole-genome sequence data is crucial for detecting outbreaks of infectious diseases, but the metadata in existing databases are often incomplete and of poor quality. We generated large-scale, open-source DNA sequence databases by collecting microbial pathogens from built environments. To analyze these data for public health surveillance, the complex metadata associated with the swab site locations need to be digitized. Through content analysis, we identified 5 informational facets described by 338 unique terms and developed a schema that has been integrated into a publicly available pathogen metadata standard.

MSYSTEMS (2023)

Article Microbiology

Application of quasimetagenomics methods to define microbial diversity and subtype Listeria monocytogenes in dairy and seafood production facilities

Brandon Kocurek, Padmini Ramachandran, Christopher J. Grim, Paul Morin, Laura Howard, Andrea Ottesen, Ruth Timme, Susan R. Leonard, Hugh Rand, Errol Strain, Daniel Tadesse, James B. Pettengill, David W. Lacher, Mark Mammel, Karen G. Jarvis, Luxin Wang

Summary: This study investigated the microbial diversity in food production environments and demonstrated the successful assembly of L. monocytogenes genomes using shotgun quasimetagenomic sequencing. Additionally, the study showed that pathogen detection can still be achieved with low genome coverage in a metagenome sequencing data set.

MICROBIOLOGY SPECTRUM (2023)

Article Microbiology

Enterobacterales draft genome sequences: 15 historical (1998-2004) and 30 contemporary (2015-2016) clinical isolates from Pakistan

Matthew A. Crawford, Christine Lascols, Sara Lomonaco, Ruth E. Timme, Debra J. Fisher, Kevin Anderson, David R. Hodge, Stephen A. Morse, Segaran P. Pillai, Shashi K. Sharma, Erum Khan, Marc W. Allard, Molly A. Hughes

Summary: This study presents draft genomes of 45 Enterobacterales clinical isolates, including drug-resistant strains, collected in Pakistan between 1998 and 2016. The emergence and spread of antimicrobial resistance among pathogenic bacteria pose continuous threats to health and the economy.

MICROBIOLOGY RESOURCE ANNOUNCEMENTS (2023)

Article Biotechnology & Applied Microbiology

Phylogenetic and Biogeographic Patterns of Vibrio parahaemolyticus Strains from North America Inferred from Whole-Genome Sequence Data

John J. Miller, Bart C. Weimer, Ruth Timme, Catharina H. M. Luedeke, James B. Pettengill, D. J. Darwin Bandoy, Allison M. Weis, James Kaufman, B. Carol Huang, Justin Payne, Errol Strain, Jessica L. Jones

Summary: Through genome sequencing of 132 North American Vibrio parahaemolyticus isolates, this study revealed the population structure of this species along the Gulf and Atlantic Coasts, with some sequence types shared between the Gulf Coast and the coastal waters of Washington State. The identification of functional gene categories enriched in isolates from clinical sources provides insight into the potential pathogenicity of Vibrio parahaemolyticus and its adaptability in different environments.

APPLIED AND ENVIRONMENTAL MICROBIOLOGY (2021)

Review Public, Environmental & Occupational Health

Optimizing open data to support one health: best practices to ensure interoperability of genomic data from bacterial pathogens

Ruth E. Timme, William J. Wolfgang, Maria Balkey, Sai Laxmi Gubbala Venkata, Robyn Randolph, Marc Allard, Errol Strain

ONE HEALTH OUTLOOK (2020)

Article Microbiology

Closed Genome Sequences of 28 Foodborne Pathogens from the CFSAN Verification Set, Determined by a Combination of Long and Short Reads

Narjol Gonzalez-Escalona, George John Kastanis, Ruth Timme, Dwayne Roberson, Maria Balkey, Sandra M. Tallent

MICROBIOLOGY RESOURCE ANNOUNCEMENTS (2020)

Article Public, Environmental & Occupational Health

Multistate outbreak of Salmonella Poona infections associated with imported cucumbers, 2015-2016

M. Laughlin, L. Bottichio, J. Weiss, J. Higa, E. McDonald, R. Sowadsky, D. Fejes, A. Saupe, G. Provo, S. Seelman, J. Concepcion-Acevedo, L. Gieraltowski, J. Narang, M. Needham, A. Barnes, A. Maroufi, H. Buonomo, C. Neiss, L. Negado, J. Healy, F. Ni, K. Trinh, L. McCullough, C. Rigdon, J. Ayers, E. Reed, S. Viazis, A. Crosby, A. Tesfai, S. Lance, L. Whitlock, E. Trees, D. Wagner, A. Sabot, I. Williams

EPIDEMIOLOGY AND INFECTION (2019)
