Article

MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments

Journal

PLOS ONE
Volume 17, Issue 9

Publisher

Public Library of Science
DOI: 10.1371/journal.pone.0274260

Funding

  1. Genome Canada
  2. Ontario Genomics for the Sequencing the Rivers for Environmental Assessment and Monitoring (STREAM) project
  3. Government of Canada through the Genomics Research and Development Initiative (GRDI), Metagenomics-based ecosystem biomonitoring (Ecobiomics) project

Multi-marker metabarcoding is used for generating biodiversity information. MetaWorks provides a harmonized processing environment and pipeline for handling Illumina reads of all biota, along with various workflows and taxonomic assignment approaches.
Multi-marker metabarcoding is increasingly being used to generate biodiversity information across domains of life, from microbes to fungi to animals, for applications such as molecular ecology and biomonitoring in sectors ranging from academic research to regulatory agencies and industry. Current popular bioinformatic pipelines support microbial and fungal marker analysis, while ad hoc methods are often used to process animal metabarcode markers from the same study. MetaWorks provides a harmonized processing environment, pipeline, and taxonomic assignment approach for demultiplexed Illumina reads from all biota, using a wide range of metabarcoding markers such as 16S, ITS, and COI. A Conda environment is provided to quickly gather most of the programs and dependencies the pipeline needs. Several workflows are provided, including taxonomic assignment of exact sequence variants, an option to generate operational taxonomic units, and single-read processing. Pipelines are automated with Snakemake to minimize user intervention and facilitate scalability. All pipelines use the RDP classifier to provide taxonomic assignments with confidence measures. The RDP classifier already supports taxonomic assignment of 16S (bacteria), ITS (fungi), and 28S (fungi); we extend its functionality to also support COI (eukaryotes), rbcL (eukaryotes, land plants, diatoms), 12S (fish, vertebrates), 18S (eukaryotes, diatoms), and ITS (fungi, plants). MetaWorks properly handles ITS by trimming the flanking conserved rRNA gene regions, and handles protein-coding genes by providing two options for removing obvious pseudogenes. MetaWorks can be downloaded from https://github.com/terrimporter/MetaWorks, and quickstart instructions, pipeline details, and a tutorial for new users can be found at https://terrimporter.github.io/MetaWorksSite.
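The abstract notes that every MetaWorks pipeline reports RDP classifier assignments with confidence measures. A common way to use such bootstrap support values is to accept, for each sequence, only the finest rank whose support meets a chosen cutoff. The Python sketch below illustrates that idea under stated assumptions: the `Assignment` record, the `deepest_confident_taxon` function and the 0.8 default cutoff are hypothetical choices for illustration, not MetaWorks' output format or its recommended cutoffs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Assignment:
    """Hypothetical, simplified classifier record for one sequence.

    `ranks` runs from the broadest rank to the finest; each entry is
    (rank name, taxon name, bootstrap support between 0 and 1).
    This is NOT MetaWorks' actual output format.
    """
    seq_id: str
    ranks: list[tuple[str, str, float]]


def deepest_confident_taxon(
    assignment: Assignment, cutoff: float = 0.8
) -> Optional[tuple[str, str]]:
    """Return (rank, taxon) for the finest rank whose support meets `cutoff`.

    Ranks are walked from broadest to finest and the walk stops at the first
    rank whose support falls below `cutoff`. Returns None if even the
    broadest rank fails. The 0.8 default is an arbitrary illustrative value.
    """
    best: Optional[tuple[str, str]] = None
    for rank, taxon, support in assignment.ranks:
        if support < cutoff:
            break
        best = (rank, taxon)
    return best


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    esv = Assignment(
        "ESV_1",
        [
            ("phylum", "Arthropoda", 1.00),
            ("class", "Insecta", 0.99),
            ("order", "Diptera", 0.95),
            ("family", "Chironomidae", 0.85),
            ("genus", "Tanytarsus", 0.42),
        ],
    )
    print(deepest_confident_taxon(esv))  # ('family', 'Chironomidae')
```

In RDP classifier output, support normally does not increase from a broader rank to a finer one, so stopping at the first rank below the cutoff amounts to keeping the finest rank that passes it.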
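For protein-coding markers such as COI, the abstract mentions two options for removing obvious pseudogenes without detailing them. One simple screen along these lines looks at open reading frame (ORF) length: a pseudogene carrying frameshifts or premature stop codons tends to have a shorter longest ORF than functional copies amplified by the same primers. The sketch below scans the three forward frames for the longest stop-free codon run and flags length outliers with the common 1.5 × IQR rule. The stop-codon set (invertebrate mitochondrial code, NCBI table 5), the forward-strand-only scan, the two-tailed outlier rule and all function names are assumptions made for illustration, not MetaWorks' exact procedure.

```python
from __future__ import annotations

import statistics

# Stop codons of the invertebrate mitochondrial code (NCBI translation table 5),
# where TGA codes for tryptophan. Assumed here for an arthropod COI example.
INVERT_MITO_STOPS = frozenset({"TAA", "TAG"})


def longest_orf_nt(seq: str, stops=INVERT_MITO_STOPS) -> int:
    """Length (nt) of the longest stop-free run of complete codons in the
    three forward reading frames. Reverse-strand frames are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in stops:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 3
                best = max(best, run)
    return best


def flag_orf_outliers(seqs: dict[str, str], stops=INVERT_MITO_STOPS) -> set[str]:
    """Return ids whose longest-ORF length lies outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Needs at least two sequences."""
    lengths = {sid: longest_orf_nt(s, stops) for sid, s in seqs.items()}
    q1, _, q3 = statistics.quantiles(lengths.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {sid for sid, n in lengths.items() if n < low or n > high}


if __name__ == "__main__":
    normal = "ATG" * 10  # no stop codon in any forward frame
    pseudo = "ATGATGATGATGTAAATAGTGATTAAGATG"  # stops in all three frames
    seqs = {f"esv{i}": normal for i in range(5)}
    seqs["pseudo"] = pseudo
    print(flag_orf_outliers(seqs))  # {'pseudo'}
```

On real data the stop-codon set must match the marker's genetic code (the standard code also stops at TGA), and reverse-strand frames would need scanning if reads are not consistently oriented.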
