Article

Longer is Not Always Better: Optimizing Barcode Length for Large-Scale Species Discovery and Identification

Journal

SYSTEMATIC BIOLOGY
Volume 69, Issue 5, Pages 999-1015

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/syaa014

Keywords

DNA barcoding; metabarcoding; mini-barcodes; species discovery

Funding

  1. Ministry of Education grant on biodiversity discovery [R-154-000-A22-112]


New techniques for the species-level sorting of millions of specimens are needed to accelerate species discovery, determine how many species live on Earth, and develop efficient biomonitoring techniques. These sorting methods should be reliable, scalable, and cost-effective, and largely insensitive to low-quality genomic DNA, given that this is usually all that can be obtained from museum specimens. Mini-barcodes seem to satisfy these criteria, but it is unclear how well they perform for species-level sorting when compared with full-length barcodes. We test this here based on 20 empirical data sets covering ca. 30,000 specimens (5500 species) and six clade-specific data sets from GenBank covering ca. 98,000 specimens (>20,000 species). All specimens in these data sets had full-length barcodes and had been sorted to species level based on morphology. Mini-barcodes of different lengths and positions were obtained in silico from full-length barcodes using a sliding window approach (three windows: 100 bp, 200 bp, and 300 bp) and by excising nine mini-barcodes with established primers (length: 94-407 bp). We then tested whether barcode length and/or position reduces species-level congruence between morphospecies and molecular operational taxonomic units (mOTUs) that were obtained using three different species delimitation techniques (Poisson Tree Process, Automatic Barcode Gap Discovery, and Objective Clustering). Surprisingly, we find no significant differences in performance for either species- or specimen-level identification between full-length and mini-barcodes as long as the latter are of moderate length (>200 bp). Only very short mini-barcodes (<200 bp) perform poorly, especially when they are located near the 5' end of the Folmer region. The mean congruence between morphospecies and mOTUs was ca. 75% for barcodes >200 bp, and the congruent mOTUs contain ca. 75% of all specimens. Most conflict is caused by ca. 10% of the specimens, which can be identified and should be targeted for reexamination to resolve the conflict efficiently. Our study suggests that large-scale species discovery, identification, and metabarcoding can utilize mini-barcodes without any demonstrable loss of information compared to full-length barcodes.
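
The in silico sliding-window extraction and the congruence measure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the `step` parameter, and the definition of congruence used here (a morphospecies counts as congruent only when one mOTU contains exactly the same specimens) are assumptions made for illustration; the abstract does not state the step size or the exact congruence criterion.

```python
from collections import defaultdict


def sliding_windows(barcode, window, step=1):
    """Yield (start, fragment) for every full-length window in `barcode`.

    `window` is the mini-barcode length in bp (e.g. 100, 200, or 300).
    `step` is a hypothetical parameter; the abstract does not give one.
    """
    for start in range(0, len(barcode) - window + 1, step):
        yield start, barcode[start:start + window]


def specimen_sets(assignment):
    """Turn a specimen -> label mapping into a set of specimen groups."""
    by_label = defaultdict(set)
    for specimen, label in assignment.items():
        by_label[label].add(specimen)
    return {frozenset(members) for members in by_label.values()}


def congruence(morphospecies, motus):
    """Fraction of morphospecies matched exactly by one mOTU.

    Assumed definition: a match requires identical specimen sets.
    Both arguments map specimen ID -> group label.
    """
    morpho_sets = specimen_sets(morphospecies)
    if not morpho_sets:
        return 0.0
    return len(morpho_sets & specimen_sets(motus)) / len(morpho_sets)
```

For example, a 658 bp barcode scanned with a 300 bp window and a step of 1 produces 359 candidate mini-barcodes. Each set of fragments would then be clustered into mOTUs by a delimitation method of choice and scored against the morphospecies with `congruence`.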
