Article

LCA*: an entropy-based measure for taxonomic assignment within assembled metagenomes

Journal

BIOINFORMATICS
Volume 32, Issue 23, Pages 3535-3542

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btw400

Funding

  1. Genome Canada
  2. Genome British Columbia
  3. Genome Alberta
  4. Natural Sciences and Engineering Research Council (NSERC) of Canada
  5. Canada Foundation for Innovation (CFI)
  6. Canadian Institute for Advanced Research (CIFAR)
  7. UBC four-year doctoral fellowship (4YF)
  8. Tula Foundation

Motivation: A perennial problem in the analysis of environmental sequence information is the assignment of reads or assembled sequences, e.g. contigs or scaffolds, to discrete taxonomic bins. In the absence of reference genomes for most environmental microorganisms, the use of intrinsic nucleotide patterns and phylogenetic anchors can improve assembly-dependent binning needed for more accurate taxonomic and functional annotation in communities of microorganisms, and assist in identifying mobile genetic elements or lateral gene transfer events.

Results: Here, we present a statistic called LCA* inspired by Information and Voting theories that uses the NCBI Taxonomic Database hierarchy to assign taxonomy to contigs assembled from environmental sequence information. The LCA* algorithm identifies a sufficiently strong majority on the hierarchy while minimizing entropy changes to the observed taxonomic distribution resulting in improved statistical properties. Moreover, we apply results from the order-statistic literature to formulate a likelihood-ratio hypothesis test and P-value for testing the supremacy of the assigned LCA* taxonomy. Using simulated and real-world datasets, we empirically demonstrate that voting-based methods, majority vote and LCA*, in the presence of known reference annotations, are consistently more accurate in identifying contig taxonomy than the lowest common ancestor algorithm popularized by MEGAN, and that LCA* taxonomy strikes a balance between specificity and confidence to provide an estimate appropriate to the available information in the data.
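To make the contrast between the three assignment strategies concrete, the following is a minimal sketch on a toy taxonomy. It is not the published LCA* implementation: the `PARENT` map, the function names, the `alpha` threshold and the example annotations are all hypothetical, and `strong_majority` is only a simplified stand-in for the idea of finding a sufficiently strong majority on the hierarchy (it does not perform the entropy-minimizing step or the hypothesis test described in the abstract).

```python
from collections import Counter
from math import log2

# Hypothetical toy taxonomy (child -> parent); the paper uses the NCBI hierarchy.
PARENT = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Alphaproteobacteria": "Proteobacteria",
    "Firmicutes": "Bacteria",
}

def lineage(taxon):
    """Path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path

def depth(taxon):
    return len(lineage(taxon))

def lca(taxa):
    """Classic lowest common ancestor: deepest node shared by every lineage."""
    common = set(lineage(taxa[0]))
    for t in taxa[1:]:
        common &= set(lineage(t))
    return max(common, key=depth)

def majority_vote(taxa):
    """Most frequent annotation, ignoring the hierarchy."""
    return Counter(taxa).most_common(1)[0][0]

def entropy(taxa):
    """Shannon entropy (bits) of the observed annotation distribution."""
    n = len(taxa)
    return -sum(c / n * log2(c / n) for c in Counter(taxa).values())

def strong_majority(taxa, alpha=0.5):
    """Simplified stand-in (assumption): deepest taxon whose subtree
    holds more than a fraction alpha of the annotations."""
    support = Counter()
    for t in taxa:
        for ancestor in lineage(t):
            support[ancestor] += 1
    n = len(taxa)
    candidates = [t for t, c in support.items() if c / n > alpha]
    return max(candidates, key=depth)

# Hypothetical per-gene annotations for one contig.
orfs = ["Gammaproteobacteria"] * 3 + ["Alphaproteobacteria"] * 2 + ["Firmicutes"]

print(lca(orfs))              # Bacteria
print(majority_vote(orfs))    # Gammaproteobacteria
print(strong_majority(orfs))  # Proteobacteria
```

In this toy case a single Firmicutes annotation pushes the plain LCA up to Bacteria, the simple majority vote commits to Gammaproteobacteria with only half of the annotations behind it, and the threshold-based assignment settles on Proteobacteria, which is the kind of trade-off between specificity and confidence that the abstract attributes to LCA*.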
