Article; Proceedings Paper

Inference of species phylogenies from bi-allelic markers using pseudo-likelihood

Journal

BIOINFORMATICS
Volume 34, Issue 13, Pages 376-385

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty295

Funding

  1. National Science Foundation [DBI-1355998, CCF-1302179, CCF-1514177, DMS-1547433]
  2. Big-Data Private-Cloud Research Cyberinfrastructure MRI-award - NSF [CNS-1338099]
  3. Rice University
  4. Division of Computing and Communication Foundations
  5. Directorate for Computer & Information Science & Engineering [1514177] (funding source: National Science Foundation)
  6. Division of Biological Infrastructure
  7. Directorate for Biological Sciences [1355998] (funding source: National Science Foundation)

Motivation: Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g. single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method's applicability.

Results: In this article, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger datasets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss.
