4.8 Article

Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient for Statistical Phylogenetics

Journal

MOLECULAR BIOLOGY AND EVOLUTION
Volume 37, Issue 10, Pages 3047-3060

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/molbev/msaa130

Keywords

linear-time gradient algorithm; random-effects molecular clock model; Bayesian inference; maximum likelihood

Funding

  1. European Research Council under the European Union's Horizon 2020 research and innovation program [725422]
  2. Wellcome Trust [206298/Z/17/Z]
  3. NSF [DMS 1264153]
  4. NIH [R01 AI107034, U19 AI135995]
  5. NIH-NIAID [K25AI153816]
  6. Interne Fondsen KU Leuven/Internal Funds KU Leuven [C14/18/094]
  7. Research Foundation-Flanders (Fonds voor Wetenschappelijk Onderzoek-Vlaanderen) [G066215N, G0D5117N, G0B9317N]

Ask authors/readers for more resources

Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N-2) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available