Article
Biochemical Research Methods
Rohan S. Mehta, Mike Steel, Noah A. Rosenberg
Summary: Monophyly is a feature used to evaluate phylogenetic and species range, and the probability of joint monophyly (JM) has been widely used in these evaluations. This study derives the probability of JM for arbitrary numbers of separate groups in arbitrary species trees based on coalescent theory, and investigates the impact of tree height, sample size, and number of species on the JM probability.
JOURNAL OF COMPUTATIONAL BIOLOGY
(2022)
Article
Biochemistry & Molecular Biology
Tianqi Zhu, Ziheng Yang
Summary: This study investigated the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock using mathematical analysis and computer simulation. The results suggest that full-likelihood methods are considerably more efficient than summary methods and can provide estimates of important parameters.
MOLECULAR BIOLOGY AND EVOLUTION
(2021)
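For orientation, the three-species case studied above rests on a standard multispecies coalescent result (not taken from the summarized paper): with one sequence per species and species tree ((A,B),C) whose internal branch has length tau in coalescent units, the rooted gene tree topologies have the probabilities below. A simple summary-method estimate of tau inverts the first equation using the observed proportion of ((A,B),C) gene trees, and is only defined when that proportion exceeds 1/3.

```latex
\begin{aligned}
P\big(((A,B),C)\big) &= 1 - \tfrac{2}{3}e^{-\tau},\\
P\big(((A,C),B)\big) = P\big(((B,C),A)\big) &= \tfrac{1}{3}e^{-\tau},\\
\hat{\tau} &= -\ln\!\Big(\tfrac{3}{2}\,\big(1-\hat{f}_{AB}\big)\Big), \qquad \hat{f}_{AB} > \tfrac{1}{3}.
\end{aligned}
```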
Article
Biochemistry & Molecular Biology
Jun Huang, Yuttapong Thawornwattana, Tomas Flouri, James Mallet, Ziheng Yang
Summary: Genomic sequence data are valuable for studying species divergence and gene flow. However, when the model of gene flow is misspecified, estimation bias and interpretation issues may arise. Despite this, the simple introgression model can still be useful for extracting information about between-species gene flow and divergence.
MOLECULAR BIOLOGY AND EVOLUTION
(2022)
Article
Evolutionary Biology
Zhi Yan, Megan L. Smith, Peng Du, Matthew W. Hahn, Luay Nakhleh
Summary: Recent phylogenetic methods have focused on accurately inferring species trees in the presence of gene tree discordance due to incomplete lineage sorting (ILS). Most of these methods assume that data for each locus consist of orthologous, single-copy sequences, excluding loci with more than one copy in any of the studied genomes. This study examines the consequences of running such methods on data with paralogs, with or without ILS. Through simulations and analysis of biological data sets, the researchers demonstrate that these methods can still provide accurate results when paralogs are present. The findings have significant implications for expanding the data available for phylogenetic inference.
SYSTEMATIC BIOLOGY
(2022)
Review
Multidisciplinary Sciences
Xiyun Jiao, Tomas Flouri, Ziheng Yang
Summary: The multispecies coalescent (MSC) model extends the single-population coalescent model to multiple species, allowing estimation of species divergence times, population sizes, and species trees, as well as inference of cross-species gene flow and species delimitation. The framework faces statistical and computational challenges, but ongoing research advances may lead to breakthroughs in the field in the next few years.
NATIONAL SCIENCE REVIEW
(2021)
Article
Biology
Elizabeth S. Allman, Hector Banos, Jonathan D. Mitchell, John A. Rhodes
Summary: Inference of species networks under the Network Multispecies Coalescent Model is limited by computational demands and by the complexity of the networks. This study instead focuses on inferring the tree of blobs, the tree obtained from a general species network by contracting every non-cut edge so that each blob becomes a single node. An identifiability theorem is established, stating that most features of the unrooted tree of blobs can be determined from the distribution of gene quartet topologies, which suggests a practical algorithm for tree-of-blobs inference.
JOURNAL OF MATHEMATICAL BIOLOGY
(2023)
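To make the tree-of-blobs definition concrete, the sketch below computes it for a network that is already known, treated as an undirected simple graph: it finds cut edges (bridges) with Tarjan's lowlink depth-first search, merges the endpoints of every non-cut edge with union-find, and keeps the bridges as the edges of the resulting tree. This is only the definition, not the paper's quartet-based inference procedure; suppressing degree-2 nodes and handling the root are ignored, and all function names are hypothetical.

```python
from collections import defaultdict


def find_bridges(edges):
    """Return the cut edges (bridges) of an undirected simple graph via Tarjan's lowlink DFS."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc, low, bridges, timer = {}, {}, set(), [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for w in adj[u]:
            if w not in disc:
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] > disc[u]:  # no back edge from w's subtree reaches u or above
                    bridges.add(frozenset((u, w)))
            elif w != parent:  # back edge (valid because the graph has no parallel edges)
                low[u] = min(low[u], disc[w])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return bridges


def tree_of_blobs(edges):
    """Contract every non-cut edge; return (blobs, tree_edges) where blobs maps a representative to its nodes."""
    bridges = find_bridges(edges)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if frozenset((u, v)) not in bridges:
            parent[ru] = rv  # same blob: merge components
    blobs = defaultdict(set)
    for node in list(parent):
        blobs[find(node)].add(node)
    tree_edges = [(find(u), find(v)) for u, v in edges if frozenset((u, v)) in bridges]
    return dict(blobs), tree_edges


# Hypothetical network: a single cycle u-v-w with one leaf hanging off each cycle node.
example_edges = [("u", "v"), ("v", "w"), ("w", "u"), ("u", "A"), ("v", "B"), ("w", "C")]
blobs, tree_edges = tree_of_blobs(example_edges)
```

Here the cycle collapses to one blob node joined to the three leaves, so the tree of blobs is a star.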
Article
Mycology
Chitrabhanu S. Bhunjun, Chayanard Phukhamsakda, Ruvishika S. Jayawardena, Rajesh Jeewon, Itthayakorn Promputtha, Kevin D. Hyde
Summary: This study utilized multiple molecular approaches to establish species boundaries in Colletotrichum, finding that the ITS region can resolve species complex level issues and that GAPDH and TUB2 markers are the most informative. The introduction of a new species complex was based on congruent results from different molecular approaches. Coalescent methods and multi-locus phylogeny are crucial for establishing species boundaries in Colletotrichum.
Article
Evolutionary Biology
Zhi Yan, Huw A. Ogilvie, Luay Nakhleh
Summary: The evolutionary histories of individual loci in a genome can be estimated independently, but this approach is prone to errors due to limited sequence data for each gene. Various gene tree error correction methods have been developed to reduce the distance between gene trees and species trees. We evaluated the performance of two representative methods: TRACTION and TreeFix. Our findings suggest that gene tree error correction often increases the errors in gene tree topologies, even when the true gene and species trees are discordant. Full Bayesian inference of gene trees under the multispecies coalescent model is more accurate than independent inference. Future gene tree correction methods should incorporate realistic models of evolution instead of relying on oversimplified heuristics.
GENOME BIOLOGY AND EVOLUTION
(2023)
Article
Biology
Yao-ban Chan, Qiuyi Li, Celine Scornavacca
Summary: This paper studies a statistical method to infer a species tree from a set of gene trees. The results show that the error probability decays exponentially as the number of input gene trees increases. A closed form for the error probability is derived for a four-taxon species tree, and improved upper bounds for the sample complexity are obtained.
JOURNAL OF MATHEMATICAL BIOLOGY
(2022)
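The summary does not say which estimator the paper analyzes, so the following is only an assumed stand-in that shows the same qualitative behavior: for three taxa under the multispecies coalescent, it computes exactly (by summing the multinomial distribution) the probability that the most frequent of n independent gene tree topologies differs from the species tree topology. Ties are counted as errors, and the function names are hypothetical.

```python
import math


def triplet_probs(tau):
    """MSC probabilities of ((A,B),C), ((A,C),B), ((B,C),A) for species tree ((A,B),C).

    tau is the internal branch length in coalescent units.
    """
    p_minor = math.exp(-tau) / 3.0
    return 1.0 - 2.0 * p_minor, p_minor, p_minor


def plurality_error(n, tau):
    """Exact probability that the most frequent of n gene tree topologies is not ((A,B),C)."""
    p1, p2, p3 = triplet_probs(tau)
    err = 0.0
    for i in range(n + 1):  # i = number of ((A,B),C) gene trees
        for j in range(n - i + 1):  # j = number of ((A,C),B) gene trees
            k = n - i - j  # k = number of ((B,C),A) gene trees
            if i <= max(j, k):  # species tree topology not strictly most frequent
                coef = math.comb(n, i) * math.comb(n - i, j)
                err += coef * p1**i * p2**j * p3**k
    return err


for n in (5, 25, 101):
    print(n, plurality_error(n, tau=0.5))
```

Printing the error for increasing odd n shows it shrinking rapidly, consistent with the exponential decay described in the summary.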
Review
Biochemistry & Molecular Biology
Zachary B. Hancock, Emma S. Lehmberg, Heath Blackmon
Summary: In this article, we review the evidence for how continuous spatial structure can impact phylogenetic inference. Using complex continuous-space demographic models, we illustrate the impact of spatial structure on gene tree stoichiometry, topological and branch-length variance, network estimation, and species delimitation. We conclude by suggesting how researchers can identify spatial structure in phylogenetic datasets.
MOLECULAR PHYLOGENETICS AND EVOLUTION
(2022)
Article
Plant Sciences
Marinoli Rivas-Chamorro, Richard Cadenillas, Xue-Jun Ge, Lu Jin, Betty Millan, Julissa Roncal
Summary: This study examined the variation within the hyperdominant tree species Astrocaryum murumuru in the Amazon rainforest. The results showed that this species is actually composed of three separate lineages, suggesting that the previously recognized 15 morphology-based species may be an overestimate. The findings highlight the importance of using genomic data for species delimitation analysis to understand the intraspecific variation of hyperdominant species in the Amazon rainforest.
Article
Biology
Kristina Wicke, Mareike Fischer, Laura Kubatko
Summary: Phylogenetic diversity indices such as the Fair Proportion (FP) index are commonly used in biodiversity conservation to prioritize species. However, because gene trees can be discordant with the species tree, these indices may produce different species rankings depending on which tree they are computed from.
JOURNAL OF MATHEMATICAL BIOLOGY
(2023)
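The Fair Proportion index has a short standard definition: each edge's length is shared equally among the leaves descending from it, and a species' score is the sum of its shares along the path to the root. The sketch below computes it for a rooted tree written as nested tuples, an input format chosen here only for illustration. Running it on a species tree and on a discordant gene tree is one way to see the ranking differences the summary mentions.

```python
def leaves(node):
    """Leaves below a node; a leaf is a str, an internal node is a tuple of (child, edge_length) pairs."""
    if isinstance(node, str):
        return [node]
    return [leaf for child, _ in node for leaf in leaves(child)]


def fair_proportion(tree):
    """Fair Proportion index of every leaf of a rooted tree with edge lengths."""
    scores = {}

    def walk(node, acc):
        # acc = total share of edge lengths accumulated on the path from the root
        if isinstance(node, str):
            scores[node] = acc
            return
        for child, length in node:
            walk(child, acc + length / len(leaves(child)))

    walk(tree, 0.0)
    return scores


# Hypothetical tree ((A:1,B:1):1,C:2)
example_tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
fp = fair_proportion(example_tree)
ranking = sorted(fp, key=fp.get, reverse=True)
```

A useful check is that the FP scores always sum to the total tree length (here 5.0).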
Article
Biochemical Research Methods
Mahim Mahbub, Zahin Wahab, Rezwana Reaz, M. Saifur Rahman, Md Shamsuzzoha Bayzid
Summary: Estimating species trees from genes sampled from the whole genome is challenging due to gene tree-species tree discordance, with incomplete lineage sorting being a common cause. Quartet-based weighted methods offer a statistically consistent way for accurate species tree estimation in such cases. The proposed wQFM method extends the quartet FM algorithm to a weighted setting, providing highly accurate species tree estimation results on simulated and real biological datasets.
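Weighted quartet amalgamation methods generally aim for a species tree that maximizes the total weight of the input quartets it displays. The sketch below computes only that objective for one candidate tree, given as a list of splits (one side of each bipartition); it does not reproduce wQFM's FM-style divide-and-conquer search, and the quartet weights and function names are hypothetical.

```python
def displays(splits, quartet):
    """True if some split separates {a, b} from {c, d} for quartet ((a, b), (c, d))."""
    (a, b), (c, d) = quartet
    for side in splits:
        if ({a, b} <= side and not ({c, d} & side)) or ({c, d} <= side and not ({a, b} & side)):
            return True
    return False


def weighted_quartet_score(splits, weighted_quartets):
    """Total weight of the quartets displayed by the tree."""
    return sum(w for q, w in weighted_quartets.items() if displays(splits, q))


# Candidate unrooted tree ((A,B),C,(D,E)) given by its two nontrivial splits.
candidate = [frozenset("AB"), frozenset("DE")]
# Hypothetical weights for the three resolutions of quartet {A,B,C,D}.
weights = {
    (("A", "B"), ("C", "D")): 0.7,
    (("A", "C"), ("B", "D")): 0.2,
    (("A", "D"), ("B", "C")): 0.1,
}
score = weighted_quartet_score(candidate, weights)
```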
Article
Evolutionary Biology
Elyse Parker, Alex Dornburg, Carl D. Struthers, Christopher D. Jones, Thomas J. Near
Summary: The application of genetic data to species delimitation can enhance confidence in previously hypothesized delimitations and uncover previously undescribed diversity. However, there is a concern that genetic data-based approaches may result in taxonomic oversplitting by confounding population structure with species diversity. An integrative approach, which evaluates molecular, morphological, ecological, and geographic evidence together, is increasingly recognized as necessary. In this study, the authors used phylogenetic, population genetic, and coalescent analyses of genome-wide sequence data, along with investigation of morphological traits, to delimit species within the Antarctic barbeled plunderfishes. Their findings support the recognition of fewer species than currently recognized and propose the synonymization of multiple species. The study highlights the utility of an integrative species delimitation framework and provides evidence of taxonomic oversplitting based on morphology.
SYSTEMATIC BIOLOGY
(2022)
Article
Biochemistry & Molecular Biology
Richard Adams, Michael DeGiorgio
Summary: Likelihood-based tests of phylogenetic trees are crucial in modern systematics. While many such tests exist for gene trees, there is a lack of comparable frameworks for testing species tree hypotheses. To address this, we derive likelihood-based approaches for testing species tree topologies using gene tree topologies as input. These tests leverage the statistical procedures of their gene tree-based counterparts and have been demonstrated with simulated and empirical data sets. We also introduce an open-source R package for conducting formal likelihood-based tests of species topologies.
MOLECULAR BIOLOGY AND EVOLUTION
(2023)
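As a minimal illustration of the quantity that likelihood-based topology tests compare, the sketch below computes, for three taxa, the maximized log-likelihood of observed gene tree topology counts under each rooted species tree hypothesis. It assumes independent, error-free gene trees under the multispecies coalescent. The maximum-likelihood estimate of the minor-topology probability is the pooled minor proportion, capped at 1/3 (internal branch length zero). The authors' R package and the formal test statistics are not reproduced, and the function names are hypothetical.

```python
import math


def max_loglik(n_major, n_minor1, n_minor2):
    """Maximized log-likelihood when the hypothesized species tree matches the n_major topology."""
    n = n_major + n_minor1 + n_minor2
    # MLE of p = exp(-tau)/3, constrained to tau >= 0 (p <= 1/3)
    p = min((n_minor1 + n_minor2) / (2 * n), 1 / 3)
    ll = 0.0
    for count, prob in ((n_major, 1 - 2 * p), (n_minor1, p), (n_minor2, p)):
        if count > 0:
            ll += count * math.log(prob)
    return ll


def compare_topologies(n_ab, n_ac, n_bc):
    """Maximized log-likelihood of each rooted 3-taxon species tree given gene tree counts."""
    return {
        "((A,B),C)": max_loglik(n_ab, n_ac, n_bc),
        "((A,C),B)": max_loglik(n_ac, n_ab, n_bc),
        "((B,C),A)": max_loglik(n_bc, n_ab, n_ac),
    }


results = compare_topologies(60, 20, 20)
best = max(results, key=results.get)
```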
Article
Biochemical Research Methods
Elizabeth S. Allman, Hector Banos, John A. Rhodes
ALGORITHMS FOR MOLECULAR BIOLOGY
(2019)
Article
Biology
Samaneh Yourdkhani, John A. Rhodes
BULLETIN OF MATHEMATICAL BIOLOGY
(2020)
Article
Biochemical Research Methods
John A. Rhodes, Hector Banos, Jonathan D. Mitchell, Elizabeth S. Allman
Summary: MSCquartets is an R package for species tree hypothesis testing, inference of species trees, and inference of species networks. It takes collections of metric or topological locus trees as input, summarizes them using quartets, and displays hypothesis test results in a simplex plot. The package implements algorithms for topological and metric species tree inference, as well as level-1 topological species network inference.
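The quartet summary that MSCquartets starts from can be sketched as follows: for every four-taxon subset, count how many locus trees display each of its three unrooted resolutions. Dividing the counts by the number of trees gives the point plotted in the simplex. This is a Python sketch with hypothetical function names, not the package's R interface, and it assumes each locus tree is supplied as a list of splits (one side of each bipartition).

```python
from collections import Counter
from itertools import combinations


def quartet_topology(splits, a, b, c, d):
    """Resolution of {a, b, c, d} displayed by a tree given by its splits, or None if unresolved."""
    for pair, other in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        for side in splits:
            if (set(pair) <= side and not set(other) & side) or (
                set(other) <= side and not set(pair) & side
            ):
                return pair, other
    return None


def quartet_counts(gene_trees, taxa):
    """For every 4-taxon subset, count the gene trees displaying each resolution."""
    table = {}
    for quartet in combinations(sorted(taxa), 4):
        counts = Counter()
        for splits in gene_trees:
            topo = quartet_topology(splits, *quartet)
            if topo is not None:
                counts[topo] += 1
        table[quartet] = counts
    return table


# Three hypothetical locus trees on taxa A-D, each with one nontrivial split.
trees = [[frozenset("AB")], [frozenset("AB")], [frozenset("AC")]]
table = quartet_counts(trees, "ABCD")
```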
Article
Biology
Elizabeth S. Allman, Hector Banos, John A. Rhodes
Summary: Inference of network-like evolutionary relationships between species from genomic data must account for both gene flow and incomplete lineage sorting. Standard methods have high computational demands, which limits the size of the datasets that can be analyzed. This study shows that logDet distances computed from genomic-scale sequences can efficiently recover network relationships in the level-1 ultrametric case. The approach applies to both unlinked site data and sequence data.
JOURNAL OF MATHEMATICAL BIOLOGY
(2022)
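LogDet-type distances are computed from the 4x4 matrix F of joint base frequencies at aligned sites. The sketch below uses Lake's paralinear form, d = -1/4 [ln det F - 1/2 ln(det Pi_x det Pi_y)], where Pi_x and Pi_y are diagonal matrices of base frequencies. Normalizations differ between authors, so treat this as an assumed variant rather than the exact distance used in the paper. It requires all four bases to occur in both sequences and det F > 0, and it does not reproduce the network-recovery method itself.

```python
import math

BASES = "ACGT"


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def paralinear_distance(x, y):
    """Paralinear (logDet-type) distance between two aligned DNA sequences over ACGT."""
    idx = {b: i for i, b in enumerate(BASES)}
    F = [[0.0] * 4 for _ in range(4)]
    for u, v in zip(x, y):
        F[idx[u]][idx[v]] += 1.0 / len(x)  # joint site-pattern frequencies
    px = [sum(row) for row in F]  # base frequencies in x
    py = [sum(col) for col in zip(*F)]  # base frequencies in y
    log_det_pi = sum(math.log(f) for f in px) + sum(math.log(f) for f in py)
    return -0.25 * (math.log(det(F)) - 0.5 * log_det_pi)


d_same = paralinear_distance("ACGTACGT", "ACGTACGT")
d_diff = paralinear_distance("ACGTACGTACGTACGT", "ACGTACGTACGTACGA")
```

Identical sequences give a distance of (numerically) zero, and a single substitution gives a small positive distance.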
Article
Biochemical Research Methods
Elizabeth S. Allman, Hector Banos, John A. Rhodes
Summary: As more genomic-scale datasets are used for species tree inference, simulators of the multispecies coalescent (MSC) process are needed to test and evaluate new inference methods. However, the simulators themselves must be tested to ensure their validity. This study develops methods to check whether a collection of gene trees is consistent with the MSC model on a given species tree. Tests of well-known simulators reveal flaws in some of their samples, and the methods are implemented in the freely available R package MSCsimtester for easy use by developers and users.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2023)
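In the spirit of the simulator checks described above, here is a minimal Python analogue of one such check, the gene tree topology frequencies for three taxa. MSCsimtester itself is an R package that tests more than topology frequencies, and the function names here are hypothetical. A toy MSC simulator draws rooted gene trees for species tree ((A,B),C), and a chi-square statistic (2 degrees of freedom) compares the sample with the expected frequencies 1 - (2/3)e^(-tau) and (1/3)e^(-tau).

```python
import math
import random


def simulate_triplet(tau, rng):
    """One rooted gene tree topology for species tree ((A,B),C), one lineage per species.

    tau is the internal branch length in coalescent units.
    """
    # The a-b pair coalesces at rate 1, so it does so on the internal branch
    # with probability 1 - exp(-tau).
    if rng.expovariate(1.0) < tau:
        return "AB"
    # Otherwise three lineages enter the root population, and each pair is
    # equally likely to coalesce first.
    return rng.choice(["AB", "AC", "BC"])


def chi_square_against_msc(sample, tau):
    """Pearson chi-square statistic of topology counts against MSC expectations."""
    n = len(sample)
    p_minor = math.exp(-tau) / 3
    expected = {"AB": n * (1 - 2 * p_minor), "AC": n * p_minor, "BC": n * p_minor}
    return sum((sample.count(t) - e) ** 2 / e for t, e in expected.items())


rng = random.Random(1)
good = [simulate_triplet(0.5, rng) for _ in range(20000)]
# A deliberately flawed "simulator" that ignores the species tree.
bad = [rng.choice(["AB", "AC", "BC"]) for _ in range(20000)]
stat_good = chi_square_against_msc(good, 0.5)
stat_bad = chi_square_against_msc(bad, 0.5)
```

With 2 degrees of freedom, the 5% critical value is about 5.99. A correct simulator should usually fall below it, while the flawed one is rejected overwhelmingly.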
Article
Biochemical Research Methods
Dakota Dragomir, Elizabeth S. Allman, John A. Rhodes
Summary: Diversification models describe the random growth of evolutionary trees to model the historical relationships of species. This study establishes the identifiability of parameters for one form of such a model, a multitype pure birth model of speciation, based on an asymptotic distribution derived from a single tree observation. The key finding is that type observations are not needed at any internal points or leaves of the tree for practical applications.
JOURNAL OF COMPUTATIONAL BIOLOGY
(2023)
Article
Biochemical Research Methods
Samaneh Yourdkhani, Elizabeth S. Allman, John A. Rhodes
Summary: The PM model for protein evolution describes sequence data in which sites follow multiple related substitution processes, each depending on a different amino acid distribution (profile). Using algebraic methods, the parameters of the PM model are shown to be identifiable, as required for empirical analyses, when the tree relates 9 or more taxa and the number of profiles is less than 74.
JOURNAL OF COMPUTATIONAL BIOLOGY
(2021)
Article
Biochemical Research Methods
John A. Rhodes
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2020)
Article
Statistics & Probability
Jonathan D. Mitchell, Elizabeth S. Allman, John A. Rhodes
ELECTRONIC JOURNAL OF STATISTICS
(2019)
Article
Mathematics, Applied
Elizabeth S. Allman, Colby Long, John A. Rhodes
SIAM JOURNAL ON APPLIED ALGEBRA AND GEOMETRY
(2019)