Article

A class of identifiable phylogenetic birth-death models

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.2119513119

Keywords

identifiability; birth-death models; phylogenetics; phylodynamics

Funding

  1. NSF [DMS-2052653, DMS1646108]


Louca and Pennell recently demonstrated that a large class of phylogenetic birth-death models is statistically unidentifiable from lineage-through-time (LTT) data. This paper shows that an alternative and widely used class of birth-death models, with piecewise constant rates, is identifiable. The authors further show that, subject to mild regularity conditions, any unidentifiable birth-death model class can be approximated arbitrarily closely by a class of identifiable models, under explicit sampling requirements that are expected to be met in many contexts, such as the phylodynamic analysis of a global pandemic.
In a striking result, Louca and Pennell [S. Louca, M. W. Pennell, Nature 580, 502-505 (2020)] recently proved that a large class of phylogenetic birth-death models is statistically unidentifiable from lineage-through-time (LTT) data: Any pair of sufficiently smooth birth and death rate functions is congruent to an infinite collection of other rate functions, all of which have the same likelihood for any LTT vector of any dimension. As Louca and Pennell argue, this fact has distressing implications for the thousands of studies that have utilized birth-death models to study evolution. In this paper, we qualify their finding by proving that an alternative and widely used class of birth-death models is indeed identifiable. Specifically, we show that piecewise constant birth-death models can, in principle, be consistently estimated and distinguished from one another, given a sufficiently large extant timetree and some knowledge of the present-day population. Subject to mild regularity conditions, we further show that any unidentifiable birth-death model class can be arbitrarily closely approximated by a class of identifiable models. The sampling requirements needed for our results to hold are explicit and are expected to be satisfied in many contexts such as the phylodynamic analysis of a global pandemic.
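To make the idea of a congruence class concrete, here is a minimal Python sketch. It assumes the characterization from Louca and Pennell, which is not stated on this page. With τ measured as time before present and a fixed sampling fraction ρ, two models with the same pulled diversification rate r_p(τ) = λ(τ) - μ(τ) + λ'(τ)/λ(τ) and the same present-day speciation rate λ(0) yield the same LTT likelihood. The constant reference rates, the exponential growth parameter `A`, and all function names are hypothetical choices for illustration. The code only checks that r_p matches; it does not compute any LTT likelihood.

```python
import math

# Hypothetical reference model: constant birth (lambda) and death (mu) rates.
LAMBDA0 = 1.0
MU0 = 0.4
# Hypothetical growth parameter for the alternative speciation rate.
A = 0.3


def pulled_div_rate(lam, dlam, mu, tau):
    """Pulled diversification rate r_p(tau) = lam - mu + lam'/lam (assumed Louca-Pennell form)."""
    return lam(tau) - mu(tau) + dlam(tau) / lam(tau)


# Reference model a: lambda_a(tau) = LAMBDA0, mu_a(tau) = MU0.
def lam_a(tau):
    return LAMBDA0


def dlam_a(tau):
    return 0.0


def mu_a(tau):
    return MU0


# Alternative model b: same speciation rate at the present (tau = 0),
# but growing exponentially into the past.
def lam_b(tau):
    return LAMBDA0 * math.exp(A * tau)


def dlam_b(tau):
    return A * LAMBDA0 * math.exp(A * tau)


def mu_b(tau):
    """Death rate chosen so that model b has the same r_p as model a at every tau."""
    target = pulled_div_rate(lam_a, dlam_a, mu_a, tau)
    # Rearranging r_p = lam - mu + lam'/lam gives mu = lam + lam'/lam - r_p.
    return lam_b(tau) + dlam_b(tau) / lam_b(tau) - target


if __name__ == "__main__":
    for tau in (0.0, 1.0, 2.0, 5.0):
        print(tau,
              pulled_div_rate(lam_a, dlam_a, mu_a, tau),
              pulled_div_rate(lam_b, dlam_b, mu_b, tau),
              mu_b(tau))
```

Both models report the same r_p (0.6, up to floating-point rounding), even though model b's birth and death rates both change through time. Under the assumed characterization, any other positive, differentiable λ with the same λ(0) would also work, provided the resulting μ stays nonnegative. This is the sense in which the congruence class is infinite. The exponential perturbation used here is smooth, so it lies outside the piecewise constant class that the abstract proves identifiable.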

Recommended

Article Genetics & Heredity

High-throughput inference of pairwise coalescence times identifies signals of selection and enriched disease heritability

Pier Francesco Palamara, Jonathan Terhorst, Yun S. Song, Alkes L. Price

NATURE GENETICS (2018)

Article Statistics & Probability

Efficiently Inferring the Demographic History of Many Populations With Allele Count Data

Jack Kamm, Jonathan Terhorst, Richard Durbin, Yun S. Song

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2020)

Article Genetics & Heredity

Robust detection of natural selection using a probabilistic model of tree imbalance

Enes Dilber, Jonathan Terhorst

Summary: This article presents a new likelihood-based method for detecting natural selection that is robust to confounding factors such as population expansion. The method uses a probabilistic model of tree imbalance and a frequency-spectrum-based estimator to detect signals of natural selection.

GENETICS (2022)

Article Biochemistry & Molecular Biology

Variational Phylodynamic Inference Using Pandemic-scale Data

Caleb Ki, Jonathan Terhorst

Summary: This paper introduces VBSKY, a method for fitting Bayesian phylodynamic models to large pathogen genetic datasets. By combining recent advances in modeling, inference, and programming, VBSKY can analyze thousands of genomes in minutes and provide accurate estimates of epidemiologically relevant quantities.

MOLECULAR BIOLOGY AND EVOLUTION (2022)

Article Biochemistry & Molecular Biology

Direct detection of natural selection in Bronze Age Britain

Iain Mathieson, Jonathan Terhorst

Summary: This study developed a novel method for estimating time-varying selection coefficients using ancient DNA data. Applying this method to ancient and present-day human genomes from Britain, the researchers identified seven loci with significant evidence of selection in the past 4500 years, mostly related to increased vitamin D or calcium levels. The strength of selection on individual loci varied over time, suggesting the influence of cultural or environmental factors. Skin pigmentation was the only complex trait with significant evidence of polygenic selection, emphasizing the importance of phenotypes related to vitamin D.

GENOME RESEARCH (2022)

Article Ecology

Rates of convergence in the two-island and isolation-with-migration models

Brandon Legried, Jonathan Terhorst

Summary: This paper investigates the statistical performance of demographic inference methods in the presence of continuous migration between populations. Using the theory of phase-type distributions and concentration of measure, the authors study the two-island and isolation-with-migration models and derive upper and lower bounds on rates of convergence for parametric estimators in migration models.

THEORETICAL POPULATION BIOLOGY (2022)

Article Biology

A linear adjustment-based approach to posterior drift in transfer learning

Subha Maity, Diptavo Dutta, Jonathan Terhorst, Yuekai Sun, Moulinath Banerjee

Summary: We propose new models and methods for the posterior drift problem, where the regression function in the target domain is modeled as a linear adjustment of that in the source domain. We study the theoretical properties of our estimators in the binary classification problem. Our approach is flexible and applicable in various statistical settings, including epidemiology, genetics, and biomedicine. We illustrate the power of our approach through mortality prediction for British Asians and overcoming spurious correlation in the Waterbirds dataset.

BIOMETRIKA (2023)

Article Statistics & Probability

Exact Decoding of a Sequentially Markov Coalescent Model in Genetics

Caleb Ki, Jonathan Terhorst

Summary: The sequentially Markov coalescent (SMC) is an important family of models in statistical genetics for approximating the distribution of genetic variation data under complex evolutionary models. SMC-based methods are widely used in genetics and evolutionary biology for genotype phasing and imputation, recombination rate estimation, and population history inference. This work proposes a method that enables SMC-based inference in a continuous state space without discretization, making it faster and more accurate than existing methods.

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2023)

Article Biology

Identifiability and inference of phylogenetic birth-death models

Brandon Legried, Jonathan Terhorst

Summary: Recent theoretical work reveals differing perspectives on the estimation of phylogenetic birth-death models with lineage-through-time data. While Louca and Pennell (2020) argue that models with continuously differentiable rate functions are nonidentifiable, Legried and Terhorst (2022) show that identifiability can be restored by considering piecewise constant rate functions. In this study, we contribute new theoretical results to this ongoing discussion. We prove that models based on piecewise polynomial rate functions, regardless of the order or number of pieces, are statistically identifiable, including spline-based models with arbitrary knots. However, we also highlight the challenge of rate function estimation, even when identifiability is achieved, by presenting information-theoretic lower bounds for hypothesis testing using birth-death models.
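The summary above refers to piecewise polynomial rate functions with arbitrary knots, a class that includes the piecewise constant models of the main paper as the degree-0 case. The Python sketch below shows one hypothetical way to represent and evaluate such a rate function. The class name, the convention that each piece is a polynomial in τ itself (rather than in τ minus its left knot), and the half-open intervals are all assumptions made for illustration. They are not the paper's parameterization.

```python
from bisect import bisect_right
from typing import List, Sequence


class PiecewisePolynomialRate:
    """Hypothetical piecewise polynomial rate function on tau >= 0 (time before present).

    knots:  strictly increasing breakpoints t_1 < ... < t_{K-1}.
    coeffs: K coefficient lists, lowest power first; piece k applies on
            [t_k, t_{k+1}) with t_0 = 0 and the last piece unbounded.
    """

    def __init__(self, knots: Sequence[float], coeffs: Sequence[Sequence[float]]):
        if len(coeffs) != len(knots) + 1:
            raise ValueError("need exactly one coefficient list per piece")
        if any(b <= a for a, b in zip(knots, knots[1:])):
            raise ValueError("knots must be strictly increasing")
        self.knots: List[float] = list(knots)
        self.coeffs: List[List[float]] = [list(c) for c in coeffs]

    def __call__(self, tau: float) -> float:
        if tau < 0:
            raise ValueError("tau must be nonnegative")
        # Index of the piece containing tau; a tau equal to a knot belongs to the later piece.
        piece = bisect_right(self.knots, tau)
        # Horner's rule for sum_j c_j * tau**j.
        value = 0.0
        for c in reversed(self.coeffs[piece]):
            value = value * tau + c
        return value


if __name__ == "__main__":
    # Degree 0: a piecewise constant rate with knots at tau = 1 and tau = 2.
    const_rate = PiecewisePolynomialRate([1.0, 2.0], [[0.5], [1.2], [0.8]])
    # Degree 1 then degree 0: linear on [0, 1), constant afterwards.
    mixed_rate = PiecewisePolynomialRate([1.0], [[1.0, 2.0], [3.0]])
    print([const_rate(t) for t in (0.5, 1.0, 3.0)])  # [0.5, 1.2, 0.8]
    print([mixed_rate(t) for t in (0.0, 0.5, 1.0)])  # [1.0, 2.0, 3.0]
```

Giving every piece a single coefficient recovers the piecewise constant class. Spline-based models would add continuity constraints across knots, which this sketch does not enforce.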

JOURNAL OF THEORETICAL BIOLOGY (2023)
