Article

Tree-sequence recording in SLiM opens new horizons for forward-time simulation of whole genomes

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 19, Issue 2, Pages 552-566

Publisher

WILEY
DOI: 10.1111/1755-0998.12968

Keywords

background selection; coalescent; genealogical history; pedigree recording; selective sweeps; tree sequences

Funding

  1. National Science Foundation [DBI-1262645]
  2. Wellcome Trust [100956/Z/13/Z]
  3. Alfred P. Sloan Foundation
  4. National Institutes of Health [R01GM127418, R21AI130635]
  5. College of Agriculture and Life Sciences, Cornell University
  6. Predator Free 2050 [SS/05/01]

There is an increasing demand for evolutionary models to incorporate relatively realistic dynamics, ranging from selection at many genomic sites to complex demography, population structure, and ecological interactions. Such models can generally be implemented as individual-based forward simulations, but the large computational overhead of these models often makes simulation of whole chromosome sequences in large populations infeasible. This situation presents an important obstacle to the field that requires conceptual advances to overcome. The recently developed tree-sequence recording method (Kelleher, Thornton, Ashander, & Ralph, 2018), which stores the genealogical history of all genomes in the simulated population, could provide such an advance. This method has several benefits: (1) it allows neutral mutations to be omitted entirely from forward-time simulations and added later, thereby dramatically improving computational efficiency; (2) it allows neutral burn-in to be constructed extremely efficiently after the fact, using recapitation; (3) it allows direct examination and analysis of the genealogical trees along the genome; and (4) it provides a compact representation of a population's genealogy that can be analysed in Python using the msprime package. We have implemented the tree-sequence recording method in SLiM 3 (a free, open-source evolutionary simulation software package) and extended it to allow the recording of non-neutral mutations, greatly broadening the utility of this method. To demonstrate the versatility and performance of this approach, we showcase several practical applications that would have been beyond the reach of previously existing methods, opening up new horizons for the modelling and exploration of evolutionary processes.
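
The idea behind benefits (1) and (3) can be illustrated with a toy sketch in plain Python. This is not SLiM's or msprime's implementation: it assumes haploid genomes, exactly one crossover per offspring, and no simplification of the recorded history, and the function names (`simulate_edges`, `overlay_neutral_mutations`, `local_parent_map`, `lineage`) are hypothetical. The forward simulation records only who inherited which genome segment from whom; neutral mutations are placed on the recorded edges afterwards.

```python
import random


def simulate_edges(n_genomes, n_generations, length, seed=1):
    """Forward Wright-Fisher run that records genealogy only (no mutations).

    Each offspring genome inherits [0, cut) from one parent and [cut, length)
    from another; each inherited segment is stored as an edge
    (left, right, parent, child).
    """
    rng = random.Random(seed)
    # Node time is measured in generations before the end of the run.
    node_time = {node: n_generations for node in range(n_genomes)}
    edges = []
    current = list(range(n_genomes))
    next_id = n_genomes
    for gen in range(1, n_generations + 1):
        offspring = []
        for _ in range(n_genomes):
            child, next_id = next_id, next_id + 1
            node_time[child] = n_generations - gen
            p1, p2 = rng.choice(current), rng.choice(current)
            cut = rng.uniform(0.0, length)
            edges.append((0.0, cut, p1, child))
            edges.append((cut, length, p2, child))
            offspring.append(child)
        current = offspring
    return node_time, edges, current


def overlay_neutral_mutations(node_time, edges, mu, seed=2):
    """Place neutral mutations on the recorded edges after the fact.

    Mutations fall along each edge as a Poisson process whose rate per unit
    of genome length is mu times the edge's duration in generations.
    """
    rng = random.Random(seed)
    mutations = []  # (position, node that first carries the mutation)
    for left, right, parent, child in edges:
        rate = mu * (node_time[parent] - node_time[child])
        if rate <= 0:
            continue
        pos = left + rng.expovariate(rate)
        while pos < right:
            mutations.append((pos, child))
            pos += rng.expovariate(rate)
    return mutations


def local_parent_map(edges, position):
    """Parent of every node in the genealogical tree at one genome position."""
    return {child: parent for left, right, parent, child in edges
            if left <= position < right}


def lineage(parent_of, node):
    """Walk from a node back to its oldest recorded ancestor."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


if __name__ == "__main__":
    node_time, edges, samples = simulate_edges(10, 20, 100.0)
    mutations = overlay_neutral_mutations(node_time, edges, mu=0.01)
    parent_of = local_parent_map(edges, 50.0)
    print(len(edges), "edges;", len(mutations), "neutral mutations added later")
    print("lineage of one sample at position 50:", lineage(parent_of, samples[0]))
```

In this sketch the forward loop never touches a mutation, which is the source of the efficiency gain the abstract describes. A practical workflow would also simplify the recorded history so that only lineages ancestral to the final samples are kept, a step this sketch omits.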

Recommended

Article Evolutionary Biology

Seeing the forest for the trees: Assessing genetic offset predictions from gradient forest

Aki Jarl Laruson, Matthew C. Fitzpatrick, Stephen R. Keller, Benjamin C. Haller, Katie E. Lotterhos

Summary: This study explores the relationship between GF Offset and fitness in the Gradient Forest algorithm, and finds that GF Offset is correlated with fitness offsets under both single locus and polygenic architectures. However, neutral demography, genomic architecture, and the nature of the adaptive environment can confound this relationship.

EVOLUTIONARY APPLICATIONS (2022)

Article Biology

Migration restores hybrid incompatibility driven by mitochondrial-nuclear sexual conflict

Manisha Munasinghe, Benjamin C. Haller, Andrew G. Clark

Summary: In this study, the consequences of sexually antagonistic mitochondrial-nuclear interactions in a subdivided population were investigated using computer simulations. Disrupting these interactions resulted in less-fit males, but the strength of these interactions was not enough to drive population isolation.

PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES (2022)

Article Multidisciplinary Sciences

A unified genealogy of modern and ancient genomes

Anthony Wilder Wohns, Yan Wong, Ben Jeffery, Ali Akbari, Swapan Mallick, Ron Pinhasi, Nick Patterson, David Reich, Jerome Kelleher, Gil McVean

Summary: The sequencing of modern and ancient genomes from around the world has revolutionized our understanding of human history and evolution. Although the problem of characterizing ancestral relationships from genomic variation remains unsolved, nonparametric methods have been used successfully to infer a unified genealogy of modern and ancient humans, identify descendants of ancient samples, and estimate geographical location of ancestors.

SCIENCE (2022)

Article Biochemical Research Methods

Bayesian inference of ancestral recombination graphs

Ali Mahmoudi, Jere Koskela, Jerome Kelleher, Yao-ban Chan, David Balding

Summary: This article presents a novel algorithm, ARGinfer, for probabilistic inference of the Ancestral Recombination Graph under the Coalescent with Recombination. The algorithm uses the Succinct Tree Sequence data structure and accurately estimates evolutionary history properties of the sample, providing interpretable uncertainty assessments through posterior probability distributions.

PLOS COMPUTATIONAL BIOLOGY (2022)

Article Genetics & Heredity

Demes: a standard format for demographic models

Graham Gower, Aaron P. Ragsdale, Gertjan Bisschop, Ryan N. Gutenkunst, Matthew Hartfield, Ekaterina Noskova, Stephan Schiffels, Travis J. Struck, Jerome Kelleher, Kevin R. Thornton

Summary: Understanding the demographic history of populations is crucial in population genetics, but the lack of a standardized format to define population dynamic models hampers progress in the field. Therefore, we propose the Demes data model and file format to address these issues.

GENETICS (2022)

Article Ecology

SLiM 4: Multispecies Eco-Evolutionary Modeling

Benjamin C. Haller, Philipp W. Messer

Summary: The SLiM software framework, widely used in population genetics, has been restricted to modeling only a single species, limiting its broader application in evolutionary biology. The lack of a general-purpose, flexible modeling framework that supports simulating multiple species with explicit genetics and continuous space has hindered our ability to model higher biological levels, such as communities, ecosystems, coevolutionary and eco-evolutionary processes, and biodiversity. The release of SLiM 4 addresses this significant gap by adding support for multiple species and ecological interactions, and provides examples to showcase its new features.

AMERICAN NATURALIST (2023)

Article Evolutionary Biology

Automatic Differentiation is no Panacea for Phylogenetic Gradient Computation

Mathieu Fourment, Christiaan J. Swanepoel, Jared G. Galloway, Xiang Ji, Karthik Gangavarapu, Marc A. Suchard, Frederick A. Matsen

Summary: Gradients of probabilistic model likelihoods are crucial for computational statistics and machine learning. General-purpose machine-learning libraries like TensorFlow and PyTorch offer automatic differentiation for arbitrary models. However, for phylogenetic cases, these libraries may be slower compared to specialized code. This paper compares six gradient implementations and finds that automatic differentiation is slower than carefully implemented methods. A mixed approach combining phylogenetic libraries and machine learning libraries is recommended for optimal speed and model flexibility.

GENOME BIOLOGY AND EVOLUTION (2023)

Article Biochemistry & Molecular Biology

Inference of the distribution of fitness effects of mutations is affected by single nucleotide polymorphism filtering methods, sample size and population structure

Bea Angelica Andersson, Wei Zhao, Benjamin C. Haller, Ake Braennstrom, Xiao-Ru Wang

Summary: The distribution of fitness effects (DFE) of new mutations has been a topic of interest for evolutionary biologists. However, little is known about how data processing, sample size, and population structure impact the accuracy of DFE inference. This study demonstrates that the choice of missing-data treatment, sample size, SNP quantity, and population structure can affect DFE estimation accuracy and variance. Downsampling proves to be the most effective method, while small samples and limited SNPs can lead to unreliable DFE estimates. Moreover, population structure may bias the inferred DFE towards more deleterious mutations.

MOLECULAR ECOLOGY RESOURCES (2023)

Article Multidisciplinary Sciences

On the genes, genealogies, and geographies of Quebec

Luke Anderson-Trocme, Dominic Nelson, Shadi Zabad, Alex Diaz-Papkovich, Ivan Kryukov, Nikolas Baya, Mathilde Touvier, Ben Jeffery, Christian Dina, Helene Vezina, Jerome Kelleher, Simon Gravel

Summary: Population genetic models provide coarse representations of real-world ancestry, but this study used a large pedigree and genotype data to finely model and trace French Canadian ancestry. The loss of ancestral population structure and the emergence of spatial and regional structure highlights various population expansion models. Migration, genetic, and genealogical patterns were found within river networks in different regions of Quebec. The study also provides a simulated whole-genome sequence dataset for investigating population genetics at a high resolution.

SCIENCE (2023)

Article Biology

Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

M. Elise Lauterbur, Maria Izabel A. Cavassim, Ariella L. Gladstein, Graham Gower, Nathaniel S. Pope, Georgia Tsambos, Jeffrey Adrion, Saurabh Belsare, Arjun Biddanda, Victoria Caudill, Jean Cury, Ignacio Echevarria, Benjamin C. Haller, Ahmed R. Hasan, Xin Huang, Leonardo Nicola Martin Iasi, Ekaterina Noskova, Jana Obsteter, Vitor Antonio Correa Pavinato, Alice Pearson, David Peede, Manolo F. Perez, Murillo F. Rodrigues, Chris C. R. Smith, Jeffrey P. Spence, Anastasia Teterina, Silas Tittes, Per Unneberg, Juan Manuel Vazquez, Ryan K. Waples, Anthony Wilder Wohns, Yan Wong, Franz Baumdicker, Reed A. Cartwright, Gregor Gorjanc, Ryan N. Gutenkunst, Jerome Kelleher, Andrew D. Kern, Aaron P. Ragsdale, Peter L. Ralph, Daniel R. Schrider, Ilan Gronau

Summary: Simulation is crucial for population genetics research, but it remains a challenge to produce simulations that accurately represent genomic datasets. The development of more realistic simulations has become possible with advances in genetic data and simulation software. However, it still requires significant time and specialized knowledge.

Article Genetics & Heredity

Dispersal inference from population genetic variation using a convolutional neural network

Chris C. R. Smith, Silas Tittes, Peter L. Ralph, Andrew D. Kern

Summary: The geographic nature of biological dispersal shapes genetic variation patterns, allowing the estimation of dispersal properties from genetic data. This study presents a deep learning approach called disperseNN, which utilizes geographically distributed genotype data and convolutional neural network to estimate the mean per-generation dispersal distance. Through extensive simulations, disperseNN is shown to outperform or be competitive with existing methods, especially for small sample sizes. It also proves effective in estimating dispersal distance when other model parameters are unknown, without relying on local population density or accurate inference of identity-by-descent tracts.

GENETICS (2023)

Article Ecology

Incorporating ecology into gene drive modelling

Jaehee Kim, Keith D. Harris, Isabel K. Kim, Shahar Shemesh, Philipp W. Messer, Gili Greenbaum

Summary: Gene drive technology is a promising tool in fighting vector-borne diseases, agricultural pests, and invasive species. It is important to incorporate ecological features into gene drive models and evaluate its dynamics, potential outcomes, and risks realistically.

ECOLOGY LETTERS (2023)

Article Multidisciplinary Sciences

Elevated binding and functional antibody responses to SARS-CoV-2 in infants versus mothers

Caitlin I. Stoddard, Kevin Sung, Zak A. Yaffe, Haidyn Weight, Guillaume Beaudoin-Bussieres, Jared Galloway, Soren Gantt, Judith Adhiambo, Emily R. Begnel, Ednah Ojee, Jennifer Slyker, Dalton Wamalwa, John Kinuthia, Andres Finzi, Frederick A. Matsen IV, Dara A. Lehman, Julie Overbaugh

Summary: Limited data is available on the antibody response to SARS-CoV-2 in infants compared to their mothers. This study found that infants have distinct antibody profiles, including elevated levels of antibody binding to Spike and elevated ADCC, as well as convergent antibody binding escape profiles in the Spike fusion peptide. These findings suggest that infants develop different antibody responses to viral infection compared to adults.

NATURE COMMUNICATIONS (2023)

Article Biochemical Research Methods

phippery: a software suite for PhIP-Seq data analysis

Jared G. Galloway, Kevin Sung, Samuel S. Minot, Meghan E. Garrett, Caitlin Stoddard, Alexandra C. Willcox, Zak A. Yaffe, Ryan Yucha, Julie Overbaugh, Frederick A. Matsen

Summary: We present the phippery software suite, which consists of a Nextflow pipeline, a Python API, and a Streamlit application, for analyzing data from PhIP-Seq methods. It enables processing of raw sequencing data, calculation of enrichment, and visualization of data as a heatmap.

BIOINFORMATICS (2023)