Article
Evolutionary Biology
Bui Quang Minh, Cuong Cao Dang, Le Sy Vinh, Robert Lanfear
Summary: Amino acid substitution models are crucial in phylogenetic analyses, and a new ML method called QMaker has been proposed to estimate a general time-reversible Q matrix from large protein data sets. QMaker combines an efficient ML tree search algorithm, model selection for handling model heterogeneity among alignments, and consideration of rate mixture models among sites.
SYSTEMATIC BIOLOGY
(2021)
Article
Biochemistry & Molecular Biology
Sudip Sharma, Sudhir Kumar
Summary: The study finds that a subsample-upsample approach can significantly reduce the computational cost of analyzing long sequence alignments in molecular evolution while still recovering the correct optimal substitution model. An adaptive protocol called ModelTamer is proposed, which selects the optimal model in much less time and with much less memory.
MOLECULAR BIOLOGY AND EVOLUTION
(2022)
Article
Biology
Lam Si Tung Ho, Edward Susko
Summary: Likelihood-based methods are considered the best approaches for reconstructing ancestral states, but consistency issues may arise when the tree topology and edge lengths are unknown. Consistency is proven under symmetric models and a simple consistent estimator is found under non-symmetric models.
JOURNAL OF MATHEMATICAL BIOLOGY
(2022)
Article
Biochemistry & Molecular Biology
Lucas C. Wheeler, Michael J. Harms
Summary: The study suggests that ancestral proteins may have been less specific than modern proteins, but when interactions with random peptide targets are considered, the picture becomes more complex. It demonstrates that altered biological specificity does not necessarily indicate altered intrinsic specificity.
MOLECULAR BIOLOGY AND EVOLUTION
(2021)
Article
Optics
F. Benatti, S. Olivares, G. Perosa, D. Bajoni, S. Di Mitri, R. Floreanini, L. Ratti, F. Parmigiani
Summary: The study proposes a method based on maximum likelihood techniques to reconstruct the energy state occupation number distribution of FEL radiation, addressing the photo-counting issues at high intensities. In addition to focusing on the statistical features of FEL radiation, the proposal is also applicable to the study of general nonlinear optical processes regarding the preservation of quantum features.
Article
Evolutionary Biology
Cuong Cao Dang, Bui Quang Minh, Hanon McShea, Joanna Masel, Jennifer Eleanor James, Le Sy Vinh, Robert Lanfear
Summary: This study introduces a new maximum likelihood method, nQMaker, that can estimate time nonreversible amino acid substitution models and rooted phylogenetic trees. The results show that the nonreversible models estimated with nQMaker are a better fit to empirical alignments than pre-existing reversible models, and the improvements in model fit scale with the size of the data set.
SYSTEMATIC BIOLOGY
(2022)
Article
Ecology
Mariana P. Braga, Niklas Janz, Soren Nylin, Fredrik Ronquist, Michael J. Landis
Summary: The study reveals that the gain of new hosts and the recolonization of ancestral hosts by pierids promoted a phase transition in network structure. Combining network analysis with Bayesian inference of host-repertoire evolution proves effective in understanding changes in complex species interactions over time.
Review
Biochemistry & Molecular Biology
Federico Scossa, Alisdair R. Fernie
Summary: Ancestral proteins were more promiscuous than modern ones, with specificity often evolving after gene duplication; some modern proteins are found to have evolved de novo from ancestors lacking those functions; new interactions evolved from just a few mutations, suggesting acquisition of new functions is not difficult but often lost to subsequent mutations.
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
(2021)
Article
Mathematics, Interdisciplinary Applications
Changming Yin, Mingyao Ai, Xia Chen, Xiangshun Kong
Summary: This paper investigates empirical likelihood inference for fixed design generalized linear models with longitudinal data, which are used to model discrete or nonnegative responses. The consistency and asymptotic normality of the maximum empirical likelihood estimator are established under mild conditions, and the asymptotic chi-squared distribution of the empirical log-likelihood ratio is also obtained. Compared with existing results, the new conditions are weaker and easier to verify. Simulations are presented to illustrate these asymptotic properties.
JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY
(2023)
Article
Biochemistry & Molecular Biology
Kona N. Orlandi, Sophia R. Phillips, Zachary R. Sailer, Joseph L. Harman, Michael J. Harms
Summary: Ancestral sequence reconstruction (ASR) is a powerful tool for studying protein evolution. The topiary software pipeline simplifies the process and provides detailed results.
Article
Biochemistry & Molecular Biology
Julia Haag, Dimitri Hoehler, Ben Bettisworth, Alexandros Stamatakis
Summary: This study introduces a method to predict the level of difficulty in phylogenetic analysis datasets and presents a tool for accurate prediction. The tool can increase user awareness of signal and uncertainty in phylogenetic analysis and assist in selecting appropriate analysis setups and search algorithms.
MOLECULAR BIOLOGY AND EVOLUTION
(2022)
Article
Biochemistry & Molecular Biology
Matthew A. Spence, Joe A. Kaczmarski, Jake W. Saunders, Colin J. Jackson
Summary: Ancestral sequence reconstruction (ASR) is valuable in both the study of molecular evolution and protein engineering, as proteins generated by ASR often exhibit improved properties that are valued by protein engineers. By comparing extant proteins with evolutionary intermediates generated by ASR, protein engineers can identify substitutions that have contributed to functional innovation within protein families. Understanding the applications, limitations, and recent developments of ASR is crucial as it becomes more widely adopted in protein engineering.
CURRENT OPINION IN STRUCTURAL BIOLOGY
(2021)
Article
Biochemistry & Molecular Biology
Weixiang Fang, Claire M. Bell, Abel Sapirstein, Soichiro Asami, Kathleen Leeper, Donald J. Zack, Hongkai Ji, Reza Kalhor
Summary: This study introduces a method called quantitative fate mapping, which reconstructs the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states based on a time-scaled phylogeny of their descendants. The validation of these methods using realistic experiments provides insights into analyzing progenitor fate and dynamics.
Article
Management
Gerardo Berbeglia, Agustin Garassino, Gustavo Vulcano
Summary: This paper discusses the importance of choice-based demand estimation in retail operations and revenue management, as well as the application of different demand models and estimation algorithms. Through extensive experimental studies, comparative statistics on predictive power and revenue performance of choice models are provided, along with recommendations for model implementation in different operational environments.
MANAGEMENT SCIENCE
(2022)
Article
Statistics & Probability
Jean Feng, William S. DeWitt, Aaron McKenna, Noah Simon, Amy D. Willis, Frederick A. Matsen
Summary: CRISPR technology enables cell lineage tracing in complex multicellular organisms by using insertion-deletion mutations of synthetic genomic barcodes. Researchers have proposed a statistical model and developed a procedure to estimate tree topology, branch lengths, and mutation parameters. Their method infers relative ordering across parallel lineages, offering advantages over existing techniques.
ANNALS OF APPLIED STATISTICS
(2021)
Article
Evolutionary Biology
Claudia C. Weber, Umberto Perron, Dearbhaile Casey, Ziheng Yang, Nick Goldman
Summary: This study discusses how to accurately estimate parameters related to protein evolution by handling missing data, and demonstrates that combining ambiguous-coded and fully resolved data inputs can improve accuracy. By establishing connections between observed information in different state spaces, evolutionary information can be successfully recovered from sequences that were previously inaccessible.
SYSTEMATIC BIOLOGY
(2021)
Article
Biochemical Research Methods
Antanas Kalkauskas, Umberto Perron, Yuxuan Sun, Nick Goldman, Guy Baele, Stephane Guindon, Nicola De Maio
Summary: The authors explore the effects of different model assumptions on phylogeographic inference and find that sample collection biases can strongly affect the quality of reconstruction. They suggest various strategies to counter these effects, but note that these strategies come with additional computational burden. Additionally, they investigate the differences among various phylogeographic models and their suitability in different scenarios.
PLOS COMPUTATIONAL BIOLOGY
(2021)
Editorial Material
Multidisciplinary Sciences
Emma B. Hodcroft, Nicola De Maio, Rob Lanfear, Duncan R. MacCannell, Bui Quang Minh, Heiko A. Schmidt, Alexandros Stamatakis, Nick Goldman, Christophe Dessimoz
Summary: Researchers are in need of new approaches to control the pandemic as existing tools, rules, and incentives are struggling to cope with the flood of coronavirus genome sequences.
Article
Genetics & Heredity
Conor R. Walker, Aylwyn Scally, Nicola De Maio, Nick Goldman
Summary: Many complex genomic rearrangements arise through template switch errors during DNA replication. By using an improved statistical approach, it has been shown that template switch events have been widespread in the evolution of great apes' genomes and provide a parsimonious explanation for the presence of many complex mutation clusters in their phylogenetic context. Larger-scale mechanisms of genome rearrangement involve structural features around breakpoints, with atypical patterns of secondary structure formation and DNA bending present at the initial template switch loci.
Article
Evolutionary Biology
Nicola De Maio, Conor R. Walker, Yatish Turakhia, Robert Lanfear, Russell Corbett-Detig, Nick Goldman
Summary: The COVID-19 pandemic has prompted an unprecedented response from the sequencing community, leading to a study of mutation rates and selective pressures using sequence data from over 140,000 SARS-CoV-2 genomes. Two specific mutation rates, G -> U and C -> U, were found to be significantly higher than others, possibly attributed to APOBEC and ROS activity. Genomic context does have an effect on mutation rates, but its impact is limited.
GENOME BIOLOGY AND EVOLUTION
(2021)
Article
Evolutionary Biology
Emily Jane Mctavish, Luna Luisa Sanchez-Reyes, Mark T. Holder
Summary: The Open Tree of Life project aims to create a comprehensive and digitally available tree of life by synthesizing published phylogenetic trees and taxonomic data, with APIs provided for easy access. The Python package opentree offers a user-friendly wrapper for these APIs and provides scripts and tutorials for data analysis. This tool has been used to estimate phylogenetic relationships for bird families and taxa observed at a specific natural reserve.
SYSTEMATIC BIOLOGY
(2021)
Article
Biochemical Research Methods
Jeet Sukumaran, Mark T. Holder, L. Lacey Knowles
Summary: The traditional multispecies coalescent (MSC) model fails to distinguish genetic structures between species and within species, leading to the emergence of artifactual species under high-resolution data. The new species delimitation approach explicitly models speciation as an extended process, allowing for more accurate discrimination between genetic structures corresponding to species lineages and population lineages within species, providing insights into the relationship between population and species-level processes.
PLOS COMPUTATIONAL BIOLOGY
(2021)
Article
Biochemical Research Methods
Nicola De Maio, Alexander Alekseyenko, William J. Coleman-Smith, Fabio Pardi, Marc A. Suchard, Asif U. Tamuri, Jakub Truszkowski, Nick Goldman
Summary: This study introduced a novel method called "phylogenetic novelty scores" to address sequence weighting in bioinformatics, formalizing the evolutionary novelty of a sequence within an alignment. The method showed promising results in computational efficiency and accuracy improvement in sequence alignment.
BMC BIOINFORMATICS
(2021)
Article
Biochemistry & Molecular Biology
Jakob McBroome, Bryan Thornlow, Angie S. Hinrichs, Alexander Kramer, Nicola De Maio, Nick Goldman, David Haussler, Russell Corbett-Detig, Yatish Turakhia
Summary: A database of SARS-CoV-2 phylogenetic trees inferred from public sequences is presented, updated daily to include new sequences and encoded in mutation-annotated tree (MAT) format. The researchers also introduce the matUtils software for querying and manipulating MATs efficiently.
MOLECULAR BIOLOGY AND EVOLUTION
(2021)
Article
Multidisciplinary Sciences
Harald S. Vohringer, Theo Sanderson, Matthew Sinnott, Nicola De Maio, Thuy Nguyen, Richard Goater, Frank Schwach, Ian Harrison, Joel Hellewell, Cristina Ariani, Sonia Goncalves, David K. Jackson, Ian Johnstone, Alexander W. Jung, Callum Saint, John Sillitoe, Maria Suciu, Nick Goldman, Jasmine Panovska-Griffiths, Ewan Birney, Erik Volz, Sebastian Funk, Dominic Kwiatkowski, Meera Chand, Inigo Martincorena, Jeffrey C. Barrett, Moritz Gerstung
Summary: The study analyzed the dynamics of different lineages in English local authorities using real-time genomic data. The findings showed significant fluctuations in transmissibility and in the proportions of different variants over time, with the Delta variant increasing rapidly in early summer 2021.
Article
Biochemical Research Methods
Nicola R. De Maio, William Boulton, Lukas Weilguny, Conor Walker, Yatish Turakhia, Russell O. Corbett-Detig, Nick Goldman, Ville Mustonen, Joel Wertheim
Summary: This article introduces a new algorithm and software for efficiently simulating a large number of closely related genomes. The algorithm is based on the Gillespie approach and utilizes an efficient multi-layered search tree structure to achieve high computational efficiency, allowing integration with various evolutionary models.
PLOS COMPUTATIONAL BIOLOGY
(2022)
Correction
Multidisciplinary Sciences
Harald S. Vohringer, Theo Sanderson, Matthew Sinnott, Nicola De Maio, Thuy Nguyen, Richard Goater, Frank Schwach, Ian Harrison, Joel Hellewell, Cristina V. Ariani, Sonia Goncalves, David K. Jackson, Ian Johnston, Alexander W. Jung, Callum Saint, John Sillitoe, Maria Suciu, Nick Goldman
Article
Biotechnology & Applied Microbiology
Lukas Weilguny, Nicola De Maio, Rory Munro, Charlotte Manser, Ewan Birney, Matthew Loose, Nick Goldman
Summary: BOSS-RUNS is an algorithmic framework and software that dynamically updates decision strategies based on real-time updates of uncertainty at each genome position. It optimizes information gain by deciding whether to fully sequence each DNA fragment, leading to improved variant calling in microbial communities.
NATURE BIOTECHNOLOGY
(2023)
Article
Genetics & Heredity
Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh, Nick Goldman
Summary: Phylogenetics plays a crucial role in genomic epidemiology, and the COVID-19 pandemic has generated an unprecedented amount of genome sequence data for analysis. However, most phylogenetic approaches are unable to handle the scale of these datasets. This study presents a new method called 'MAximum Parsimonious Likelihood Estimation' (MAPLE) for likelihood-based phylogenetic analysis of large genomic datasets. MAPLE is faster, more accurate, and requires significantly less memory compared to existing maximum likelihood methods, enabling the analysis of millions of genomes.
Article
Evolutionary Biology
Paschalia Kapli, Ioanna Kotari, Maximilian J. Telford, Nick Goldman, Ziheng Yang
Summary: Inference of deep phylogenies has primarily used protein sequences, but our analysis shows that DNA sequences may be just as useful and should not be excluded. We conducted a simulation study and analyzed empirical data, which suggest that DNA sequences can recover the correct tree as often as protein sequences. Using DNA data has computational advantages and allows for advanced models that account for heterogeneity in the nucleotide-substitution process.
SYSTEMATIC BIOLOGY
(2023)