Article

Alignment Errors Strongly Impact Likelihood-Based Tests for Comparing Topologies

Journal

MOLECULAR BIOLOGY AND EVOLUTION
Volume 31, Issue 11, Pages 3057-3067

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/molbev/msu231

Keywords

alignment; alignment uncertainty; KH test; SOWH test; phylogeny; likelihood; tree comparisons; branch length optimization

Funding

  1. Israeli Science Foundation (ISF) [1092/13]
  2. Natural Sciences and Engineering Research Council of Canada

Estimating phylogenetic trees from sequence data is an extremely challenging and important statistical task. Within the maximum-likelihood paradigm, the best tree is a point estimate. To determine how strongly the data support such an evolutionary scenario, a hypothesis-testing methodology is required. To this end, the Kishino-Hasegawa (KH) test was developed to determine whether one topology is significantly better supported by the sequence data than another. This test and its derivatives are widely used in phylogenetics and phylogenomics. Here, we show that the KH test is biased in the presence of alignment error and can lead to erroneous conclusions. Using simulations, we demonstrate that, because of alignment errors, the KH test often rejects one of the competing topologies even though both are equally supported by the data. Specifically, we show that the KH test favors the guide tree used to align the analyzed sequences. Further, branch length optimization renders the test too conservative. We propose two possible corrections for these biases. First, we evaluated the impact of removing unreliable alignment columns and found that it decreases the bias at the cost of substantially reducing the test's power. Second, we developed a parametric test that entirely abolishes the biases without data filtering. This test incorporates the alignment construction step into the test's hypothesis, thus removing the guide tree effect described above. We extend this methodology to the case of multiple-topology comparisons and demonstrate its applicability on an exemplary data set.
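
For readers unfamiliar with the KH test mentioned in the abstract, the following is a minimal sketch of its commonly described normal-approximation form, not the authors' implementation. It assumes the per-site log-likelihoods of a fixed alignment under each of the two topologies have already been computed by some phylogenetics program. The function name `kh_test` and its arguments are hypothetical. The sketch does not model alignment error or the guide tree effect that the paper studies.

```python
import math


def kh_test(site_lnl_a, site_lnl_b):
    """Normal-approximation KH test on per-site log-likelihoods.

    site_lnl_a, site_lnl_b: sequences of per-site log-likelihoods for the
    same alignment columns under topology A and topology B (assumed inputs).
    Returns (delta, z, p): the total log-likelihood difference, its
    standardized value, and a two-sided p-value.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("both topologies must be scored on the same sites")
    n = len(site_lnl_a)
    if n < 2:
        raise ValueError("at least two sites are needed to estimate variance")

    # Per-site differences in log-likelihood between the two topologies.
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    mean = delta / n

    # Variance of the summed difference, estimated from site-to-site spread.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0.0:
        raise ValueError("site differences have zero variance")

    z = delta / math.sqrt(var_delta)
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p


# Toy per-site log-likelihoods (illustrative values only, not real data).
delta, z, p = kh_test([-1.0, -2.0, -3.0, -4.0], [-2.0, -2.0, -4.0, -4.0])
print(f"delta={delta:.3f} z={z:.3f} p={p:.3f}")
```

A small p-value would be read as one topology being significantly better supported than the other. The abstract's point is that this reading can be misleading when the alignment itself was built with a guide tree resembling one of the candidates.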

Recommended

Article Evolutionary Biology

A Codon Model for Associating Phenotypic Traits with Altered Selective Patterns of Sequence Evolution

Keren Halabi, Eli Levy Karin, Laurent Gueguen, Itay Mayrose

Summary: The article introduces a new phylogenetic model called TraitRELAX, which can accurately detect changes in selection patterns of protein-coding genes, overcoming strong assumptions about the evolution of traits in previous methods.

SYSTEMATIC BIOLOGY (2021)

Article Biochemical Research Methods

Pasa: Proteomic analysis of serum antibodies web server

Oren Avram, Aya Kigel, Anna Vaisman-Mentesh, Sharon Kligsberg, Shai Rosenstein, Yael Dror, Tal Pupko, Yariv Wine

Summary: The proteomics of serum antibodies (Ig-Seq) combines BCR-Seq and high-resolution mass-spectrometry, providing a comprehensive characterization of the humoral response. The PASA web server offers a robust computational platform for analyzing and integrating data from proteomics of serum antibodies, making it accessible to non-expert users.

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Multidisciplinary Sciences

Type III secretion system effectors form robust and flexible intracellular virulence networks

David Ruano-Gallego, Julia Sanchez-Garrido, Zuzanna Kozik, Elena Nunez-Berrueco, Massiel Cepeda-Molero, Caroline Mullineaux-Sanders, Jasmine Naemi-Baghshomali Clark, Sabrina L. Slater, Naama Wagner, Izabela Glegola-Madejska, Theodoros Roumeliotis, Tal Pupko, Luis Angel Fernandez, Alfonso Rodriguez-Paton, Jyoti S. Choudhary, Gad Frankel

Summary: The study demonstrates the extreme robustness of both T3SS effector networks and host responses, as pathogenicity can be maintained even with a 60% contraction in the effector network. Different effector networks induce varying colonic cytokine profiles, yet all can induce protective immunity, implicating the importance of effector networks in host adaptation.

SCIENCE (2021)

Article Immunology

Domain-Scan: Combinatorial Sero-Diagnosis of Infectious Diseases Using Machine Learning

Smadar Hada-Neeman, Yael Weiss-Ottolenghi, Naama Wagner, Oren Avram, Haim Ashkenazy, Yaakov Maor, Ella H. Sklan, Dmitry Shcherbakov, Tal Pupko, Jonathan M. Gershoni

Summary: The study utilizes phage-display epitope arrays for sero-diagnosis, measuring serum binding to multiple epitopes using Next-generation sequencing (NGS). Machine learning classification distinguishes healthy individuals from those infected with HIV-1 or HCV, accurately identifying the contributing domains.

FRONTIERS IN IMMUNOLOGY (2021)

Article Biochemical Research Methods

SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts

Ruoshi Zhang, Milot Mirdita, Eli Levy Karin, Clovis Norroy, Clovis Galiez, Johannes Soeding

Summary: SpacePHARER is a sensitive and fast tool for predicting phage-host relationships by comparing spacers and phages at the protein level, optimizing scores for matching short sequences, and combining evidence from multiple matches.

BIOINFORMATICS (2021)

Article Biochemical Research Methods

Fast and sensitive taxonomic assignment to metagenomic contigs

M. Mirdita, M. Steinegger, F. Breitwieser, J. Soeding, E. Levy Karin

Summary: MMseqs2 taxonomy is a new tool for assigning taxonomic labels to metagenomic contigs. It extracts protein fragments from each contig, retains those relevant for taxonomic annotation, and determines the taxonomic identity using weighted voting. MMseqs2 is 2-18 times faster than existing tools and includes modules for creating and manipulating taxonomic reference databases.

BIOINFORMATICS (2021)

Article Multidisciplinary Sciences

A phage mechanism for selective nicking of dUMP-containing DNA

Tridib Mahata, Shahar Molshanski-Mor, Moran G. Goren, Biswanath Jana, Miriam Kohen-Manor, Ido Yosef, Oren Avram, Tal Pupko, Dor Salomon, Udi Qimron

Summary: Bacteriophages have evolved efficient means to take over the machinery of the bacterial host. Through the study of a bacterial growth inhibitor gene product T5.015 encoded by the T5 phage, it was found that growth inhibition mediated by T5.015 depends on the uracil-excision activity of Ung, leading to DNA replication and cell division arrest.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Article Biochemical Research Methods

Effectidor: an automated machine-learning-based web server for the prediction of type-III secretion system effectors

Naama Wagner, Oren Avram, Dafna Gold-Binshtok, Ben Zerah, Doron Teper, Tal Pupko

Summary: Effectidor is a user-friendly web server that utilizes multiple machine-learning techniques to predict T3Es within bacterial genomes and performs well in terms of accuracy.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

A LASSO-based approach to sample sites for phylogenetic tree search

Noa Ecker, Dana Azouri, Ben Bettisworth, Alexandros Stamatakis, Yishay Mansour, Itay Mayrose, Tal Pupko

Summary: In large-scale sequence alignment for phylogenetic reconstruction, we propose an artificial intelligence-based method that selects the optimal subset of sites and computes the log-likelihood of the entire data based on this subset by constraining the number of sites used. We show that computing the likelihood based on 5% of the sites already provides an accurate approximation of the tree likelihood based on the entire data. Furthermore, using this Lasso-based approximation significantly reduces the running time during a tree search while maintaining the same tree-search performance.

BIOINFORMATICS (2022)

Article Biochemistry & Molecular Biology

An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements

Asher Moshe, Elya Wygoda, Noa Ecker, Gil Loewenthal, Oren Avram, Omer Israeli, Einat Hazkani-Covo, Itsik Pe'er, Tal Pupko

Summary: This study developed a probabilistic approach to infer genome rearrangement rate parameters and used an Approximate Bayesian Computation framework for inference. The method can help elucidate the role of genome rearrangement in evolution and simulate genomes with empirical dynamics.

MOLECULAR BIOLOGY AND EVOLUTION (2022)

Article Biochemistry & Molecular Biology

Using evolutionary data to make sense of macromolecules with a face-lifted ConSurf

Barak Yariv, Elon Yariv, Amit Kessel, Gal Masrati, Adi Ben Chorin, Eric Martz, Itay Mayrose, Tal Pupko, Nir Ben-Tal

Summary: The ConSurf web-server is used for analyzing proteins, RNA, and DNA, and provides a quick and accurate estimation of per-site evolutionary rate among homologues. It has a user-friendly interface and improved visualization of results. By analyzing a set of homologous sequences, ConSurf calculates evolutionary rates using hidden Markov model-based search tools and assembles a representative set of effective homologues for informative analysis. The availability of AlphaFold model structures makes ConSurf particularly relevant to the research community. Python re-implementation of the computational pipeline and standalone version download are significant improvements.

PROTEIN SCIENCE (2023)

Article Biochemistry & Molecular Biology

The evolutionary dynamics that retain long neutral genomic sequences in face of indel deletion bias: a model and its application to human introns

Gil Loewenthal, Elya Wygoda, Natan Nagar, Lior Glick, Itay Mayrose, Tal Pupko

Summary: The article discusses the evolutionary events of insertions and deletions of short DNA segments, proposes the phenomenon of border-induced selection, and develops corresponding dynamic models to explore the topic.

OPEN BIOLOGY (2022)

Article Biochemistry & Molecular Biology

GenomeFLTR: filtering reads made easy

Edo Dotan, Michael Alburquerque, Elya Wygoda, Dorothee Huchon, Tal Pupko

Summary: In the last decade, advances in sequencing technology have resulted in a significant increase in genomic data, transforming our understanding of gene and genome evolution and function. However, identifying contaminated reads remains challenging. To address this, GenomeFLTR is introduced as a web server that filters contaminated reads by comparing them against relevant sequence databases, allowing users to investigate the source and frequency of contamination and generate a contamination-free file.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

EvoRator2: Predicting Site-specific Amino Acid Substitutions Based on Protein Structural Information Using Deep Learning

Natan Nagar, Jerome Tubiana, Gil Loewenthal, Haim J. Wolfson, Nir Ben Tal, Tal Pupko

Summary: Multiple sequence alignments (MSAs) are important tools in molecular evolution and structural biology research, allowing inference of tolerated amino acids at each site during protein evolution. EvoRator2, a deep-learning algorithm trained on protein structures, can predict tolerated amino acids at any given site based on protein structural information. It shows satisfactory results for position-specific scoring matrix (PSSM) prediction and near state-of-the-art performance in predicting mutation effects in deep mutational scanning (DMS) experiments.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Genetics & Heredity

Machine-learning of complex evolutionary signals improves classification of SNVs

Sapir Labes, Doron Stupp, Naama Wagner, Idit Bloch, Michal Lotem, Ephrat L. Lahad, Paz Polak, Tal Pupko, Yuval Tabach

Summary: This study analyzed the association between complex conservation patterns and the pathogenicity of Single-Nucleotide Variants (SNVs). The results showed that conservation is not always accurate and its effectiveness depends on the species and genes being analyzed. The findings led to the development of a new approach called EvoDiagnostics, which outperforms traditional conservation algorithms in predicting variant pathogenicity.

NAR GENOMICS AND BIOINFORMATICS (2022)
