Article

wQFM: highly accurate genome-scale species tree estimation from weighted quartets

Journal

BIOINFORMATICS
Volume 37, Issue 21, Pages 3734-3743

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab428

Funding

  1. Information and Communication Technology Division (ICT Division), Government of the People's Republic of Bangladesh

Summary: Estimating species trees from genes sampled across the whole genome is challenging because of gene tree-species tree discordance, and incomplete lineage sorting is a common cause of that discordance. Weighted quartet-based methods offer a statistically consistent route to accurate species tree estimation in such cases. The proposed wQFM method extends the quartet FM algorithm to a weighted setting and produces highly accurate species trees on both simulated and real biological datasets.
Motivation: Species tree estimation from genes sampled from throughout the whole genome is complicated by gene tree-species tree discordance. Incomplete lineage sorting (ILS) is one of the most frequent causes of this discordance: alleles can coexist in populations for periods that may span several speciation events. Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular because of their high accuracy and their statistical guarantees under ILS. Generating quartets with appropriate weights, where the weights reflect the relative importance of the quartets, and then amalgamating the weighted quartets into a single coherent species tree can provide a statistically consistent way to estimate species trees. However, handling weighted quartets is challenging.

Results: We propose wQFM, a highly accurate method for species tree estimation from multi-locus data, which extends the quartet FM (QFM) algorithm to a weighted setting. wQFM was assessed on a collection of simulated and real biological datasets, including the avian phylogenomic dataset, one of the largest phylogenomic datasets to date. We compared wQFM with wQMC, the best alternative method for weighted quartet amalgamation, and with ASTRAL, one of the most accurate and widely used coalescent-based species tree estimation methods. Our results suggest that wQFM matches or improves upon the accuracy of wQMC and ASTRAL.

Supplementary information: Supplementary data are available at Bioinformatics online.
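
The first step described in the abstract, turning a collection of gene trees into weighted quartets, can be illustrated with a short Python sketch. It assumes that each quartet topology is weighted by the number of gene trees that induce it (simple frequency weighting); the exact weighting scheme used by wQFM, and its FM-style divide-and-conquer amalgamation step, are described in the paper and are not reproduced here. Gene trees are written as nested tuples of leaf labels and read as unrooted trees. The function names (`quartet`, `clusters`, `induced_quartet`, `weighted_quartets`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def quartet(p, q, r, s):
    """Canonical, order-independent key for the unrooted quartet pq|rs."""
    return frozenset({frozenset({p, q}), frozenset({r, s})})


def clusters(tree):
    """Return the leaf set below every internal node of a nested-tuple tree.

    Read as an unrooted tree, each cluster C defines the split C | (leaves - C).
    Clusters are collected in post-order, so the root's full leaf set is last.
    """
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf label
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def induced_quartet(splits, a, b, c, d):
    """Return the topology the splits induce on {a, b, c, d}, or None if unresolved."""
    for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        pair1, pair2 = {p, q}, {r, s}
        for side in splits:
            # A split separates the two pairs if one side holds one pair and none of the other.
            if (pair1 <= side and not (pair2 & side)) or (pair2 <= side and not (pair1 & side)):
                return quartet(p, q, r, s)
    return None


def weighted_quartets(gene_trees):
    """Weight each quartet topology by the number of gene trees that induce it (assumed scheme)."""
    weights = Counter()
    for tree in gene_trees:
        splits = clusters(tree)
        taxa = sorted(splits[-1]) if splits else []  # root cluster = all taxa in this gene tree
        for a, b, c, d in combinations(taxa, 4):
            q = induced_quartet(splits, a, b, c, d)
            if q is not None:  # unresolved (polytomy) quartets get no weight
                weights[q] += 1
    return weights


gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
for q, w in weighted_quartets(gene_trees).items():
    print(" | ".join(sorted("".join(sorted(pair)) for pair in q)), w)
# prints:
# AB | CD 2
# AC | BD 1
```

Enumerating every 4-taxon subset costs O(n^4) checks per gene tree, so this brute-force version only suits small examples. It is meant to show what a weighted quartet set looks like as input to an amalgamation method, not how a production tool generates one.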

Recommended

Article Nuclear Science & Technology

Evaluation of neutronic safety parameters of the BAEC TRIGA Research Reactor with Wet Central Tube

Md. Saifur Rahman, M. A. Malek Soner, M. Mizanur Rahman, Md. Al Amin Hossain, M. A. Salam, M. N. A. Abdullah

ANNALS OF NUCLEAR ENERGY (2019)

Article Computer Science, Artificial Intelligence

Antigenic: An improved prediction model of protective antigens

M. Saifur Rahman, Md. Khaledur Rahman, Sanjay Saha, M. Kaykobad, M. Sohel Rahman

ARTIFICIAL INTELLIGENCE IN MEDICINE (2019)

Article Engineering, Biomedical

VFPred: A fusion of signal processing and machine learning techniques in detecting ventricular fibrillation from ECG signals

Nabil Ibtehaz, M. Saifur Rahman, M. Sohel Rahman

BIOMEDICAL SIGNAL PROCESSING AND CONTROL (2019)

Article Computer Science, Artificial Intelligence

MultiResUNet : Rethinking the U-Net architecture for multimodal biomedical image segmentation

Nabil Ibtehaz, M. Sohel Rahman

NEURAL NETWORKS (2020)

Article Biochemical Research Methods

SAINT: self-attention augmented inception-inside-inception network improves protein secondary structure prediction

Mostofa Rafid Uddin, Sazan Mahbub, M. Saifur Rahman, Md Shamsuzzoha Bayzid

BIOINFORMATICS (2020)

Article Biochemical Research Methods

CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning

Ali Haisam Muhammad Rafid, Md. Toufikuzzaman, Mohammad Saifur Rahman, M. Sohel Rahman

BMC BIOINFORMATICS (2020)

Article Biochemical Research Methods

ADACT: a tool for analysing (dis)similarity among nucleotide and protein sequences using minimal and relative absent words

Mujtahid Akon, Muntashir Akon, Mohimenul Kabir, M. Saifur Rahman, M. Sohel Rahman

Summary: This research focuses on developing an alignment-free framework for analyzing biological sequences and introduces the Alignment-free Dissimilarity Analysis & Comparison Tool (ADACT), which aims to simplify the workflow for researchers and practitioners in the field of bioinformatics.

BIOINFORMATICS (2021)

Article Radiology, Nuclear Medicine & Medical Imaging

A contour property based approach to segment nuclei in cervical cytology images

Iram Tazim Hoque, Nabil Ibtehaz, Saumitra Chakravarty, M. Saifur Rahman, M. Sohel Rahman

Summary: The proposed contour-property-based approach to segmenting nuclei in cervical cytology Pap smear images shows strong performance for automated cervical cancer screening. The method outperforms other state-of-the-art methods in nucleus segmentation, with high precision and recall on both a benchmark dataset and a private real-world dataset. Because the algorithm can be adapted to practical scenarios and requirements, it is a promising tool for detecting and segmenting nuclei.

BMC MEDICAL IMAGING (2021)

Article Geosciences, Multidisciplinary

A Machine Learning-based Approach for Groundwater Mapping

Rashed Uz Zzaman, Sara Nowreen, Irtesam Mahmud Khan, Md Rajibul Islam, Nabil Ibtehaz, M. Saifur Rahman, Anwar Zahid, Dilruba Farzana, Afroza Sharmin, M. Sohel Rahman

Summary: This study introduces a machine learning methodology for assessing the suitability of groundwater extraction technologies in different regions of Bangladesh and highlights the key hydrogeological factors involved. Among the classification models tested, Random Forest performs best, and the study identifies digital elevation model, specific yield, and lithology as the most influential factors on groundwater levels in Bangladesh.

NATURAL RESOURCES RESEARCH (2022)

Article Biology

PASTA with many application-aware optimization criteria for alignment based phylogeny inference

Muhammad Ali Nayeem, Md. Shamsuzzoha Bayzid, Naser Anjum Samudro, M. Saifur Rahman, M. Sohel Rahman

Summary: This paper applies the PASTA method to multiple sequence alignment and proposes PMAO, a multi-objective framework that improves on PASTA by integrating several objectives related to the accuracy of the resulting phylogenetic tree. Experimental results show that the tree space generated by PMAO is better than that generated by PASTA alone, and that adding an additional component can produce smaller, higher-quality solutions.

COMPUTATIONAL BIOLOGY AND CHEMISTRY (2022)

Article Biochemical Research Methods

MAMMLE: A Framework for Phylogeny Estimation Based on Multiobjective Application-aware Multiple Sequence Alignment and Maximum Likelihood Ensemble

Muhammad Ali Nayeem, Naser Anjum Samudro, M. Saifur Rahman, M. Sohel Rahman

Summary: In this study, a framework called MAMMLE is proposed to infer better phylogenetic trees from unaligned sequences by hybridizing two MSA tools (MUSCLE and MAFFT) with a multiobjective optimization strategy and multiple maximum likelihood hypotheses. Experimental results show that MAMMLE achieves a median improvement of 5.57% over MUSCLE in 50.34% of instances.

JOURNAL OF COMPUTATIONAL BIOLOGY (2023)

Article Medical Informatics

Toward Preparing a Knowledge Base to Explore Potential Drugs and Biomedical Entities Related to COVID-19: Automated Computational Approach

Junaed Younus Khan, Tawkat Islam Khondaker, Iram Tazim Hoque, Hamada R. H. Al-Absi, Mohammad Saifur Rahman, Reto Guler, Tanvir Alam, M. Sohel Rahman

JMIR MEDICAL INFORMATICS (2020)

Proceedings Paper Computer Science, Artificial Intelligence

A 'Phylogeny-aware' Multi-objective Optimization Approach for Computing MSA

Muhammad Ali Nayeem, Md. Shamsuzzoha Bayzid, Atif Hasan Rahman, Rifat Shahriyar, M. Sohel Rahman

PROCEEDINGS OF THE 2019 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'19) (2019)

Proceedings Paper Computer Science, Information Systems

Deep Face Image Retrieval: a Comparative Study with Dictionary Learning

Ahmad S. Tarawneh, Ahmad B. Hassanat, Ceyhun Celik, Dmitry Chetverikov, M. Sohel Rahman, Chaman Verma

2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS) (2019)
