Assessment of Algorithms for Inferring Positional Weight Matrix Motifs of Transcription Factor Binding Sites Using Protein Binding Microarray Data

Journal

PLOS ONE
Volume 7, Issue 9

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pone.0046145

Funding

  1. European Community [HEALTH-F4-2009-223575]
  2. Israel Science Foundation [802/08]
  3. Edmond J. Safra Center for Bioinformatics at Tel Aviv University

The new technology of protein binding microarrays (PBMs) allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for reconstructing a binding site motif represented as a positional weight matrix from PBM data. The reconstructed motifs were evaluated in terms of three criteria: concordance with reference motifs from the literature, ability to predict in vivo binding, and ability to predict in vitro binding. The evaluation encompassed over 200 transcription factors and some 300 assays. The results show a tradeoff between how the methods perform according to the different criteria, and a dichotomy of method types. Algorithms that construct motifs with low information content predict PBM probe ranking more faithfully, while methods that produce highly informative motifs match reference motifs better. Interestingly, in predicting high-affinity binding, all methods give far poorer results for in vivo assays than for in vitro assays.
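
To make the terms in the abstract concrete, the following is a minimal sketch of what "information content" and "probe scoring" can mean for a positional weight matrix. It is not any of the four methods compared in the paper. The function names, the toy matrices, the uniform background, the pseudocount, and the single-strand scan are all assumptions made for illustration.

```python
import math

def information_content(pwm):
    """Total information content in bits of a PWM.

    The PWM is a list of dicts mapping each base to its probability at that
    position. A uniform background (0.25 per base) is assumed, so each
    column contributes 2 bits minus its Shannon entropy.
    """
    total = 0.0
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy
    return total

def best_window_score(pwm, probe, pseudo=1e-3):
    """Highest log-odds score of the PWM over all windows of one probe strand.

    The pseudocount (an assumption) avoids log(0) for zero-probability bases.
    """
    width = len(pwm)
    best = -math.inf
    for start in range(len(probe) - width + 1):
        score = sum(
            math.log2((pwm[i][probe[start + i]] + pseudo) / 0.25)
            for i in range(width)
        )
        best = max(best, score)
    return best

# Hypothetical toy motifs: one highly informative, one nearly flat.
sharp = [
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
]
flat = [
    {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
    {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
]

# Rank a few made-up probes by their best window score under the sharp motif.
probes = ["CCTGACC", "CCCCCCC", "ATGTGAT"]
ranking = sorted(probes, key=lambda s: best_window_score(sharp, s), reverse=True)
```

Under this sketch, a motif whose columns are close to uniform has information content near zero and separates probes only weakly, whereas a sharply peaked motif scores exact matches far above everything else. The abstract's tradeoff concerns which of these two regimes better serves each evaluation criterion.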

Recommended

Article Biochemical Research Methods

The DOMINO web-server for active module identification analysis

Hagai Levi, Nima Rahmanian, Ran Elkon, Ron Shamir

Summary: Active module identification is a crucial step in omics analysis. In this article, we introduce a new AMI algorithm called DOMINO and provide an online server for its execution. The server offers additional features such as GO enrichment analysis and module visualizations to aid in result interpretation.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Computational modeling of mRNA degradation dynamics using deep neural networks

Ofir Yaish, Yaron Orenstein

Summary: In this study, deep neural networks were developed to predict mRNA degradation dynamics, and the trained networks were interpreted to identify regulatory elements in the 3'-UTR and their positional effect. The findings show that this approach improves the prediction performance of mRNA degradation dynamics and provides new insights into the underlying mechanism of 3'-UTR elements.

BIOINFORMATICS (2022)

Article Biochemistry & Molecular Biology

CT-FOCS: a novel method for inferring cell type-specific enhancer-promoter maps

Tom Aharon Hait, Ran Elkon, Ron Shamir

Summary: In this study, we introduce the CT-FOCS method, which uses linear mixed effect models to infer enhancer-promoter (EP) links that are specifically active in certain cell types. The results show that CT-FOCS predicts these links more accurately than other methods, and it reveals that strictly cell type-specific EP links are rare in the human genome.

NUCLEIC ACIDS RESEARCH (2022)

Article Multidisciplinary Sciences

A machine learning model for predicting deterioration of COVID-19 inpatients

Omer Noy, Dan Coster, Maya Metzger, Itai Atar, Shani Shenhar-Tsarfaty, Shlomo Berliner, Galia Rahav, Ori Rogowski, Ron Shamir

Summary: The COVID-19 pandemic has posed an urgent threat to global health since December 2019. We developed a predictive model using machine learning methods and routine clinical features to identify patients at risk for clinical deterioration early.

SCIENTIFIC REPORTS (2022)

Article Biochemical Research Methods

G4detector: Convolutional Neural Network to Predict DNA G-Quadruplexes

Mira Barshai, Alice Aubert, Yaron Orenstein

Summary: This article introduces G4detector, a method based on convolutional neural network, to predict G4 structures in DNA sequences. The method improves prediction accuracy by incorporating RNA secondary structure information and has been shown to outperform existing methods on benchmark datasets.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biochemical Research Methods

DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning

Sofia Aizenshtein-Gazit, Yaron Orenstein

Summary: This study presents DeepZF, a deep-learning-based pipeline for predicting the binding of C2H2-ZF proteins and their DNA-binding preferences. By using in vivo and in vitro datasets and transfer learning, DeepZF achieved an average Pearson correlation greater than 0.94 for predicting DNA binding positions, outperforming existing methods.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

3CAC: improving the classification of phages and plasmids in metagenomic assemblies using assembly graphs

Lianrong Pu, Ron Shamir

Summary: 3CAC is a new three-class classifier that improves the precision of phage and plasmid classification in mixed metagenomic assemblies. By using proximity in the assembly graph to improve the classification of short contigs and contigs with low confidence, 3CAC outperforms PPR-Meta and viralVerify in terms of precision, recall, and F1-score.

BIOINFORMATICS (2022)

Article Biochemistry & Molecular Biology

rG4detector, a novel RNA G-quadruplex predictor, uncovers their impact on stress granule formation

Maor Turner, Yehuda M. Danino, Mira Barshai, Nancy S. Yacovzada, Yahel Cohen, Tsviya Olender, Ron Rotkopf, David Monchaud, Eran Hornstein, Yaron Orenstein

Summary: RNA G-quadruplexes (rG4s) play a direct role in stress granule (SG) biology through their interactions with RNA-binding proteins. The newly developed rG4detector is a powerful tool for predicting rG4 stability and detecting rG4-forming sequences in transcriptomics data.

NUCLEIC ACIDS RESEARCH (2022)

Article Biochemical Research Methods

G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data

Mira Barshai, Barak Engel, Idan Haim, Yaron Orenstein

Summary: G4mismatch, a novel algorithm, accurately and efficiently predicts G-quadruplex propensity for any genomic sequence. Based on a convolutional neural network trained on almost 400 million human genomic loci, G4mismatch achieves high accuracy in predicting G-quadruplex formation and outperforms other methods.

PLOS COMPUTATIONAL BIOLOGY (2023)

Review Biochemical Research Methods

An overview on nucleic-acid G-quadruplex prediction: from rule-based methods to deep neural networks

Karin Elimelech-Zohar, Yaron Orenstein

Summary: Nucleic-acid G-quadruplexes (G4s) are crucial in cellular processes, and experimental assays have been developed to measure them in high throughput. This has enabled the development of machine-learning-based methods, particularly deep neural networks, to predict G4s in any nucleic-acid sequence and species.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

Efficient minimizer orders for large values of k using minimum decycling sets

David Pellow, Lianrong Pu, Baris Ekim, Lior Kotlar, Bonnie Berger, Ron Shamir, Yaron Orenstein

Summary: Minimizer schemes, commonly used in high-throughput DNA sequencing data analysis, often select more k-mers than necessary, leading to limited improvement in runtime and memory usage. Universal k-mer hitting sets provide a solution to reduce the number of selected k-mers, but are currently infeasible for large k values. This study introduces decycling-set-based minimizer orders, which improve the efficiency of minimizer orders for large k values by selecting a comparable number of k-mers to universal k-mer hitting sets. Additionally, a method is developed to compute minimizers in real-time without keeping the k-mers in memory, allowing this approach to be used for any value of k. The new orders are expected to enhance the performance of algorithms and data structures in high-throughput DNA sequencing analysis.

GENOME RESEARCH (2023)

Article Biochemistry & Molecular Biology

Integration of gene expression and DNA methylation data across different experiments

Yonatan Itai, Nimrod Rappoport, Ron Shamir

Summary: The integration of multi-omic datasets is valuable in cancer research and precision medicine, but obtaining multi-modal data from the same samples is challenging. INTEND is a novel algorithm that integrates gene expression and DNA methylation datasets by learning a predictive model between the two omics. It achieves superior results compared to other integration algorithms and can uncover connections between DNA methylation and gene expression regulation.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemical Research Methods

Data Set-Adaptive Minimizer Order Reduces Memory Usage in k-Mer Counting

Dan Flomin, David Pellow, Ron Shamir

Summary: The study introduces a method to tailor the minimizer order to the data set, reducing memory consumption. By integrating this method into a memory-efficient k-mer counter, the memory footprint was significantly reduced with only a slight increase in runtime. Experimental results showed that the orders produced by this method performed well across data sets from the same species, enabling memory reduction without significant runtime increase.

JOURNAL OF COMPUTATIONAL BIOLOGY (2022)

Meeting Abstract Cardiac & Cardiovascular Systems

CLUSTERING OF CLINICAL-ECHOCARDIOGRAPHIC PHENOTYPES OF COVID-19 DISEASE USING MACHINE-LEARNING TECHNIQUES

Aviram Hochstadt, Eran Shpigelman, Dan Coster, Ilan Merdler, Yan Topilsky, Ron Shamir

JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY (2022)
