Assessment of Algorithms for Inferring Positional Weight Matrix Motifs of Transcription Factor Binding Sites Using Protein Binding Microarray Data

PLOS ONE (2012)

Journal

PLOS ONE

Volume 7, Issue 9

Publisher

Public Library of Science

DOI: 10.1371/journal.pone.0046145

Categories

Multidisciplinary Sciences

Funding

European Community [HEALTH-F4-2009-223575]
Israel Science Foundation [802/08]
Edmond J. Safra Center for Bioinformatics at Tel Aviv University

Abstract

The new technology of protein binding microarrays (PBMs) allows simultaneous measurement of the binding intensities of a transcription factor to tens of thousands of synthetic double-stranded DNA probes, covering all possible 10-mers. A key computational challenge is inferring the binding motif from these data. We present a systematic comparison of four methods developed specifically for reconstructing a binding site motif represented as a positional weight matrix from PBM data. The reconstructed motifs were evaluated in terms of three criteria: concordance with reference motifs from the literature, ability to predict in vivo binding, and ability to predict in vitro binding. The evaluation encompassed over 200 transcription factors and some 300 assays. The results show a tradeoff between how the methods perform according to the different criteria, and a dichotomy of method types. Algorithms that construct motifs with low information content predict PBM probe ranking more faithfully, while methods that produce highly informative motifs match reference motifs better. Interestingly, in predicting high-affinity binding, all methods give far poorer results for in vivo assays than for in vitro assays.
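
As a rough illustration of two quantities the abstract refers to, the sketch below computes the information content of a positional weight matrix and uses the matrix to score a probe sequence. It is not taken from the paper or from any of the four evaluated methods: the two toy matrices, the probe sequences, the uniform background, the single-strand scan, and the function names are all assumptions made for illustration.

```python
import math

BASES = "ACGT"  # column order assumed for every matrix row below


def information_content(pwm):
    """Total information content in bits, assuming a uniform background.

    Each position contributes 2 bits minus its Shannon entropy.
    """
    total = 0.0
    for column in pwm:
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        total += 2.0 - entropy
    return total


def best_window_score(pwm, probe):
    """Highest log-odds score over all windows on one strand of the probe."""
    width = len(pwm)
    best = -math.inf
    for start in range(len(probe) - width + 1):
        score = 0.0
        for offset, column in enumerate(pwm):
            p = column[BASES.index(probe[start + offset])]
            # Floor the probability to avoid log(0); the floor is arbitrary.
            score += math.log2(max(p, 1e-9) / 0.25)
        best = max(best, score)
    return best


# Hypothetical 3-position motifs that both prefer "AGC".
sharp_pwm = [
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.01, 0.97, 0.01],
    [0.01, 0.97, 0.01, 0.01],
]
flat_pwm = [
    [0.40, 0.20, 0.20, 0.20],
    [0.20, 0.20, 0.40, 0.20],
    [0.20, 0.40, 0.20, 0.20],
]

# Rank toy probes by their best window score under one motif.
probes = ["TTAGCTT", "TTTTTTT", "AGCAGCA"]
ranking = sorted(probes, key=lambda s: best_window_score(sharp_pwm, s), reverse=True)
```

Here `sharp_pwm` has higher information content than `flat_pwm`, which is the kind of contrast the abstract draws between highly informative and low-information motifs. Ranking probes by score is only one simple way to compare a motif against measured PBM intensities.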

Recommended

Article Biochemistry & Molecular Biology

Factorbook: an updated catalog of transcription factor motifs and candidate regulatory motif sites

Henry E. Pratt, Gregory R. Andrews, Nishigandha Phalke, Michael J. Purcaro, Arjan van der Velde, Jill E. Moore, Zhiping Weng

Summary: The update to Factorbook significantly expands the coverage of cell types and TFs, includes an expanded motif catalog and new tools for applying motif models within machine learning frameworks, and offers integrative analysis options including annotation of variants and disease traits. The database is available at www.factorbook.org and will continue to expand with the release of ENCODE Phase IV data.

NUCLEIC ACIDS RESEARCH (2022)

Article Multidisciplinary Sciences

Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data

Chun-Ping Yu, Chen-Hao Kuo, Chase W. Nelson, Chi-An Chen, Zhi Thong Soh, Jinn-Jy Lin, Ru-Xiu Hsiao, Chih-Yao Chang, Wen-Hsiung Li

Summary: By developing a computational pipeline for analyzing ChIP-seq data, this study discovered and characterized a large number of previously unknown TFBSs, providing insights into the biological and genomic features of TFBSs.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Article Biochemical Research Methods

monaLisa: an R/Bioconductor package for identifying regulatory motifs

Dania Machlab, Lukas Burger, Charlotte Soneson, Filippo M. Rijli, Dirk Schuebeler, Michael B. Stadler

Summary: Proteins binding to specific nucleotide sequences, such as transcription factors, have significant roles in regulating gene expression. The monaLisa package, an R/Bioconductor package, provides methods to identify relevant transcription factors from experimental data. It allows seamless motif analyses without relying on software outside of R.

BIOINFORMATICS (2022)

Article Plant Sciences

TSPTFBS 2.0: trans-species prediction of transcription factor binding sites and identification of their core motifs in plants

Huiling Cheng, Lifen Liu, Yuying Zhou, Kaixuan Deng, Yuanxin Ge, Xuehai Hu

Summary: An emerging approach using promoter tiling deletion via genome editing is becoming popular in plants. However, the precise positions of core motifs within plant gene promoters are largely unknown. In this study, the researchers developed TSPTFBS 2.0, which integrates DenseNet-based models and three interpretability methods to identify potential core motifs in genomic regions. The developed web server has great potential for providing reliable editing targets in genetic screen experiments in plants.

FRONTIERS IN PLANT SCIENCE (2023)

Review Biochemical Research Methods

A survey on algorithms to characterize transcription factor binding sites

Manuel Tognon, Rosalba Giugno, Luca Pinello

Summary: Transcription factors (TFs) are regulatory proteins that control transcriptional rate by binding to DNA sequences called transcription factor binding sites (TFBS) or motifs. Experimental and computational methods have been developed to identify and characterize TFBS motifs in DNA sequences. This review article discusses these methods, highlighting their advantages, drawbacks, open challenges, and future perspectives.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Automation & Control Systems

Exploring variable-length features (motifs) for predicting binding sites through interpretable deep neural networks

Chandra Mohan Dasari, Santhosh Amilpur, Raju Bhukya

Summary: The proposed interpretable deep learning technique, PBVPP, utilizes experimental data and performance metrics to predict binding sites, showing the capability to extract vital features from large-scale genomic sequences and achieve accurate prediction of TFBS and RBP sites. The model reveals how to mine vital features and extract variable length patterns for improved prediction of binding sites, validating obtained motifs against known target motifs in a database, and exhibiting better performance compared to existing methods.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2021)

Article Biochemistry & Molecular Biology

MethMotif.Org 2024: a database integrating context-specific transcription factor-binding motifs with DNA methylation patterns

Matthew Dyer, Quy Xiao Xuan Lin, Sofiia Shapoval, Denis Thieffry, Touati Benoukraf

Summary: MethMotif is a publicly available database that provides a comprehensive repository of transcription factor-binding profiles with DNA methylation patterns. The latest release includes over 700 position weight matrices, segregated based on their cofactors and DNA methylation status. The database also offers precomputed GO annotations for human TFs and TF-co-TF complexes, allowing for a comprehensive analysis of TF functions in their context with cofactors. Furthermore, MethMotif has been expanded to include data for two additional species, increasing its applicability and value to the scientific community.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

CT-FOCS: a novel method for inferring cell type-specific enhancer-promoter maps

Tom Aharon Hait, Ran Elkon, Ron Shamir

Summary: In this study, we introduce the CT-FOCS method, which uses linear mixed effect models to infer enhancer-promoter links that are specifically active in certain cell types. The results show that CT-FOCS accurately predicts these links compared to other methods, and it reveals that strictly cell type-specific EP links are rare in the human genome.

NUCLEIC ACIDS RESEARCH (2022)

Article Biochemistry & Molecular Biology

Inferring metal binding sites in flexible regions of proteins

Aditi Garg, Debnath Pal

Summary: This study introduces a method to improve metal-binding site prediction using the Geometric Hashing algorithm. By screening metal-specific amino acids in the structure ensemble, the residues for Ca2+, Zn2+, Mg2+, Cu2+, and Fe3+ binding sites can be predicted with superior performance compared to existing methods.

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2021)

Article Biochemistry & Molecular Biology

Enhanced nucleosome assembly at CpG sites containing an extended 5-methylcytosine analogue

Migle Tomkuviene, Markus Meier, Diana Ikasalaite, Julia Wildenauer, Visvaldas Kairys, Saulius Klimasauskas, Laura Manelyte

Summary: Methylation of cytosine is an important epigenetic mark that can alter DNA and chromatin structure. This study investigates how larger chemical variations in DNA affect chromatin structure and nucleosome formation.

NUCLEIC ACIDS RESEARCH (2022)

Article Biochemistry & Molecular Biology

Genome-Wide Prediction of Transcription Start Sites in Conifers

Eugeniya I. Bondar, Maxim E. Troukhan, Konstantin V. Krutovsky, Tatiana V. Tatarinova

Summary: This study utilized computational approaches to predict genome-wide TSS in four conifer species, laying the groundwork for future research on gene regulatory regions.

INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES (2022)

Article Plant Sciences

Rice protein-binding microarrays: a tool to detect cis-acting elements near promoter regions in rice

Joung Sug Kim, SongHwa Chae, Kyong Mi Jun, Gang-Seob Lee, Jong-Seong Jeon, Kyung Do Kim, Yeon-Ki Kim

Summary: The study successfully identified the DNA-binding sequences of OsWOX13, OsSMF1, and OsWRKY34, along with the respective numbers of putative feature genes. This method could be applicable in analyzing DNA-binding motifs for TFs in the promoter and 5' upstream CDS regions, facilitating the construction of gene networks.

PLANTA (2021)

Article Microbiology

Comparative Analysis of the IclR-Family of Bacterial Transcription Factors and Their DNA-Binding Motifs: Structure, Positioning, Co-Evolution, Regulon Content

Inna A. Suvorova, Mikhail S. Gelfand

Summary: Comparative genomics techniques were used to identify binding motifs of IclR-family TFs, reconstruct regulons, and analyze their content. Two main types of IclR-family motifs were described, with possible alternative modes of dimerization, as well as trends in site positioning and protein-DNA contacts. The majority of predicted protein-DNA contacts were similar for both types of motifs and aligned well with available experimental data and general protein-DNA interaction trends.

FRONTIERS IN MICROBIOLOGY (2021)

Article Biochemistry & Molecular Biology

Programmable gene regulation for metabolic engineering using decoy transcription factor binding sites

Tiebin Wang, Nathan Tague, Stephen A. Whelan, Mary J. Dunlop

Summary: Transcription factor decoys can effectively regulate gene expression, with tunability through changes in copy number or modifications to the DNA decoy site sequence. Introducing the decoy system can significantly increase arginine production in metabolic flux steering, without affecting growth compared to wild type strains.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

Variable interplay of UV-induced DNA damage and repair at transcription factor binding sites

Joan Frigola, Radhakrishnan Sabarinathan, Abel Gonzalez-Perez, Nuria Lopez-Bigas

Summary: An abnormally high rate of UV-light related mutations is observed at transcription factor binding sites (TFBS) across melanomas, with certain TFs impairing the repair of UV-induced lesions and increasing the rate of lesion generation at their binding sites. Through nucleotide-resolution data, it is found that mutation rate increase in TFBS is mainly due to decreased repair efficiency, rather than the rate of lesion formation.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemical Research Methods

The DOMINO web-server for active module identification analysis

Hagai Levi, Nima Rahmanian, Ran Elkon, Ron Shamir

Summary: Active module identification is a crucial step in omics analysis. In this article, we introduce a new AMI algorithm called DOMINO and provide an online server for its execution. The server offers additional features such as GO enrichment analysis and module visualizations to aid in result interpretation.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

Computational modeling of mRNA degradation dynamics using deep neural networks

Ofir Yaish, Yaron Orenstein

Summary: In this study, deep neural networks were developed to predict mRNA degradation dynamics, and the networks were interpreted to identify regulatory elements in the 3'-UTR and their positional effect. The findings show that this approach improves the prediction performance of mRNA degradation dynamics and provides new insights into the underlying mechanism of 3'-UTR elements.

BIOINFORMATICS (2022)

Article Multidisciplinary Sciences

A machine learning model for predicting deterioration of COVID-19 inpatients

Omer Noy, Dan Coster, Maya Metzger, Itai Atar, Shani Shenhar-Tsarfaty, Shlomo Berliner, Galia Rahav, Ori Rogowski, Ron Shamir

Summary: The COVID-19 pandemic has posed an urgent threat to global health since December 2019. We developed a predictive model using machine learning methods and routine clinical features to identify patients at risk for clinical deterioration early.

SCIENTIFIC REPORTS (2022)

Article Biochemical Research Methods

G4detector: Convolutional Neural Network to Predict DNA G-Quadruplexes

Mira Barshai, Alice Aubert, Yaron Orenstein

Summary: This article introduces G4detector, a method based on convolutional neural network, to predict G4 structures in DNA sequences. The method improves prediction accuracy by incorporating RNA secondary structure information and has been shown to outperform existing methods on benchmark datasets.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2022)

Article Biochemical Research Methods

DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning

Sofia Aizenshtein-Gazit, Yaron Orenstein

Summary: This study presents DeepZF, a deep-learning-based pipeline for predicting the binding of C2H2-ZF proteins and their DNA-binding preferences. By using in vivo and in vitro datasets and transfer learning, DeepZF achieved an average Pearson correlation greater than 0.94 for predicting DNA binding positions, outperforming existing methods.

BIOINFORMATICS (2022)

Article Biochemical Research Methods

3CAC: improving the classification of phages and plasmids in metagenomic assemblies using assembly graphs

Lianrong Pu, Ron Shamir

Summary: 3CAC is a new three-class classifier that improves the precision of phage and plasmid classification in mixed metagenomic assemblies. By using proximity in the assembly graph to improve the classification of short contigs and contigs with low confidence, 3CAC outperforms PPR-Meta and viralVerify in terms of precision, recall, and F1-score.

BIOINFORMATICS (2022)

Article Biochemistry & Molecular Biology

rG4detector, a novel RNA G-quadruplex predictor, uncovers their impact on stress granule formation

Maor Turner, Yehuda M. Danino, Mira Barshai, Nancy S. Yacovzada, Yahel Cohen, Tsviya Olender, Ron Rotkopf, David Monchaud, Eran Hornstein, Yaron Orenstein

Summary: RNA G-quadruplexes (rG4s) play a direct role in stress granule (SG) biology through their interactions with RNA-binding proteins. The newly developed rG4detector is a powerful tool for predicting rG4 stability and detecting rG4-forming sequences in transcriptomics data.

NUCLEIC ACIDS RESEARCH (2022)

Article Biochemical Research Methods

G4mismatch: Deep neural networks to predict G-quadruplex propensity based on G4-seq data

Mira Barshai, Barak Engel, Idan Haim, Yaron Orenstein

Summary: G4mismatch, a novel algorithm, accurately and efficiently predicts G-quadruplex propensity for any genomic sequence. Based on a convolutional neural network trained on almost 400 million human genomic loci, G4mismatch achieves high accuracy in predicting G-quadruplex formation and outperforms other methods.

PLOS COMPUTATIONAL BIOLOGY (2023)

Review Biochemical Research Methods

An overview on nucleic-acid G-quadruplex prediction: from rule-based methods to deep neural networks

Karin Elimelech-Zohar, Yaron Orenstein

Summary: Nucleic-acid G-quadruplexes (G4s) are crucial in cellular processes, and experimental assays have been developed to measure them in high throughput. This has enabled the development of machine-learning-based methods, particularly deep neural networks, to predict G4s in any nucleic-acid sequence and species.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

Efficient minimizer orders for large values of k using minimum decycling sets

David Pellow, Lianrong Pu, Baris Ekim, Lior Kotlar, Bonnie Berger, Ron Shamir, Yaron Orenstein

Summary: Minimizer schemes, commonly used in high-throughput DNA sequencing data analysis, often select more k-mers than necessary, leading to limited improvement in runtime and memory usage. Universal k-mer hitting sets provide a solution to reduce the number of selected k-mers, but are currently infeasible for large k values. This study introduces decycling-set-based minimizer orders, which improve the efficiency of minimizer orders for large k values by selecting a comparable number of k-mers to universal k-mer hitting sets. Additionally, a method is developed to compute minimizers in real-time without keeping the k-mers in memory, allowing this approach to be used for any value of k. The new orders are expected to enhance the performance of algorithms and data structures in high-throughput DNA sequencing analysis.

GENOME RESEARCH (2023)

Article Biochemistry & Molecular Biology

Integration of gene expression and DNA methylation data across different experiments

Yonatan Itai, Nimrod Rappoport, Ron Shamir

Summary: The integration of multi-omic datasets is valuable in cancer research and precision medicine, but obtaining multi-modal data from the same samples is challenging. INTEND is a novel algorithm that integrates gene expression and DNA methylation datasets by learning a predictive model between the two omics. It achieves superior results compared to other integration algorithms and can uncover connections between DNA methylation and gene expression regulation.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemical Research Methods

Data Set-Adaptive Minimizer Order Reduces Memory Usage in k-Mer Counting

Dan Flomin, David Pellow, Ron Shamir

Summary: The study introduces a method to tailor the order to the data set, reducing memory consumption. By integrating this method into a memory-efficient k-mer counter, the memory footprint was significantly reduced with only a slight increase in runtime. Experimental results showed that the orders produced by this method performed well across data sets from the same species, enabling memory reduction without significant runtime increase.

JOURNAL OF COMPUTATIONAL BIOLOGY (2022)

Meeting Abstract Cardiac & Cardiovascular Systems

CLUSTERING OF CLINICAL-ECHOCARDIOGRAPHIC PHENOTYPES OF COVID-19 DISEASE USING MACHINE-LEARNING TECHNIQUES

Aviram Hochstadt, Eran Shpigelman, Dan Coster, Ilan Merdler, Yan Topilsky, Ron Shamir

JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY (2022)
