Article
Biochemistry & Molecular Biology
Henry E. Pratt, Gregory R. Andrews, Nishigandha Phalke, Michael J. Purcaro, Arjan van der Velde, Jill E. Moore, Zhiping Weng
Summary: The update to Factorbook significantly expands the coverage of cell types and TFs, includes an expanded motif catalog and new tools for applying motif models within machine learning frameworks, and offers integrative analysis options including annotation of variants and disease traits. The database is available at www.factorbook.org and will continue to expand with the release of ENCODE Phase IV data.
NUCLEIC ACIDS RESEARCH
(2022)
Article
Multidisciplinary Sciences
Chun-Ping Yu, Chen-Hao Kuo, Chase W. Nelson, Chi-An Chen, Zhi Thong Soh, Jinn-Jy Lin, Ru-Xiu Hsiao, Chih-Yao Chang, Wen-Hsiung Li
Summary: By developing a computational pipeline for analyzing ChIP-seq data, this study discovered and characterized a large number of previously unknown TFBSs, providing insights into the biological and genomic features of TFBSs.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2021)
Article
Biochemical Research Methods
Dania Machlab, Lukas Burger, Charlotte Soneson, Filippo M. Rijli, Dirk Schuebeler, Michael B. Stadler
Summary: Proteins binding to specific nucleotide sequences, such as transcription factors, have significant roles in regulating gene expression. The monaLisa package, an R/Bioconductor package, provides methods to identify relevant transcription factors from experimental data. It allows seamless motif analyses without relying on software outside of R.
Article
Plant Sciences
Huiling Cheng, Lifen Liu, Yuying Zhou, Kaixuan Deng, Yuanxin Ge, Xuehai Hu
Summary: Promoter tiling deletion via genome editing is an emerging approach that is becoming popular in plants. However, the precise positions of core motifs within plant gene promoters are largely unknown. In this study, the researchers developed TSPTFBS 2.0, which integrates DenseNet-based models and three interpretability methods to identify potential core motifs in genomic regions. The web server has great potential for providing reliable editing targets in genetic screen experiments in plants.
FRONTIERS IN PLANT SCIENCE
(2023)
Review
Biochemical Research Methods
Manuel Tognon, Rosalba Giugno, Luca Pinello
Summary: Transcription factors (TFs) are regulatory proteins that control transcriptional rate by binding to DNA sequences called transcription factor binding sites (TFBS) or motifs. Experimental and computational methods have been developed to identify and characterize TFBS motifs in DNA sequences. This review article discusses these methods, highlighting their advantages, drawbacks, open challenges, and future perspectives.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Automation & Control Systems
Chandra Mohan Dasari, Santhosh Amilpur, Raju Bhukya
Summary: The proposed interpretable deep learning technique, PBVPP, utilizes experimental data and performance metrics to predict binding sites, showing the capability to extract vital features from large-scale genomic sequences and achieve accurate prediction of TFBS and RBP sites. The model reveals how to mine vital features and extract variable length patterns for improved prediction of binding sites, validating obtained motifs against known target motifs in a database, and exhibiting better performance compared to existing methods.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
(2021)
Article
Biochemistry & Molecular Biology
Matthew Dyer, Quy Xiao Xuan Lin, Sofiia Shapoval, Denis Thieffry, Touati Benoukraf
Summary: MethMotif is a publicly available database that provides a comprehensive repository of transcription factor-binding profiles with DNA methylation patterns. The latest release includes over 700 position weight matrices, segregated based on their cofactors and DNA methylation status. The database also offers precomputed GO annotations for human TFs and TF-co-TF complexes, allowing for a comprehensive analysis of TF functions in their context with cofactors. Furthermore, MethMotif has been expanded to include data for two additional species, increasing its applicability and value to the scientific community.
NUCLEIC ACIDS RESEARCH
(2023)
Article
Biochemistry & Molecular Biology
Tom Aharon Hait, Ran Elkon, Ron Shamir
Summary: In this study, we introduce the CT-FOCS method, which uses linear mixed effect models to infer enhancer-promoter (EP) links that are specifically active in certain cell types. The results show that CT-FOCS predicts these links more accurately than other methods, and it reveals that strictly cell type-specific EP links are rare in the human genome.
NUCLEIC ACIDS RESEARCH
(2022)
Article
Biochemistry & Molecular Biology
Aditi Garg, Debnath Pal
Summary: This study introduces a method to improve metal-binding site prediction using the Geometric Hashing algorithm. By screening metal-specific amino acids in the structure ensemble, the residues for Ca2+, Zn2+, Mg2+, Cu2+, and Fe3+ binding sites can be predicted with superior performance compared to existing methods.
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
(2021)
Article
Biochemistry & Molecular Biology
Migle Tomkuviene, Markus Meier, Diana Ikasalaite, Julia Wildenauer, Visvaldas Kairys, Saulius Klimasauskas, Laura Manelyte
Summary: Methylation of cytosine is an important epigenetic mark that can alter DNA and chromatin structure. This study investigates how larger chemical variations in DNA affect chromatin structure and nucleosome formation.
NUCLEIC ACIDS RESEARCH
(2022)
Article
Biochemistry & Molecular Biology
Eugeniya I. Bondar, Maxim E. Troukhan, Konstantin V. Krutovsky, Tatiana V. Tatarinova
Summary: This study utilized computational approaches to predict genome-wide TSS in four conifer species, laying the groundwork for future research on gene regulatory regions.
INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES
(2022)
Article
Plant Sciences
Joung Sug Kim, SongHwa Chae, Kyong Mi Jun, Gang-Seob Lee, Jong-Seong Jeon, Kyung Do Kim, Yeon-Ki Kim
Summary: The study successfully identified the DNA-binding sequences of OsWOX13, OsSMF1, and OsWRKY34, along with the respective numbers of putative feature genes. This method could be applicable in analyzing DNA-binding motifs for TFs in the promoter and 5' upstream CDS regions, facilitating the construction of gene networks.
Article
Microbiology
Inna A. Suvorova, Mikhail S. Gelfand
Summary: Comparative genomics techniques were used to identify binding motifs of IclR-family TFs, reconstruct regulons, and analyze their content. Two main types of IclR-family motifs were described, with possible alternative modes of dimerization, as well as trends in site positioning and protein-DNA contacts. The majority of predicted protein-DNA contacts were similar for both types of motifs and aligned well with available experimental data and general protein-DNA interaction trends.
FRONTIERS IN MICROBIOLOGY
(2021)
Article
Biochemistry & Molecular Biology
Tiebin Wang, Nathan Tague, Stephen A. Whelan, Mary J. Dunlop
Summary: Transcription factor decoys can effectively regulate gene expression, with tunability through changes in copy number or modifications to the DNA decoy site sequence. Introducing the decoy system can significantly increase arginine production in metabolic flux steering, without affecting growth compared to wild type strains.
NUCLEIC ACIDS RESEARCH
(2021)
Article
Biochemistry & Molecular Biology
Joan Frigola, Radhakrishnan Sabarinathan, Abel Gonzalez-Perez, Nuria Lopez-Bigas
Summary: An abnormally high rate of UV-light related mutations is observed at transcription factor binding sites (TFBS) across melanomas, with certain TFs impairing the repair of UV-induced lesions and increasing the rate of lesion generation at their binding sites. Through nucleotide-resolution data, it is found that mutation rate increase in TFBS is mainly due to decreased repair efficiency, rather than the rate of lesion formation.
NUCLEIC ACIDS RESEARCH
(2021)
Article
Biochemical Research Methods
Hagai Levi, Nima Rahmanian, Ran Elkon, Ron Shamir
Summary: Active module identification (AMI) is a crucial step in omics analysis. In this article, we introduce a new AMI algorithm called DOMINO and provide an online server for its execution. The server offers additional features such as GO enrichment analysis and module visualizations to aid in result interpretation.
Article
Biochemical Research Methods
Ofir Yaish, Yaron Orenstein
Summary: In this study, deep neural networks were developed to predict mRNA degradation dynamics and were then interpreted to identify regulatory elements in the 3'-UTR and their positional effects. The findings show that this approach improves the prediction performance of mRNA degradation dynamics and provides new insights into the underlying mechanism of 3'-UTR elements.
Article
Multidisciplinary Sciences
Omer Noy, Dan Coster, Maya Metzger, Itai Atar, Shani Shenhar-Tsarfaty, Shlomo Berliner, Galia Rahav, Ori Rogowski, Ron Shamir
Summary: The COVID-19 pandemic has posed an urgent threat to global health since December 2019. We developed a predictive model using machine learning methods and routine clinical features to identify patients at risk for clinical deterioration early.
SCIENTIFIC REPORTS
(2022)
Article
Biochemical Research Methods
Mira Barshai, Alice Aubert, Yaron Orenstein
Summary: This article introduces G4detector, a method based on convolutional neural network, to predict G4 structures in DNA sequences. The method improves prediction accuracy by incorporating RNA secondary structure information and has been shown to outperform existing methods on benchmark datasets.
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
(2022)
Article
Biochemical Research Methods
Sofia Aizenshtein-Gazit, Yaron Orenstein
Summary: This study presents DeepZF, a deep-learning-based pipeline for predicting the binding of C2H2-ZF proteins and their DNA-binding preferences. By using in vivo and in vitro datasets and transfer learning, DeepZF achieved an average Pearson correlation greater than 0.94 for predicting DNA binding positions, outperforming existing methods.
Article
Biochemical Research Methods
Lianrong Pu, Ron Shamir
Summary: 3CAC is a new three-class classifier that improves the precision of phage and plasmid classification in mixed metagenomic assemblies. By using proximity in the assembly graph to improve the classification of short contigs and contigs with low confidence, 3CAC outperforms PPR-Meta and viralVerify in terms of precision, recall, and F1-score.
Article
Biochemistry & Molecular Biology
Maor Turner, Yehuda M. Danino, Mira Barshai, Nancy S. Yacovzada, Yahel Cohen, Tsviya Olender, Ron Rotkopf, David Monchaud, Eran Hornstein, Yaron Orenstein
Summary: RNA G-quadruplexes (rG4s) play a direct role in stress granule (SG) biology through their interactions with RNA-binding proteins. The newly developed rG4detector is a tool for predicting rG4 stability and detecting rG4-forming sequences in transcriptomics data.
NUCLEIC ACIDS RESEARCH
(2022)
Article
Biochemical Research Methods
Mira Barshai, Barak Engel, Idan Haim, Yaron Orenstein
Summary: G4mismatch, a novel algorithm, accurately and efficiently predicts G-quadruplex propensity for any genomic sequence. Based on a convolutional neural network trained on almost 400 million human genomic loci, G4mismatch achieves high accuracy in predicting G-quadruplex formation and outperforms other methods.
PLOS COMPUTATIONAL BIOLOGY
(2023)
Review
Biochemical Research Methods
Karin Elimelech-Zohar, Yaron Orenstein
Summary: Nucleic-acid G-quadruplexes (G4s) are crucial in cellular processes, and experimental assays have been developed to measure them in high throughput. This has enabled the development of machine-learning-based methods, particularly deep neural networks, to predict G4s in any nucleic-acid sequence and species.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemistry & Molecular Biology
David Pellow, Lianrong Pu, Baris Ekim, Lior Kotlar, Bonnie Berger, Ron Shamir, Yaron Orenstein
Summary: Minimizer schemes, commonly used in high-throughput DNA sequencing data analysis, often select more k-mers than necessary, leading to limited improvement in runtime and memory usage. Universal k-mer hitting sets provide a solution to reduce the number of selected k-mers, but are currently infeasible for large k values. This study introduces decycling-set-based minimizer orders, which improve the efficiency of minimizer orders for large k values by selecting a comparable number of k-mers to universal k-mer hitting sets. Additionally, a method is developed to compute minimizers in real-time without keeping the k-mers in memory, allowing this approach to be used for any value of k. The new orders are expected to enhance the performance of algorithms and data structures in high-throughput DNA sequencing analysis.
Article
Biochemistry & Molecular Biology
Yonatan Itai, Nimrod Rappoport, Ron Shamir
Summary: The integration of multi-omic datasets is valuable in cancer research and precision medicine, but obtaining multi-modal data from the same samples is challenging. INTEND is a novel algorithm that integrates gene expression and DNA methylation datasets by learning a predictive model between the two omics. It achieves superior results compared to other integration algorithms and can uncover connections between DNA methylation and gene expression regulation.
NUCLEIC ACIDS RESEARCH
(2023)
Article
Biochemical Research Methods
Dan Flomin, David Pellow, Ron Shamir
Summary: The study introduces a method to tailor the minimizer order to the data set, reducing memory consumption. By integrating this method into a memory-efficient k-mer counter, the memory footprint was significantly reduced with only a slight increase in runtime. Experimental results showed that the orders produced by this method performed well across data sets from the same species.
JOURNAL OF COMPUTATIONAL BIOLOGY
(2022)
Meeting Abstract
Cardiac & Cardiovascular Systems
Aviram Hochstadt, Eran Shpigelman, Dan Coster, Ilan Merdler, Yan Topilsky, Ron Shamir
JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY
(2022)