4.6 Article

Assessment of composite motif discovery methods

Journal

BMC BIOINFORMATICS
Volume 9, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/1471-2105-9-123

Background: Computational discovery of regulatory elements is an important area of bioinformatics research, and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery, that is, discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so-called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery.

Results: We have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked both to predict the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise.

Conclusion: Although some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets, and no single method performed consistently better than the rest in all situations. The variation in performance on individual datasets also shows that the new benchmark datasets represent a suitable variety of challenges to most methods for module discovery.
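
As an illustration of how predicted module locations could be compared against annotated ones, the sketch below scores one sequence at the nucleotide level. The abstract does not state which measures the benchmark uses, so the choice of sensitivity and positive predictive value, the half-open interval convention, and the function name `nucleotide_stats` are all assumptions made for this example.

```python
def nucleotide_stats(true_intervals, predicted_intervals, seq_length):
    """Compare predicted and annotated module positions on one sequence.

    Intervals are (start, end) pairs, half-open and 0-based (an assumption
    of this sketch, not something taken from the benchmark itself).
    """
    # Expand intervals into sets of covered nucleotide positions.
    true_pos = set()
    for start, end in true_intervals:
        true_pos.update(range(start, end))

    pred_pos = set()
    for start, end in predicted_intervals:
        pred_pos.update(range(start, end))

    tp = len(true_pos & pred_pos)    # annotated and predicted
    fp = len(pred_pos - true_pos)    # predicted only
    fn = len(true_pos - pred_pos)    # annotated only
    tn = seq_length - tp - fp - fn   # neither

    # Guard against empty annotations or empty predictions.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0

    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "sensitivity": sensitivity, "ppv": ppv}


if __name__ == "__main__":
    # One annotated module at 10..19 and one prediction at 15..24.
    print(nucleotide_stats([(10, 20)], [(15, 25)], seq_length=100))
```

Counting at the nucleotide level rewards partial overlaps in proportion to their size; a site-level count (a prediction is a hit if it overlaps an annotated module at all) would be an equally plausible alternative.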

Recommended

Article Immunology

B cell tolerance and antibody production to the celiac disease autoantigen transglutaminase 2

M. Fleur du Pre, Jana Blazevski, Alisa E. Dewan, Jorunn Stamnaes, Chakravarthi Kanduri, Geir Kjetil Sandve, Marie K. Johannesen, Christian B. Lindstad, Kathrin Hnida, Lars Fugger, Gerry Melino, Shuo-Wang Qiao, Ludvig M. Sollid

JOURNAL OF EXPERIMENTAL MEDICINE (2020)

Article Biochemical Research Methods

Enhanced identification of significant regulators of gene expression

Rezvan Ehsani, Finn Drablos

BMC BIOINFORMATICS (2020)

Article Multidisciplinary Sciences

Targeted sequencing of genes associated with the mismatch repair pathway in patients with endometrial cancer

Ashish Kumar Singh, Bente Talseth-Palmer, Mary McPhillips, Liss Anne Solberg Lavik, Alexandre Xavier, Finn Drablos, Wenche Sjursen

PLOS ONE (2020)

Editorial Material Biochemical Research Methods

Ten simple rules for quick and dirty scientific programming

Gabriel Balaban, Ivar Grytten, Knut Dagestad Rand, Lonneke Scheffer, Geir Kjetil Sandve

PLOS COMPUTATIONAL BIOLOGY (2021)

Article Genetics & Heredity

FunHoP: Enhanced Visualization and Analysis of Functionally Homologous Proteins in Complex Metabolic Networks

Kjersti Rise, May-Britt Tessem, Finn Drablos, Morten B. Rye

Summary: Cytoscape is commonly used for visualization and analysis of metabolic pathways, but interpreting pathways based on KEGG data can be challenging. FunHoP is a new method that shows all possible genes in each node, making the pathways more complete and providing more consistent biological interpretations of metabolic pathways.

GENOMICS PROTEOMICS & BIOINFORMATICS (2021)

Article Biochemical Research Methods

CompAIRR: ultra-fast comparison of adaptive immune receptor repertoires by exact and approximate sequence matching

Torbjorn Rognes, Lonneke Scheffer, Victor Greiff, Geir Kjetil Sandve

Summary: In this study, CompAIRR was developed for fast computation of AIRR overlap, achieving a 1000-fold improvement in computational speed compared to existing methods. CompAIRR has been integrated with immuneML, a machine learning ecosystem for AIRR analysis.

BIOINFORMATICS (2022)

Article Multidisciplinary Sciences

FunHoP analysis reveals upregulation of mitochondrial genes in prostate cancer

Kjersti Rise, May-Britt Tessem, Finn Drablos, Morten Beck Rye

Summary: This study expands a method for analyzing functional homologous proteins, which can differentiate between mitochondrial and non-mitochondrial processes in cancer cells and normal cells. The results show that mitochondrial pathways are upregulated in prostate cancer.

PLOS ONE (2022)

Article Multidisciplinary Sciences

Identification of gluten T cell epitopes driving celiac disease

Marketa Chlubnova, Asbjorn O. Christophersen, Geir Kjetil F. Sandve, Knut E. A. Lundin, Jorgen Jahnsen, Shiva Dahal-Koirala, Ludvig M. Sollid

Summary: 42 wheat gluten-reactive T cell clones with different phenotypes and no reactivity to known epitopes were screened. Synthetic peptides were identified bioinformatically from a wheat gluten protein database and tested against the T cell clones. Reactivity of 10 T cell clones was assigned, and 5 previously uncharacterized gliadin/glutenin epitopes with a 9-nucleotide oligomer core region were identified. This work represents an advance in identifying CeD-driving gluten epitopes.

SCIENCE ADVANCES (2023)

Article Psychiatry

Effects of prenatal exposure to (es)citalopram and maternal depression during pregnancy on DNA methylation and child neurodevelopment

Emilie Willoch Olstad, Hedvig Marie Egeland Nordeng, Geir Kjetil Sandve, Robert Lyle, Kristina Gervin

Summary: This study investigated the associations between prenatal exposure to citalopram or escitalopram, maternal depression, and offspring DNA methylation (DNAm). The researchers also examined the interaction effect of (es)citalopram exposure and DNAm on neurodevelopmental outcomes, as well as the correlation between DNAm at birth and neurodevelopmental trajectories in childhood.

TRANSLATIONAL PSYCHIATRY (2023)

Article Computer Science, Artificial Intelligence

Linguistically inspired roadmap for building biologically reliable protein language models

Mai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Geir Kjetil Sandve, Victor Greiff, Dag Trygve Truslew Haug

Summary: Language models trained on proteins can predict functions from sequences but lack insight into underlying mechanisms. Extracting rules from these models can make them interpretable and help explain biological mechanisms.

NATURE MACHINE INTELLIGENCE (2023)

Article Biochemical Research Methods

Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data

Ping-Han Hsieh, Camila Miranda Lopes-Ramos, Manuela Zucknick, Geir Kjetil Sandve, Kimberly Glass, Marieke Lydia Kuijjer

Summary: Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns. However, certain normalization methods can introduce false-positive associations between genes, hindering downstream co-expression network analysis. In this study, a normalization method called SNAIL (Smooth-quantile Normalization Adaptation for the Inference of co-expression Links) is developed to avoid false-positive associations and retain associations to genes expressed in small subgroups of samples. This method has the potential to impact network modeling and association-based approaches in large-scale heterogeneous data.

BIOINFORMATICS (2023)

Article Biology

Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification

Chakravarthi Kanduri, Milena Pavlovic, Lonneke Scheffer, Keshav Motwani, Maria Chernigovskaya, Victor Greiff, Geir K. Sandve

Summary: This article presents a study aimed at determining the effectiveness of baseline machine learning (ML) methods in the classification of adaptive immune receptor repertoires (AIRRs). The study generated a series of synthetic AIRR benchmark datasets and found that even when the immune signal occurs only in 1 out of 50,000 AIR sequences, the baseline L1-penalized logistic regression model can achieve high prediction accuracy.

GIGASCIENCE (2022)

Article Biology

GAPGOM-an R package for gene annotation prediction using GO Metrics

Casper van Mourik, Rezvan Ehsani, Finn Drablos

Summary: Gene products can be described using GO terms, but for many genes the information about their products, especially lncRNAs, is limited. GAPGOM integrates two algorithms for annotation prediction and similarity estimation between GO graphs, providing improved performance and additional features.

BMC RESEARCH NOTES (2021)

Article Oncology

Robust Distance Measures for kNN Classification of Cancer Data

Rezvan Ehsani, Finn Drablos

CANCER INFORMATICS (2020)
