Article

Fast and sensitive mapping of bisulfite-treated sequencing data

Journal

BIOINFORMATICS
Volume 28, Issue 13, Pages 1698-1704

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bts254

Funding

  1. LIFE
  2. BMBF through ICGC MMML-Seq [01KU1002J]
  3. European Union
  4. European Regional Development Fund (ERDF)
  5. Free State of Saxony

Motivation: Cytosine DNA methylation is one of the major epigenetic modifications and influences gene expression, developmental processes, X-chromosome inactivation and genomic imprinting. Aberrant methylation is also known to be associated with several diseases, including cancer. The gold standard for determining DNA methylation on a genome-wide scale is bisulfite sequencing.

In bisulfite sequencing, DNA fragments are treated with sodium bisulfite, which converts unmethylated cytosines into uracils while leaving methylated cytosines unchanged. The resulting reads therefore show asymmetric, bisulfite-related mismatches: on the converted strand, a T in the read may correspond to a C in the reference, but not vice versa. The reads also suffer from an effective reduction of the alphabet size in unmethylated regions. Together, these effects make mapping bisulfite sequencing reads computationally much more demanding. As a consequence, currently available read-mapping software often fails to achieve high sensitivity. In many cases it also requires unrealistic computational resources to cope with large real-life datasets.

Results: In this study, we present a seed-based approach that combines enhanced suffix arrays with Myers' bit-vector algorithm. The bit-vector algorithm efficiently extends seeds to optimal semi-global alignments while allowing for bisulfite-related substitutions. The approach outperforms most current methods in sensitivity and is competitive in run time when mapping hundreds of millions of sequencing reads to vertebrate genomes.
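The following Python sketch makes the asymmetric mismatch rule concrete. It simulates bisulfite treatment of a forward-strand fragment, then compares read and reference bases under the C-to-T rule. All function names and the set of methylated positions are illustrative assumptions, not part of the paper. The reverse-strand (G-to-A) case is omitted for brevity.

```python
from __future__ import annotations


def bisulfite_convert(seq: str, methylated: frozenset[int] = frozenset()) -> str:
    """Simulate bisulfite treatment of the forward strand: every C whose
    0-based position is not in `methylated` is read out as T (uracil after
    treatment, thymine after amplification). Hypothetical helper."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def bs_match(read_base: str, ref_base: str) -> bool:
    """Asymmetric comparison: a read T may align to a reference C (an
    unmethylated, converted cytosine), but a read C never aligns to a reference T."""
    return read_base == ref_base or (read_base == "T" and ref_base == "C")


def bs_mismatches(read: str, ref: str) -> int:
    """Count mismatches between equal-length read and reference under bs_match."""
    return sum(not bs_match(r, g) for r, g in zip(read, ref))


def call_methylation(read: str, ref: str) -> dict[int, bool]:
    """For each reference C covered by a read C or T, report True (methylated,
    C retained) or False (unmethylated, converted to T). Hypothetical helper."""
    return {
        i: r == "C"
        for i, (r, g) in enumerate(zip(read, ref))
        if g == "C" and r in "CT"
    }


if __name__ == "__main__":
    ref = "ACGTCGA"
    read = bisulfite_convert(ref, frozenset({1}))  # C at position 1 is methylated
    print(read)                          # ACGTTGA
    print(bs_mismatches(read, ref))      # 0: the T-versus-C difference is tolerated
    print(bs_mismatches(ref, read))      # 1: a read C against a reference T is not
    print(call_methylation(read, ref))   # {1: True, 4: False}
```

A naive comparison would count the converted cytosine at position 4 as a mismatch. A mapper that tolerates T-versus-C differences in both directions would instead admit spurious alignments, which is why the rule is one-directional.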

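One way to make Myers' bit-vector algorithm tolerate bisulfite substitutions is through its match table. In addition to the usual entries, every read position holding a T is also marked as matching a reference C. The Python sketch below assumes this construction; it is not the authors' implementation, and the function `bs_myers_search` is hypothetical. Python's unbounded integers stand in for machine words. The function computes the minimum edit distance of the whole read against any substring of the reference (a semi-global alignment). It returns the reference positions where the best alignments end. A real mapper would run such a search only on a reference window around each seed hit, not on a whole genome.

```python
from __future__ import annotations


def bs_myers_search(read: str, ref: str, bisulfite: bool = True) -> tuple[int, list[int]]:
    """Myers' bit-vector search: minimum edit distance of `read` against any
    substring of `ref`, plus the 0-based end positions attaining it."""
    m = len(read)
    if m == 0:
        raise ValueError("read must be non-empty")
    mask = (1 << m) - 1          # keep all bit-vectors m bits wide
    last = 1 << (m - 1)          # bit for the last read position (bottom row)

    # Match table: bit i of peq[c] is set if read[i] matches reference base c.
    peq: dict[str, int] = {}
    for i, base in enumerate(read):
        peq[base] = peq.get(base, 0) | (1 << i)
        if bisulfite and base == "T":
            # Asymmetric rule: a read T also matches a reference C.
            peq["C"] = peq.get("C", 0) | (1 << i)

    pv, mv, score = mask, 0, m   # vertical deltas start at +1; D[m][0] = m
    best, ends = m, []
    for j, c in enumerate(ref):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) & mask) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask
        mh = pv & xh
        if ph & last:
            score += 1
        elif mh & last:
            score -= 1
        # Shift in 0: the alignment may start anywhere in ref (free text prefix).
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score < best:
            best, ends = score, [j]
        elif score == best:
            ends.append(j)
    return best, ends


if __name__ == "__main__":
    ref = "GGGACGTCGAGGG"
    print(bs_myers_search("ACGTTGA", ref))                     # (0, [9])
    print(bs_myers_search("ACGTTGA", ref, bisulfite=False)[0])  # 1
```

Folding the bisulfite rule into the match table leaves the core recurrence unchanged. The per-column cost of the standard algorithm is therefore preserved: a constant number of word operations when the read fits in one machine word.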