Article

Fast and sensitive mapping of bisulfite-treated sequencing data

BIOINFORMATICS (2012)

Journal

BIOINFORMATICS

Volume 28, Issue 13, Pages 1698-1704

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/bts254

Categories

Biochemical Research Methods
Biotechnology & Applied Microbiology
Computer Science, Interdisciplinary Applications
Mathematical & Computational Biology
Statistics & Probability

Funding

LIFE
BMBF through ICGC MMML-Seq [01KU1002J]
European Union
European Regional Development Fund (ERDF)
Free State of Saxony

Abstract

Motivation: Cytosine DNA methylation is one of the major epigenetic modifications and influences gene expression, developmental processes, X-chromosome inactivation, and genomic imprinting. Aberrant methylation is furthermore known to be associated with several diseases, including cancer. The gold standard for determining DNA methylation on a genome-wide scale is 'bisulfite sequencing': DNA fragments are treated with sodium bisulfite, which converts unmethylated cytosines into uracils, whereas methylated cytosines remain unchanged. The resulting sequencing reads thus exhibit asymmetric bisulfite-related mismatches and suffer from an effective reduction of the alphabet size in unmethylated regions, which makes mapping bisulfite sequencing reads computationally much more demanding. As a consequence, currently available read mapping software often fails to achieve high sensitivity and in many cases requires unrealistic computational resources to cope with large real-life datasets.

Results: In this study, we present a seed-based approach built on enhanced suffix arrays in conjunction with Myers' bit-vector algorithm to efficiently extend seeds to optimal semi-global alignments while allowing for bisulfite-related substitutions. It outperforms most current approaches in terms of sensitivity and is competitive in running time when mapping hundreds of millions of sequencing reads to vertebrate genomes.
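The asymmetric mismatch described in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: a plain dynamic-programming table stands in for the enhanced suffix array seeds and Myers' bit-vector algorithm, only the C-to-T direction of one strand is modelled, and all function names are hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite treatment plus sequencing on one strand:
    every unmethylated C is read as T; methylated positions stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def bs_match(ref_base, read_base):
    """Asymmetric match: identical bases match, and a reference C may
    appear as T in the read. A reference T read as C is a mismatch."""
    return ref_base == read_base or (ref_base == "C" and read_base == "T")


def semiglobal_bs_distance(ref, read):
    """Smallest edit distance between the whole read and any substring
    of ref, using bisulfite-aware matching (plain O(n*m) DP, an
    illustrative stand-in for a bit-vector algorithm)."""
    prev = [0] * (len(ref) + 1)  # row 0: the alignment may start anywhere in ref
    for i in range(1, len(read) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if bs_match(ref[j - 1], read[i - 1]) else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # read base unaligned to ref
                         cur[j - 1] + 1)      # ref base skipped
        prev = cur
    return min(prev)  # the alignment may end anywhere in ref


ref = "ACGTCCGATTCGA"
fragment = ref[3:11]                                  # "TCCGATTC"
read = bisulfite_convert(fragment, methylated={1})    # one methylated C survives
print(read, semiglobal_bs_distance(ref, read))
```

In this toy case the converted read differs from its source fragment at two positions under exact comparison, yet aligns with distance 0 once C-to-T substitutions are allowed. Reads from the opposite strand would show G-to-A changes instead, which the sketch does not cover.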

Recommended

Article Biochemical Research Methods

RLM: fast and simplified extraction of read-level methylation metrics from bisulfite sequencing data

Sara Hetzel, Pay Giesselmann, Knut Reinert, Alexander Meissner, Helene Kretzmer

Summary: Bisulfite sequencing data can provide valuable information beyond simple methylation assessment by analyzing single-read patterns. To address the bottleneck of read-level analysis, a fast and scalable tool called RLM was developed to compute frequently used read-level methylation statistics efficiently. This tool supports standard alignment tools, is independent of reference genomes, and can handle various sequencing experiment designs.

BIOINFORMATICS (2021)

Article Cell Biology

Evaluating the Consistency of Gene Methylation in Liver Cancer Using Bisulfite Sequencing Data

Xubin Zheng, Qiong Wu, Haonan Wu, Kwong-Sak Leung, Man-Hon Wong, Xueyan Liu, Lixin Cheng

Summary: This study introduced the most prevalent methods for processing bisulfite sequencing data and evaluated the consistency of the data acquired from different measurements in liver cancer. Differentially methylated genes measured by various bisulfite sequencing assays and the 450k BeadChip were consistently hypo-methylated in liver cancer with high functional similarity.

FRONTIERS IN CELL AND DEVELOPMENTAL BIOLOGY (2021)

Review Chemistry, Multidisciplinary

Bisulfite-free mapping of DNA cytosine modifications: challenges and perspectives

Yanfang Du, Ying Tang, Bingqian Lin, Xiaochen Xue, Yafen Wang, Yibin Liu

Summary: The mapping of DNA cytosine modifications is essential for understanding epigenetic regulation. While bisulfite sequencing has limitations, bisulfite-free methods have emerged as promising alternatives. This review provides an overview of both methods, discussing their advantages, limitations, and applications. Challenges and future perspectives for advancing bisulfite-free mapping methods are also explored.

SCIENCE CHINA-CHEMISTRY (2023)

Article Biochemical Research Methods

Fast and sensitive validation of fusion transcripts in whole-genome sequencing data

Voelundur Hafstao, Jari Hakkinen, Helena Persson

Summary: This article presents a method for validating fusion transcripts detected by RNA sequencing in matched whole-genome sequencing data. The pipeline uses discordant read pairs and soft-clipped read alignments to identify supported fusion events and determine genomic breakpoints. The method is faster and more sensitive than commonly used structural variant detection software BreakDancer and Manta.

BMC BIOINFORMATICS (2023)

Article Biochemical Research Methods

LuxRep: a technical replicate-aware method for bisulfite sequencing data analysis

Maia H. Malonzo, Viivi Halla-Aho, Mikko Konki, Riikka J. Lund, Harri Lahdesmaki

Summary: This study developed a probabilistic method and software, LuxRep, that improves the accuracy of DNA methylation level estimation and detection of differentially methylated sites by considering technical replicates with varying bisulfite conversion rates. The use of variational inference also speeds up computation time necessary for whole genome analysis.

BMC BIOINFORMATICS (2022)

Review Biochemical Research Methods

Comprehensive benchmarking of software for mapping whole genome bisulfite data: from read alignment to DNA methylation analysis

Adam Nunn, Christian Otto, Peter F. Stadler, David Langenberger

Summary: Whole genome bisulfite sequencing has advanced epigenetic analysis by providing nucleotide-level resolution of 5-methylcytosine (5mC) on a genome-wide scale. Evaluations of nine short-read aligners suggest that BWA-meth and BSMAP are most effective in utilizing data during mapping. Downstream methylation analysis is influenced by handling multi-mapping reads and mapping quality.

BRIEFINGS IN BIOINFORMATICS (2021)

Article Genetics & Heredity

kngMap: Sensitive and Fast Mapping Algorithm for Noisy Long Reads Based on the K-Mer Neighborhood Graph

Ze-Gang Wei, Xing-Guo Fan, Hao Zhang, Xiao-Dan Zhang, Fei Liu, Yu Qian, Shao-Wu Zhang

Summary: In this paper, a novel mapper called kngMap is introduced for aligning long noisy SMS reads to a reference sequence using a k-mer neighborhood graph. Experimental results show that kngMap has higher sensitivity and can produce consecutive alignments for the whole read.

FRONTIERS IN GENETICS (2022)

Article Biology

BiSulfite Bolt: A bisulfite sequencing analysis platform

Colin Farrell, Michael Thompson, Anela Tosevska, Adewale Oyetunde, Matteo Pellegrini

Summary: BSBolt is a fast and scalable bisulfite sequencing analysis platform that efficiently handles asymmetrical libraries. Compared to existing bisulfite alignment tools, BSBolt achieves higher alignment and methylation calling accuracy.

GIGASCIENCE (2021)

Article Oncology

Novel insights into systemic sclerosis using a sensitive computational method to analyze whole-genome bisulfite sequencing data

Jeffrey C. Y. Yu, Yixiao Zeng, Kaiqiong Zhao, Tianyuan Lu, Kathleen Oros Klein, Ines Colmegna, Maximilien Lora, Sahir R. Bhatnagar, Andrew Leask, Celia M. T. Greenwood, Marie Hudson

Summary: Using the SOMNiBUS method to re-analyze WGBS data, 131 differentially methylated regions and 125 differentially methylated genes were identified. This method provides a better understanding of the pathogenesis of systemic sclerosis.

CLINICAL EPIGENETICS (2023)

Article Biochemical Research Methods

msPIPE: a pipeline for the analysis and visualization of whole-genome bisulfite sequencing data

Heesun Kim, Mikang Sim, Nayoung Park, Kisang Kwon, Junyoung Kim, Jaebum Kim

Summary: msPIPE is an end-to-end pipeline for DNA methylation analyses, allowing seamless connection of all necessary tasks from data pre-processing to downstream DNA methylation analyses. It generates various methylation profiles for analyzing methylation patterns, including statistical summaries and methylation levels. The pipeline also computes methylation levels in functional regions of the genome with proper annotation. The results can be visualized in high-quality figures. msPIPE can be easily used with a Docker image that includes all required packages and software for DNA methylation analyses.

BMC BIOINFORMATICS (2022)

Article Clinical Neurology

Fine-mapping and replication of EWAS loci harboring putative epigenetic alterations associated with AD neuropathology in a large collection of human brain tissue samples

Helena Palma-Gudiel, Lei Yu, Zhiguang Huo, Jingyun Yang, Yanling Wang, Tongjun Gu, Cheng Gao, Philip L. De Jager, Peng Jin, David A. Bennett, Jinying Zhao

Summary: This study identified 130 CpG sites associated with Alzheimer's disease pathology, including 93 novel sites, through targeted sequencing. The DNA methylation at these sites was found to be associated with the expression of nearby genes.

ALZHEIMERS & DEMENTIA (2023)

Article Biotechnology & Applied Microbiology

A pipeline for sample tagging of whole genome bisulfite sequencing data using genotypes of whole genome sequencing

Zhe Xu, Si Cheng, Xin Qiu, Xiaoqi Wang, Qiuwen Hu, Yanfeng Shi, Yang Liu, Jinxi Lin, Jichao Tian, Yongfei Peng, Yong Jiang, Yadong Yang, Jianwei Ye, Yilong Wang, Xia Meng, Zixiao Li, Hao Li, Yongjun Wang

Summary: This study constructed an optimized pipeline and identified applicable fingerprint panels to address the sample tagging problem in whole genome bisulfite sequencing (WGBS) data. By using autosome-wide A/T polymorphic single nucleotide variants (SNVs), a fingerprint panel was designed and genotypes were called from the WGBS data. The capability to tag WGBS data was validated and the lower boundary for the number of fingerprint genetic variants needed for correct sample tagging was determined.

BMC GENOMICS (2023)

Article Biochemistry & Molecular Biology

A novel workflow for the qualitative analysis of DNA methylation data

Antonella Sarnataro, Giulia De Riso, Sergio Cocozza, Antonio Pezone, Barbara Majello, Stefano Amente, Giovanni Scala

Summary: DNA methylation is an important epigenetic modification that influences gene regulation, genomic imprinting, and genome stability. EpiStatProfiler is an R package that allows the analysis of CpG and non-CpG epialleles based on bisulfite sequencing data, with additional features for genomic annotation and analysis. It is the first package to provide functionalities specifically for epiallele composition analysis from any type of bisulfite sequencing experiment.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2022)

Article Biochemistry & Molecular Biology

Invertebrate methylomes provide insight into mechanisms of environmental tolerance and reveal methodological biases

Shelly A. Trigg, Yaamini R. Venkataraman, Mackenzie R. Gavery, Steven B. Roberts, Debashish Bhattacharya, Alan Downey-Wall, Jose M. Eirin-Lopez, Kevin M. Johnson, Katie E. Lotterhos, Jonathan B. Puritz, Hollie M. Putnam

Summary: This study compares three methods for quantifying DNA methylation and finds higher methylation in two coral species, primarily located in gene bodies and flanking regions. Each method has its advantages and disadvantages in detecting CpGs, and the relative genome size affects the number and location of CpGs detected by each method.

MOLECULAR ECOLOGY RESOURCES (2022)

Article Biotechnology & Applied Microbiology

Characterizing the properties of bisulfite sequencing data: maximizing power and sensitivity to identify between-group differences in DNA methylation

Dorothea Seiler Vellame, Isabel Castanho, Aisha Dahir, Jonathan Mill, Eilis Hannon

Summary: The combination of sodium bisulfite treatment with highly-parallel sequencing is commonly used to quantify DNA methylation levels. Factors such as read depth, sample size, and DNA methylation differences between groups all influence the power to detect differences. A tool called POWEREDBiSeq has been developed to predict study-specific power for identifying DNA methylation differences, taking into account read depth filtering parameters and sample size requirements.

BMC GENOMICS (2021)

Article Biochemical Research Methods

Best Match Graphs With Binary Trees

David Schaller, Manuela Geiss, Marc Hellmuth, Peter F. F. Stadler

Summary: We propose a near-cubic algorithm to determine if Best match graphs (BMG) can be explained by a fully resolved gene tree and to construct such a tree. We prove that all binary trees are refinements of the unique binary-refinable tree (BRT) which is a significant refinement of the least resolved tree of a BMG. Additionally, we demonstrate the NP-completeness of editing an arbitrary vertex-colored graph to a binary-explainable BMG and provide an integer linear program formulation for this task.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Article Biochemical Research Methods

Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs

Lisa Fiedler, Matthias Bernt, Martin Middendorf, Peter F. Stadler

Summary: This study presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, considering high substitution rates. The method uses a parallel program design and has been extensively tested for accuracy.

BMC BIOINFORMATICS (2023)

Article Mathematics, Applied

Planar median graphs and cubesquare-graphs

Carsten R. Seemann, Vincent Moulton, Peter F. Stadler, Marc Hellmuth

Summary: Median graphs are connected graphs where there is a unique vertex belonging to the shortest paths between any three vertices. This paper presents new characterizations of planar median graphs by using forbidden subgraphs, isometric cycles, and subgraphs contained inside and outside of 4-cycles. These characterizations lead to a new definition of planar median graphs called cubesquare-graphs, and also provide an O(n log n)-time recognition algorithm for computing the decomposition of a planar median graph into cubes and square-graphs.

DISCRETE APPLIED MATHEMATICS (2023)

Article Mathematics, Applied

Quasi-best match graphs

Annachiara Korchmaros, David Schaller, Marc Hellmuth, Peter F. Stadler

Summary: Quasi-best match graphs (qBMGs) are directed, properly vertex-colored graphs that generalize best match graphs and represent the evolutionary closest relatedness of genes in multiple species. They can be explained by rooted trees where each leaf corresponds to a vertex. Compared to best match graphs, qBMGs only represent best matches within a restricted phylogenetic distance. We provide characterizations of qBMGs, including polynomial-time recognition algorithms, and identify best match graphs as color-sink-free qBMGs. Additionally, two-colored qBMGs are characterized as directed graphs satisfying three simple local conditions.

DISCRETE APPLIED MATHEMATICS (2023)

Article Mathematics

Injective Split Systems

M. Hellmuth, K. T. Huber, V. Moulton, G. E. Scholz, P. F. Stadler

Summary: This paper studies the properties of split systems, with a focus on injective split systems that can be used to represent symbolic tree maps. The authors prove the existence of an injective split system on any set X and provide a characterization for when a split system is injective. They also introduce related concepts such as injective dimension and provide upper and lower bounds for these dimensions. An important motivation for studying injective split systems is their application in representing three-way symbolic maps.

GRAPHS AND COMBINATORICS (2023)

Article Biochemistry & Molecular Biology

Led-Seq: ligation-enhanced double-end sequence-based structure analysis of RNA

Tim Kolberg, Sarah von Loehneysen, Iuliia Ozerova, Karolin Wellner, Roland K. Hartmann, Peter F. Stadler, Mario Moerl

Summary: Structural analysis of RNA is important in understanding its function. Led-Seq is a new approach based on lead-induced cleavage, which allows investigation of both resulting cleavage products. It provides accurate information about cleavage sites and is an improved method for studying RNA structures in vivo.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemical Research Methods

Local RNA folding revisited

Maria Waldl, Thomas Spicher, Ronny Lorenz, Irene K. Beckmann, Ivo L. Hofacker, Sarah Von Loehneysen, Peter F. Stadler

Summary: Most functional RNA elements in large transcripts are local, making local folding a useful approximation for predicting global structure. By averaging local structure predictions over multiple overlapping sequence windows, accuracy can be improved. Dynamic programming allows for efficient computation of these averages. This study presents a mathematical formalization that generalizes previous approaches to the local folding problem, demonstrating that correct Boltzmann samples can be obtained through local stochastic backtracing in McCaskill's algorithms rather than local folding recursions. The ViennaRNA package incorporates these new features to enhance support for local folding, enabling the computation of maximum expected accuracy structures from RNAplfold data and the quantification of sensitivity at individual sequence positions using a mutual information measure.

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (2023)

Review Physics, Applied

SuperConga: An open-source framework for mesoscopic superconductivity

P. Holmvall, N. Wall Wennerdal, M. Hakansson, P. Stadler, O. Shevtsov, T. Lofwander, M. Fogelstrom

Summary: SuperConga is an open-source framework for simulating equilibrium properties of unconventional and ballistic singlet superconductors in 2D mesoscopic grains under a perpendicular external magnetic field at low temperatures. It is designed to be fast and easy to use, allowing research and real-time visualization without the need for a computer cluster. The framework utilizes the parallel computational power of modern graphics processing units and provides a user-friendly Python frontend for defining simulation parameters. It also includes tools for analyzing and visualizing results. The framework can be downloaded for free and is accompanied by a comprehensive user manual with examples and tutorials.

APPLIED PHYSICS REVIEWS (2023)

Article Health Care Sciences & Services

Toward a Systematic Assessment of Sex Differences in Cystic Fibrosis

Christiane Gaertner, Joerg Fallmann, Peter F. Stadler, Thorsten Kaiser, Sarah J. Berkemer

Summary: In this study, we analyze the expression differences in whole blood transcriptomics between female and male CF patients to determine the pathways related to sex-biased genes and evaluate their potential influence on sex-specific effects in CF patients.

JOURNAL OF PERSONALIZED MEDICINE (2023)

Article Chemistry, Multidisciplinary

The six stages of the convergence of the periodic system to its final structure

Andres M. Bran, Peter F. Stadler, Juergen Jost, Guillermo Restrepo

Summary: The periodic system encodes order and similarity among chemical elements. An analysis of the chemical space between 1800 and 2021 shows that the system converged towards its current stable structure through six stages. Given the limited chemical possibilities and low diversity of the chemical space, the periodic system is expected to remain largely unchanged.

COMMUNICATIONS CHEMISTRY (2023)

Article Biology

Clustering systems of phylogenetic networks

Marc Hellmuth, David Schaller, Peter F. Stadler

Summary: Rooted acyclic graphs play a crucial role in modeling different types of evolutionary processes, and there are correspondences between different classes of networks and their clustering systems, revealing the mutual dependencies among various network types.

THEORY IN BIOSCIENCES (2023)

Article Multidisciplinary Sciences

Defining the landscape of circular RNAs in neuroblastoma unveils a global suppressive function of MYCN

Steffen Fuchs, Clara Danssmann, Filippos Klironomos, Annika Winkler, Joerg Fallmann, Louisa-Marie Kruetzfeldt, Annabell Szymansky, Julian Naderi, Stephan H. Bernhart, Laura Grunewald, Konstantin Helmsauer, Elias Rodriguez-Fos, Marieluise Kirchner, Philipp Mertins, Kathy Astrahantseff, Christin Suenkel, Joern Toedling, Fabienne Meggetto, Marc Remke, Peter F. Stadler, Patrick Hundsdoerfer, Hedwig E. Deubzer, Annette Kuenkele, Peter Lang, Joerg Fuchs, Anton G. Henssen, Angelika Eggert, Nikolaus Rajewsky, Falk Hertwig, Johannes H. Schulte

Summary: The authors investigate the role of circRNAs in cancer by sequencing the transcriptomes of 104 primary neuroblastomas. They find that MYCN amplification suppresses circRNA biogenesis, leading to the identification of circARID1A as a potential oncogenic circRNA in neuroblastoma.

NATURE COMMUNICATIONS (2023)

Article Mathematical & Computational Biology

RNAcode_Web - Convenient identification of evolutionary conserved protein coding regions

John Anders, Peter F. Stadler

Summary: Differentiating regions with coding potential from non-coding regions is an important task in computational biology. RNAcode, a method that utilizes sequence conservation patterns, shows superior classification accuracy for short coding sequences compared to methods that rely on a single input sequence. However, obtaining suitable multiple sequence alignments can be tedious and challenging. To address this, a new web service called RNAcode_Web is introduced, which automates the process of collecting, selecting, and preparing homologous sequences from the NCBI database and constructing multiple sequence alignments needed for RNAcode input. This service simplifies the investigation of previously unannotated coding regions for non-expert users.

JOURNAL OF INTEGRATIVE BIOINFORMATICS (2023)

Article Genetics & Heredity

Tailored machine learning models for functional RNA detection in genome-wide screens

Christopher Klapproth, Siegfried Zoetzsche, Felix Kuehnl, Joerg Fallmann, Peter F. Stadler, Sven Findeiss

Summary: This article introduces a software framework for in silico prediction of non-coding and protein-coding genetic loci, which allows for the alignment-based training, evaluation, and application of machine learning models with user-defined parameters. Instead of using the one-size-fits-all approach of pervasive in silico annotation pipelines, this framework focuses on the structured generation and evaluation of models based on arbitrary features and input data, aiming for stable and explainable results. Furthermore, the software package is applied to a full-genome screen of Drosophila melanogaster and evaluated against the well-known but less flexible program RNAz.

NAR GENOMICS AND BIOINFORMATICS (2023)

Article Chemistry, Multidisciplinary

Comparison of Atom Maps

Marcos E. Gonzalez Laffitte, Nora Beier, Nico Domschke, Peter F. Stadler

Summary: The computation of reliable and chemically correct atom maps from educt/product pairs is a challenging task in cheminformatics. Various competing models have been developed and compared through extensive benchmarking studies. This study formalizes the equivalence of atom maps and demonstrates the use of Fujita's Imaginary Transition State for this purpose. Numerical experiments confirm the practical feasibility. The article also briefly discusses generalizations to subgraph matches, graph transformation rules, and multi-step reaction mechanisms.

MATCH-COMMUNICATIONS IN MATHEMATICAL AND IN COMPUTER CHEMISTRY (2023)
