Article

An equivariant Bayesian convolutional network predicts recombination hotspots and accurately resolves binding motifs

Journal

BIOINFORMATICS
Volume 35, Issue 13, Pages 2177-2184

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty964

Funding

  1. Wellcome Trust [090532/Z/09/Z, 203141/Z/16/Z]

Motivation: Convolutional neural networks (CNNs) have been tremendously successful in many contexts, particularly where training data are abundant and signal-to-noise ratios are large. However, when predicting noisily observed phenotypes from DNA sequence, each training instance is only weakly informative, and the amount of training data is often fundamentally limited, emphasizing the need for methods that make optimal use of training data and any structure inherent in the process.

Results: Here we show how to combine equivariant networks, a general mathematical framework for handling exact symmetries in CNNs, with Bayesian dropout, a version of Monte Carlo dropout suggested by a reinterpretation of dropout as a variational Bayesian approximation, to develop a model that exhibits exact reverse-complement symmetry and is more resistant to overtraining. We find that this model combines improved prediction consistency with better predictive accuracy compared to standard CNN implementations and state-of-the-art motif finders. We use our network to predict recombination hotspots from sequence, and identify binding motifs for the recombination-initiation protein PRDM9 previously unobserved in these data, which were recently validated by high-resolution assays. The network achieves a predictive accuracy comparable to that attainable by a direct assay of the H3K4me3 histone mark, a proxy for PRDM9 binding.

Availability and implementation: https://github.com/luntergroup/EquivariantNetworks
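The exact reverse-complement symmetry mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it only shows the general idea under simple assumptions: a single filter is shared between both strands by also scanning with its reverse-complemented copy, and a global maximum is taken. All function names here (`one_hot`, `reverse_complement`, `scan`, `rc_invariant_score`) are hypothetical and chosen for illustration.

```python
import numpy as np

BASES = "ACGT"  # column order matters: complementing is then a column flip

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array with columns A, C, G, T."""
    idx = np.array([BASES.index(b) for b in seq])
    return np.eye(4)[idx]

def reverse_complement(x):
    """Reverse the positions and swap A<->T, C<->G (both are axis flips here)."""
    return x[::-1, ::-1]

def scan(x, w):
    """Valid cross-correlation of a one-hot sequence x (L, 4) with a filter w (k, 4)."""
    k = w.shape[0]
    return np.array([np.sum(x[i:i + k] * w) for i in range(x.shape[0] - k + 1)])

def rc_invariant_score(x, w):
    """Best match of the filter or its reverse-complemented copy (tied weights)."""
    fwd = scan(x, w)
    rev = scan(x, reverse_complement(w))
    return max(fwd.max(), rev.max())

# Illustrative data only: a random filter and an arbitrary short sequence.
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 4))
x = one_hot("ACGTTGCAAGGCTTAC")

score_fwd = rc_invariant_score(x, w)
score_rc = rc_invariant_score(reverse_complement(x), w)
```

Scanning the reverse-complemented sequence with `w` gives the same values, in reverse order, as scanning the original sequence with the reverse-complemented filter. The two scores therefore agree up to floating-point rounding regardless of which strand is presented, so the symmetry is built into the architecture and does not have to be learned from data.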

Recommended

Review Biochemistry & Molecular Biology

The Diversity and Molecular Evolution of B-Cell Receptors during Infection

Kenneth B. Hoehn, Anna Fowler, Gerton Lunter, Oliver G. Pybus

MOLECULAR BIOLOGY AND EVOLUTION (2016)

Article Genetics & Heredity

A Phylogenetic Codon Substitution Model for Antibody Lineages

Kenneth B. Hoehn, Gerton Lunter, Oliver G. Pybus

GENETICS (2017)

Article Biotechnology & Applied Microbiology

A high throughput screen for active human transposable elements

Erika M. Kvikstad, Paolo Piazza, Jenny C. Taylor, Gerton Lunter

BMC GENOMICS (2018)

Article Biochemical Research Methods

Haplotype matching in large cohorts using the Li and Stephens model

Gerton Lunter

BIOINFORMATICS (2019)

Article Multidisciplinary Sciences

Sequencing of human genomes with nanopore technology

Rory Bowden, Robert W. Davies, Andreas Heger, Alistair T. Pagnamenta, Mariateresa de Cesare, Laura E. Oikkonen, Duncan Parkes, Colin Freeman, Fatima Dhalla, Smita Y. Patel, Niko Popitsch, Camilla L. C. Ip, Hannah E. Roberts, Silvia Salatino, Helen Lockstone, Gerton Lunter, Jenny C. Taylor, David Buck, Michael A. Simpson, Peter Donnelly

NATURE COMMUNICATIONS (2019)

Article Multidisciplinary Sciences

Repertoire-wide phylogenetic models of B cell molecular evolution reveal evolutionary signatures of aging and vaccination

Kenneth B. Hoehn, Jason A. Vander Heiden, Julian Q. Zhou, Gerton Lunter, Oliver G. Pybus, Steven H. Kleinstein

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2019)

Article Biotechnology & Applied Microbiology

Inferring B cell specificity for vaccines using a Bayesian mixture model

Anna Fowler, Jacob D. Galson, Johannes Truck, Dominic F. Kelly, Gerton Lunter

BMC GENOMICS (2020)

Article Statistics & Probability

Efficient inference in state-space models through adaptive learning in online Monte Carlo expectation maximization

Donna Henderson, Gerton Lunter

COMPUTATIONAL STATISTICS (2020)

Article Biochemical Research Methods

DeepC: predicting 3D genome folding using megabase-scale transfer learning

Ron Schwessinger, Matthew Gosden, Damien Downes, Richard C. Brown, A. Marieke Oudelaar, Jelena Telenius, Yee Whye Teh, Gerton Lunter, Jim R. Hughes

NATURE METHODS (2020)

Article Multidisciplinary Sciences

Demographic inference from multiple whole genomes using a particle filter for continuous Markov jump processes

Donna Henderson, Sha (Joe) Zhu, Christopher B. Cole, Gerton Lunter

Summary: This study introduces a coalescent-with-recombination model to connect demography and genetics, using particle filters and Variational Bayes to infer unobserved genealogies in the genome. Through real and simulated genomes, it shows improved accuracy in inferring past population sizes and the potential for jointly analyzing multiple genomes under complex demographic models.

PLOS ONE (2021)

Article Biotechnology & Applied Microbiology

A unified haplotype-based method for accurate and comprehensive variant calling

Daniel P. Cooke, David C. Wedge, Gerton Lunter

Summary: Octopus is a variant caller that uses a polymorphic Bayesian genotyping model capable of modeling different experimental designs within a unified haplotype-aware framework. It accurately calls germline variants in individuals, including low-frequency somatic variations, while producing fewer false positives compared to other methods. Octopus also outputs realigned evidence BAM files to assist with validation and interpretation.

NATURE BIOTECHNOLOGY (2021)

Article Multidisciplinary Sciences

Short and long-read genome sequencing methodologies for somatic variant detection; genomic analysis of a patient with diffuse large B-cell lymphoma

Hannah E. Roberts, Maria Lopopolo, Alistair T. Pagnamenta, Eshita Sharma, Duncan Parkes, Lorne Lonie, Colin Freeman, Samantha J. L. Knight, Gerton Lunter, Helene Dreau, Helen Lockstone, Jenny C. Taylor, Anna Schuh, Rory Bowden, David Buck

Summary: Recent advances in long-read sequencing technology, such as the Oxford Nanopore Technologies PromethION platform, have shown potential for detecting somatic variations with higher accuracy and sensitivity compared to short-read sequencing methods. However, the development of specialized algorithms is necessary to improve the specificity and precision of somatic variant calling, especially for structural variants.

SCIENTIFIC REPORTS (2021)

Article Biology

Multi Locus View: an extensible web-based tool for the analysis of genomic data.

Martin J. Sergeant, Jim R. Hughes, Lance Hentges, Gerton Lunter, Damien J. Downes, Stephen Taylor

Summary: The web-based tool Multi Locus View allows researchers to interact with genomics datasets at multiple scales. Users can browse, annotate, combine or analyze raw data on this platform. User-generated datasets can also be made public, increasing access to genomic data for the academic community.

COMMUNICATIONS BIOLOGY (2021)
