Article

Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models

Journal

JOURNAL OF STATISTICAL PHYSICS
Volume 142, Issue 6, Pages 1187-1205

Publisher

SPRINGER
DOI: 10.1007/s10955-010-0102-x

Keywords

Bioinformatics; Hidden Markov Models; One-dimensional statistical mechanics; Fisher information; Machine learning

Funding

  1. NIH [K25GM086909, R01HG03470]
  2. DARPA [HR0011-05-1-0057]
  3. NSF [PHY-0957573]

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the inverse statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
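
As an illustration of the equivalence described above, the sum over all non-overlapping placements of binding sites along a sequence can be written as the partition function of hard rods on a one-dimensional lattice and evaluated with a forward-style recursion. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name `partition_function`, the per-position energies (in units of kT), and the chemical potential `mu` are hypothetical choices made for this example.

```python
import math


def partition_function(energies, rod_length, mu=0.0):
    """Partition function of hard rods on a 1D lattice (illustrative sketch).

    energies[j] is an assumed binding energy (in units of kT) for a rod whose
    left end sits at lattice position j, so the lattice has
    len(energies) + rod_length - 1 sites. mu is an assumed chemical potential
    that sets the overall density of rods.
    """
    n = len(energies) + rod_length - 1
    # z[i] is the partition function of the first i lattice sites.
    z = [0.0] * (n + 1)
    z[0] = 1.0
    for i in range(1, n + 1):
        # Case 1: site i-1 is unoccupied (background state in HMM language).
        z[i] = z[i - 1]
        # Case 2: a rod ends at site i-1, so it starts at site i - rod_length.
        start = i - rod_length
        if start >= 0:
            z[i] += math.exp(mu - energies[start]) * z[start]
    return z[n]
```

For two allowed start positions, a rod length of 2, and zero energies, the three allowed configurations (empty lattice, rod at position 0, rod at position 1) each carry weight 1, so `partition_function([0.0, 0.0], 2)` returns `3.0`. The recursion has the same structure as the HMM forward algorithm, with the Boltzmann weight of a rod playing the role of the likelihood of emitting a binding site.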

Recommended

Article Physics, Mathematical

Landauer in the Age of Synthetic Biology: Energy Consumption and Information Processing in Biochemical Networks

Pankaj Mehta, Alex H. Lang, David J. Schwab

JOURNAL OF STATISTICAL PHYSICS (2016)

Article Physics, Multidisciplinary

Thermodynamic Paradigm for Solution Demixing Inspired by Nuclear Transport in Living Cells

Ching-Hao Wang, Pankaj Mehta, Michael Elbaum

PHYSICAL REVIEW LETTERS (2017)

Review Physics, Multidisciplinary

Geometry and non-adiabatic response in quantum and classical systems

Michael Kolodrubetz, Dries Sels, Pankaj Mehta, Anatoli Polkovnikov

PHYSICS REPORTS-REVIEW SECTION OF PHYSICS LETTERS (2017)

Article Cell & Tissue Engineering

Thyroid Progenitors Are Robustly Derived from Embryonic Stem Cells through Transient, Developmental Stage-Specific Overexpression of Nkx2-1

Keri Dame, Steven Cincotta, Alex H. Lang, Reeti M. Sanghrajka, Liye Zhang, Jinyoung Choi, Letty Kwok, Talitha Wilson, Maciej M. Kandula, Stefano Monti, Anthony N. Hollenberg, Pankaj Mehta, Darrell N. Kotton, Laertis Ikonomou

STEM CELL REPORTS (2017)

Article Multidisciplinary Sciences

Emergent simplicity in microbial community assembly

Joshua E. Goldford, Nanxi Lu, Djordje Bajic, Sylvie Estrela, Mikhail Tikhonov, Alicia Sanchez-Gorostiaga, Daniel Segre, Pankaj Mehta, Alvaro Sanchez

SCIENCE (2018)

Article Multidisciplinary Sciences

A minimal model for microbial biodiversity can reproduce experimentally observed ecological patterns

Robert Marsland, Wenping Cui, Pankaj Mehta

SCIENTIFIC REPORTS (2020)

Article Physics, Multidisciplinary

Effect of Resource Dynamics on Species Packing in Diverse Ecosystems

Wenping Cui, Robert Marsland, Pankaj Mehta

PHYSICAL REVIEW LETTERS (2020)

Article Multidisciplinary Sciences

Tregs self-organize into a computing ecosystem and implement a sophisticated optimization algorithm for mediating immune response

Robert Marsland, Owen Howell, Andreas Mayer, Pankaj Mehta

Summary: The study presents a biophysically realistic model of Treg-mediated self-tolerance, demonstrating the importance of Treg diversity in maintaining stability against fluctuations in self-antigen concentrations and the potential risk of autoimmune response with decreased Treg diversity.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Article Biochemical Research Methods

HAL-X: Scalable hierarchical clustering for rapid and tunable single-cell analysis

James Anibal, Alexandre G. Day, Erol Bahadiroglu, Liam O'Neil, Long Phan, Alec Peltekian, Amir Erez, Mariana Kaplan, Gregoire Altan-Bonnet, Pankaj Mehta

Summary: Data clustering plays a significant role in biomedical sciences, especially in the analysis of single-cell data. The new hierarchical density clustering algorithm (HAL-x) introduced in this report improves computational efficiency and achieves high accuracy in single cell classification. This algorithm is scalable, tunable, and rapid, providing a valuable tool for analyzing vast biological datasets.

PLOS COMPUTATIONAL BIOLOGY (2022)

Article Developmental Biology

scTOP: physics-inspired order parameters for cellular identification and visualization

Maria Yampolskaya, Michael J. Herriges, Laertis Ikonomou, Darrell N. Kotton, Pankaj Mehta

Summary: Advances in single-cell RNA sequencing have provided a new way to understand cellular identity. The scTOP method, a statistical and physics-inspired approach, accurately classifies cells, visualizes developmental trajectories, and evaluates engineered cells without feature selection or dimensional reduction. Its application on human and mouse datasets has demonstrated its power in characterizing cellular populations and differentiation.

DEVELOPMENT (2023)

Article Physics, Fluids & Plasmas

Bias-variance decomposition of overparameterized regression with random linear features

Jason W. Rocks, Pankaj Mehta

Summary: The bias-variance trade-off is a concept in classical statistics that describes how the complexity of a model affects its ability to make accurate predictions. This understanding needs to be revisited for overparameterized models, which have shown incredible predictive performance even with a large number of fit parameters. In this study, the authors analyze a simple overparameterized model and derive analytic expressions for various error metrics. They discover three phase transitions in the model and explain them using random matrix theory.

PHYSICAL REVIEW E (2022)

Article Physics, Multidisciplinary

Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models

Jason W. Rocks, Pankaj Mehta

Summary: The bias-variance trade-off is a central concept in supervised learning. Modern deep learning methods challenge the traditional belief by achieving state-of-the-art performance with overparameterized models. Statistical physics methods help to understand the bias and variance in these models.

PHYSICAL REVIEW RESEARCH (2022)

Article Biochemistry & Molecular Biology

Cellular reprogramming dynamics follow a simple 1D reaction coordinate

Sai Teja Pusuluri, Alex H. Lang, Pankaj Mehta, Horacio E. Castillo

PHYSICAL BIOLOGY (2018)

Article Physics, Fluids & Plasmas

Analytically tractable model for community ecology with many species

Benjamin Dickens, Charles K. Fisher, Pankaj Mehta

PHYSICAL REVIEW E (2016)
