Article

Amino Acid Changes in Disease-Associated Variants Differ Radically from Variants Observed in the 1000 Genomes Project Dataset

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 9, Issue 12

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pcbi.1003382

Funding

  1. National Institutes of Health [GM094585]
  2. U.S. Department of Energy, Office of Biological and Environmental Research [DE-AC02-06CH11357]
  3. EMBL-EBI

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
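The idea of a direction-aware (and therefore asymmetric) exchange matrix, and of per-residue mutability, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy variant list, and the toy background counts are all hypothetical and serve only to show how counting (reference, variant) pairs separately in each direction yields a matrix in which, for example, R→H and H→R need not be equal.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def exchange_matrix(variants):
    """Count directed exchanges from (reference, variant) residue pairs.

    The key (ref, alt) is kept distinct from (alt, ref), so the resulting
    matrix is not forced to be symmetric.
    """
    counts = Counter()
    for ref, alt in variants:
        if ref in AMINO_ACIDS and alt in AMINO_ACIDS and ref != alt:
            counts[(ref, alt)] += 1
    return counts


def mutability(counts, background):
    """Mutations away from each residue per occurrence of that residue.

    `background` maps residue -> number of times it occurs in the
    (hypothetical) reference protein set.
    """
    outgoing = Counter()
    for (ref, _alt), n in counts.items():
        outgoing[ref] += n
    return {aa: outgoing[aa] / total
            for aa, total in background.items() if total > 0}


def asymmetry(counts, a, b):
    """Difference between the a->b and b->a counts."""
    return counts[(a, b)] - counts[(b, a)]


# Toy, made-up data (assumption): not taken from 1000 Genomes or OMIM.
toy_variants = [("R", "H"), ("R", "Q"), ("R", "C"), ("H", "R"), ("A", "V")]
toy_background = {"R": 10, "H": 5, "A": 20}

counts = exchange_matrix(toy_variants)
rates = mutability(counts, toy_background)
```

In this toy input arginine has the most outgoing exchanges relative to its background count, which mirrors, only qualitatively, the abstract's point that arginine mutability is elevated; real analyses would also need to model the underlying nucleotide changes.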

Recommended

Article Biochemistry & Molecular Biology

The Enzyme Portal: an integrative tool for enzyme information and analysis

Rossana Zaru, Joseph Onwubiko, Antonio J. M. Ribeiro, Keeva Cochrane, Jonathan D. Tyzack, Venkatesh Muthukrishnan, Lukas Pravda, Janet M. Thornton, Claire O'Donovan, Sameer Velankar, Sandra Orchard, Andrew Leach, Maria J. Martin

Summary: Enzyme Portal serves as a free hub for researchers to easily access and explore enzyme-related information from various resources, addressing the challenge of time-consuming retrieval of scattered enzyme data.

FEBS JOURNAL (2022)

Letter Biochemical Research Methods

Srinivasan (1962-2021) in Bioinformatics and beyond

M. Michael Gromiha, Christine A. Orengo, Ramanathan Sowdhamini, Janet M. Thornton

BIOINFORMATICS (2022)

Article Biochemical Research Methods

phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets

Nicola R. De Maio, William Boulton, Lukas Weilguny, Conor Walker, Yatish Turakhia, Russell O. Corbett-Detig, Nick Goldman, Ville Mustonen, Joel Wertheim

Summary: This article introduces a new algorithm and software for efficiently simulating a large number of closely related genomes. The algorithm is based on the Gillespie approach and utilizes an efficient multi-layered search tree structure to achieve high computational efficiency, allowing integration with various evolutionary models.

PLOS COMPUTATIONAL BIOLOGY (2022)

Correction Multidisciplinary Sciences

Genomic reconstruction of the SARS-CoV-2 epidemic in England (vol 600, pg 506, 2021)

Harald S. Vohringer, Theo Sanderson, Matthew Sinnott, Nicola De Maio, Thuy Nguyen, Richard Goater, Frank Schwach, Ian Harrison, Joel Hellewell, Cristina V. Ariani, Sonia Goncalves, David K. Jackson, Ian Johnston, Alexander W. Jung, Callum Saint, John Sillitoe, Maria Suciu, Nick Goldman

NATURE (2022)

Article Biochemistry & Molecular Biology

Ensembl 2023

Fergal J. Martin, M. Ridwan Amode, Alisha Aneja, Olanrewaju Austine-Orimoloye, Andrey G. Azov, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, Simarpreet Kaur Bhurji, Alexandra Bignell, Sanjay Boddu, Paulo R. Branco Lins, Lucy Brooks, Shashank Budhanuru Ramaraju, Mehrnaz Charkhchi, Alexander Cockburn, Luca Da Rin Fiorretto, Claire Davidson, Kamalkumar Dodiya, Sarah Donaldson, Bilal El Houdaigui, Tamara El Naboulsi, Reham Fatima, Carlos Garcia Giron, Thiago Genez, Gurpreet S. Ghattaoraya, Jose Gonzalez Martinez, Cristi Guijarro, Matthew Hardy, Zoe Hollis, Thibaut Hourlier, Toby Hunt, Mike Kay, Vinay Kaykala, Tuan Le, Diana Lemos, Diego Marques-Coelho, Jose Carlos Marugan, Gabriela Alejandra Merino, Louisse Paola Mirabueno, Aleena Mushtaq, Syed Nakib Hossain, Denye N. Ogeh, Manoj Pandian Sakthivel, Anne Parker, Malcolm Perry, Ivana Pilizota, Irina Prosovetskaia, Jose G. Perez-Silva, Ahamed Imran Abdul Salam, Nuno Saraiva-Agostinho, Helen Schuilenburg, Dan Sheppard, Swati Sinha, Botond Sipos, William Stark, Emily Steed, Ranjit Sukumaran, Dulika Sumathipala, Marie-Marthe Suner, Likhitha Surapaneni, Kyosti Sutinen, Michal Szpak, Francesca Floriana Tricomi, David Urbina-Gomez, Andres Veidenberg, Thomas A. Walsh, Brandon Walts, Elizabeth Wass, Natalie Willhoft, Jamie Allen, Jorge Alvarez-Jarreta, Marc Chakiachvili, Bethany Flint, Stefano Giorgetti, Leanne Haggerty, Garth R. Ilsley, Jane E. Loveland, Benjamin Moore, Jonathan M. Mudge, John Tate, David Thybert, Stephen J. Trevanion, Andrea Winterbottom, Adam Frankish, Sarah E. Hunt, Magali Ruffier, Fiona Cunningham, Sarah Dyer, Robert D. Finn, Kevin L. Howe, Peter W. Harrison, Andrew D. Yates, Paul Flicek

Summary: Ensembl has been providing high-quality genomic resources for vertebrates and model organisms for over 20 years. With the increase in high-quality reference genomes and the development of pangenome representations, Ensembl aims to support downstream research by creating high-quality annotations, tools, and services for species across the tree of life. This report highlights Ensembl's resources for popular reference genomes, the growing annotations, updates to the Variant Effect Predictor, protein structure predictions, and the beta release of their new website.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

AlphaFold2 protein structure prediction: Implications for drug discovery

Neera Borkakoti, Janet M. Thornton

Summary: The drug discovery process involves designing compounds to selectively interact with their protein targets. Recent advancements in artificial intelligence have greatly improved the accuracy of protein structure prediction, making protein targets more accessible in the drug design process. In this perspective article, we highlight the importance of accurate protein structure prediction in various stages of small molecule drug discovery, discussing current capabilities and the potential impact of further evolution of predictive procedures.

CURRENT OPINION IN STRUCTURAL BIOLOGY (2023)

Article Biochemistry & Molecular Biology

Mapping the Constrained Coding Regions in the Human Genome to Their Corresponding Proteins

Marcia A. Hasenahuer, Alba Sanchis-Juan, Roman A. Laskowski, James A. Baker, James D. Stephenson, Christine A. Orengo, F. Lucy Raymond, Janet M. Thornton

Summary: In this study, constrained coding regions (CCRs) in the human genome were identified using DNA sequencing data from healthy control populations. These regions lack protein-changing variants and have been under constraint during human evolution. The distribution of CCRs was explored and their co-occurrence with different protein functional features was analyzed. Functional amino acids involved in DNA/RNA interactions, protein-protein contacts, and catalytic sites were found to be highly constrained. Surprisingly, linear motifs, linear interacting peptides, disorder-order transitions, and liquid-liquid phase separating regions also showed strong association with constraint for variability.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Biotechnology & Applied Microbiology

Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design

Lukas Weilguny, Nicola De Maio, Rory Munro, Charlotte Manser, Ewan Birney, Matthew Loose, Nick Goldman

Summary: BOSS-RUNS is an algorithmic framework and software that dynamically updates decision strategies based on real-time updates of uncertainty at each genome position. It optimizes information gain by deciding whether to fully sequence each DNA fragment, leading to improved variant calling in microbial communities.

NATURE BIOTECHNOLOGY (2023)

Article Biochemistry & Molecular Biology

A structural biology community assessment of AlphaFold2 applications

Mehmet Akdel, Douglas E. Pires, Eduard Porta Pardo, Jurgen Janes, Arthur O. Zalevsky, Balint Meszaros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jerome Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan Croll, Pedro Beltrao

Summary: This study evaluates the performance of AlphaFold2 in structural biology applications and finds that it performs well and can partially replace experimentally determined structures, which is of great significance for life science research.

NATURE STRUCTURAL & MOLECULAR BIOLOGY (2022)

Editorial Material Biochemistry & Molecular Biology

The mission to ensure continued funding for excellent basic research

Angus I. Lamond, Ivan Dikic, Andre Nussenzweig, Christoph W. Mueller, Janet M. Thornton, Michael B. Yaffe

EMBO REPORTS (2023)

Article Genetics & Heredity

Maximum likelihood pandemic-scale phylogenetics

Nicola De Maio, Prabhav Kalaghatgi, Yatish Turakhia, Russell Corbett-Detig, Bui Quang Minh, Nick Goldman

Summary: Phylogenetics plays a crucial role in genomic epidemiology, and the COVID-19 pandemic has generated an unprecedented amount of genome sequence data for analysis. However, most phylogenetic approaches are unable to handle the scale of these datasets. This study presents a new method called 'MAximum Parsimonious Likelihood Estimation' (MAPLE) for likelihood-based phylogenetic analysis of large genomic datasets. MAPLE is faster, more accurate, and requires significantly less memory compared to existing maximum likelihood methods, enabling the analysis of millions of genomes.

NATURE GENETICS (2023)

Correction Multidisciplinary Sciences

Temporal changes in the gene expression heterogeneity during brain development and aging (vol 10, 4080, 2020)

Ulas Isildak, Mehmet Somel, Janet M. Thornton, Handan Melike Donertas

SCIENTIFIC REPORTS (2023)

Article Biochemistry & Molecular Biology

The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities

Ioannis G. Riziotis, Antonio J. M. Ribeiro, Neera Borkakoti, Janet M. Thornton

Summary: Enzyme catalysis is controlled by a limited set of residues and co-factors. By utilizing three-dimensional templates, recurring catalytic modules that are involved in metal ion, co-factor, and substrate binding can be identified. Some of these convergent modules perform specific catalytic functions, while enzymes that have diverged during evolution retain specific regions of their active site.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Biochemical Research Methods

EzMechanism: an automated tool to propose catalytic mechanisms of enzyme reactions

Antonio J. M. Ribeiro, Ioannis G. Riziotis, Jonathan D. Tyzack, Neera Borkakoti, Janet M. Thornton

Summary: The rich literature on enzyme reaction mechanisms can serve as the foundation for new knowledge-based approaches to investigate enzyme mechanisms. In this study, a tool called EzMechanism is presented, which can automatically infer mechanistic paths for a given three-dimensional active site and enzyme reaction based on catalytic rules compiled from a database of enzyme mechanisms. EzMechanism facilitates and improves the generation of hypotheses by considering relevant information derived from literature on both related and unrelated enzymes.

NATURE METHODS (2023)

Article Evolutionary Biology

DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies

Paschalia Kapli, Ioanna Kotari, Maximilian J. Telford, Nick Goldman, Ziheng Yang

Summary: Inference of deep phylogenies has primarily used protein sequences, but our analysis shows that DNA sequences may be just as useful and should not be excluded. We conducted a simulation study and analyzed empirical data, which suggest that DNA sequences can recover the correct tree as often as protein sequences. Using DNA data has computational advantages and allows for advanced models that account for heterogeneity in the nucleotide-substitution process.

SYSTEMATIC BIOLOGY (2023)
