Review

The impact of AlphaFold Protein Structure Database on the fields of life sciences

PROTEOMICS (2023)

Journal

PROTEOMICS

Volume 23, Issue 17, Pages -

Publisher

WILEY

DOI: 10.1002/pmic.202200128

Keywords

AlphaFold; protein structure prediction

Categories

Biochemical Research Methods; Biochemistry & Molecular Biology

Summary

2020 was a year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving unprecedented accuracy. In 2021, the AlphaFold Protein Structure Database was developed to provide easy access to a large number of reliable protein structure predictions for the scientific community, impacting data services, bioinformatics, structural biology, and drug discovery.

Abstract

Arguably, 2020 was the year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving previously unseen accuracy in the Critical Assessment of Protein Structure Prediction (CASP). In 2021, DeepMind and EMBL-EBI developed the AlphaFold Protein Structure Database to make an unprecedented number of reliable protein structure predictions easily accessible to the broad scientific community. We provide a brief overview and describe the latest developments in the AlphaFold database. We highlight how the fields of data services, bioinformatics, structural biology, and drug discovery are directly affected by the influx of protein structure data. We also show examples of cutting-edge research that took advantage of the AlphaFold database. It is apparent that connections between various fields through protein structures are now possible, but the amount of data poses new challenges. Finally, we give an outlook regarding the future direction of the database, both in terms of data sets and new functionalities.

Recommended

Article Biochemical Research Methods

Improved the heterodimer protein complex prediction with protein language models

Bo Chen, Ziwei Xie, Jiezhong Qiu, Zhaofeng Ye, Jinbo Xu, Jie Tang

Summary: ESMPair is a novel method that utilizes protein language models to identify interacting homologs of a complex, producing better results than the default multiple sequence alignment method used in AlphaFold-Multimer. It significantly improves complex structure prediction accuracy, especially for complexes with low confidence.

BRIEFINGS IN BIOINFORMATICS (2023)

Article Biochemistry & Molecular Biology

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences

Mihaly Varadi, Damian Bertoni, Paulyna Magana, Urmila Paramval, Ivanna Pidruchna, Malarvizhi Radhakrishnan, Maxim Tsenkov, Sreenath Nair, Milot Mirdita, Jingi Yeo, Oleg Kovalevskiy, Kathryn Tunyasuvunakool, Agata Laydon, Augustin Zidek, Hamish Tomlinson, Dhavanthi Hariharan, Josh Abrahamson, Tim Green, John Jumper, Ewan Birney, Martin Steinegger, Demis Hassabis, Sameer Velankar

Summary: The AlphaFold Protein Structure Database (AlphaFold DB) has expanded significantly since its initial release in 2021, now containing over 214 million predicted protein structures. Powered by the AlphaFold2 artificial intelligence (AI) system, the database has integrated its predictions into primary data resources such as PDB, UniProt, Ensembl, InterPro, and MobiDB. This manuscript details the enhancements made to data archiving, including the addition of model organisms, global health proteomes, Swiss-Prot integration, and curated protein datasets. The access mechanisms of AlphaFold DB, from direct file access to advanced queries using Google Cloud Public Datasets, are also discussed, along with improvements and added services since its release, such as enhancements to the Predicted Aligned Error viewer and the 3D viewer customization options.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

An automated pipeline integrating AlphaFold 2 and MODELLER for protein structure prediction

Fabio Hernan Gil Zuluaga, Nancy D'Arminio, Francesco Bardozzo, Roberto Tagliaferri, Anna Marabotti

Summary: In this study, a novel pipeline called AlphaMod was developed to improve the three-dimensional protein predictions of AlphaFold2. AlphaMod incorporates AlphaFold2 with MODELLER and enables comprehensive quality assessment of protein structures. The results showed that AlphaMod achieved higher accuracy compared to AlphaFold2 in both unsupervised and supervised setups.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2023)

Article Biochemistry & Molecular Biology

Benchmarking the Accuracy of AlphaFold 2 in Loop Structure Prediction

Amy O. Stevens, Yi He

Summary: The inhibition of protein-protein interactions is a growing strategy in drug development, and protein loop regions are potential drug targets. AlphaFold 2 performs well in predicting protein loop structures, especially for short loops. However, as the length of the loop increases, the accuracy of AlphaFold 2's prediction decreases.

BIOMOLECULES (2022)

Article Biochemistry & Molecular Biology

Assessment of AI-Based Protein Structure Prediction for the NLRP3 Target

Jian Yin, Junkun Lei, Jialin Yu, Weiren Cui, Alexander L. Satz, Yifan Zhou, Hua Feng, Jason Deng, Wenji Su, Letian Kuai

Summary: This study evaluates the reliability of AI-based models in reproducing the three-dimensional structures of protein-ligand complexes and finds that AI-predicted protein structures combined with molecular dynamics simulations offer a promising approach in small-molecule drug discovery.

MOLECULES (2022)

Article Multidisciplinary Sciences

Peptide-binding specificity prediction using fine-tuned protein structure prediction networks

Amir Motmaen, Justas Dauparas, Minkyung Baek, Mohamad H. Abedi, David Baker, Philip Bradley

Summary: This study develops a model for predicting peptide-binding proteins and peptide-MHC interactions by adding a classifier on top of the AlphaFold network. The model shows strong generalization and excellent performance.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2023)

Article Microbiology

Characterization of Klebsiella pneumoniae bacteriophages, KP1 and KP12, with deep learning-based structure prediction

Youngju Kim, Sang-Mok Lee, Linh Khanh Nong, Jaehyung Kim, Seung Bum Kim, Donghyuk Kim

Summary: Concerns about Klebsiella pneumoniae resistance to last-line antibiotics have led to a reconsideration of phage therapy in public health. This study sequenced, annotated, characterized, and compared two Klebsiella phages, KP1 and KP12, and found that they exhibit stable activity and a broad intraspecies host range. The phages are distantly related and contain phage lytic proteins that could be used for phage therapy against K. pneumoniae pathogens.

FRONTIERS IN MICROBIOLOGY (2023)

Review Microbiology

Structure-guided metagenome mining to tap microbial functional diversity

Serina L. Robinson

Summary: Researchers are using protein structural features to explore metagenomes from various environments and study antibiotic resistance, nutrient cycling, and host-drug-microbe interactions. They urge the scientific community to move beyond global sequence and structure alignments and instead focus on fine-grained descriptors to understand the microbiome better.

CURRENT OPINION IN MICROBIOLOGY (2023)

Review Biochemistry & Molecular Biology

Next Generation Protein Structure Predictions and Genetic Variant Interpretation

Gaurav D. Diwan, Juan Carlos Gonzalez-Sanchez, Gordana Apic, Robert B. Russell

Summary: The necessity to interpret genetic variants in terms of pathology or biological mechanism is urgent, with many insights into protein function impacted by genetic changes obtainable from three-dimensional structures. The development of precise methods, like AlphaFold2, to predict structures from amino acid sequences may greatly benefit those seeking to understand genetic changes. This paper examines the current state of protein structures known for human and other proteomes, as well as the potential impact of AlphaFold2 on variant interpretation efforts, suggesting that the available structural data for the human proteome may have a smaller impact on interpretation than anticipated. Additional efforts in structure prediction are also discussed for aiding the understanding of genetic variants.

JOURNAL OF MOLECULAR BIOLOGY (2021)

Article Biochemistry & Molecular Biology

Obtaining protein foldability information from computational models of AlphaFold2 and RoseTTAFold

Sen Liu, Kan Wu, Cheng Chen

Summary: The computational models from AlphaFold2 and RoseTTAFold can provide information about protein foldability, as indicated by the correlation between RMSD values and protein foldability. This correlation is independent of secondary structures and protein functions.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2022)

Article Biochemistry & Molecular Biology

The Difference in Structural States between Canonical Proteins and Their Isoforms Established by Proteome-Wide Bioinformatics Analysis

Zarifa Osmanli, Theo Falgarone, Turkan Samadova, Gudrun Aldrian, Jeremy Leclercq, Ilham Shahmuradov, Andrey Kajava

Summary: Alternative splicing is an important mechanism for generating protein diversity in cells. However, there is still a lack of structural data on alternative protein isoforms, as experimental studies typically focus on canonical proteins. In recent years, advances in bioinformatics tools and the development of the AlphaFold program have allowed for the modeling of high-confidence structures of isoforms. In this study, in silico analysis of 58 eukaryotic proteomes was performed, revealing differences in signal peptides, transmembrane regions, and tandem repeat regions between isoforms and canonical counterparts, potentially impacting protein function and cellular localization.

BIOMOLECULES (2022)

Review Biochemical Research Methods

State-of-the-art web services for de novo protein structure prediction

Luciano A. Abriata, Matteo Dal Peraro

Summary: Coupling residue coevolution estimations with machine learning methods is transforming protein structure prediction, particularly for proteins without clear homologous templates like in the recent CASP competition. However, making these advances accessible to non-experts and ensuring correct interpretation of predicted models still requires further development of web resources and tools.

BRIEFINGS IN BIOINFORMATICS (2021)

Review Biotechnology & Applied Microbiology

Molecular Insights into Lipoxygenases in Diatoms Based on Structure Prediction: a Pioneering Study on Lipoxygenases Found in Pseudo-nitzschia arenysensis and Fragilariopsis cylindrus

Yoshiaki Maeda, Tsuyoshi Tanaka

Summary: This review article discusses the potential functions and structures of diatom LOXs, which are enzymes involved in the production of oxygenated fatty acids. Although the structures of the diatom LOXs have not been determined, computational tools based on deep learning technology were used to predict their structures and study their functions. It was found that the diatom LOXs have wide substrate-binding pockets. However, further research is needed to fully understand the enzymology of these LOXs.

MARINE BIOTECHNOLOGY (2022)

Article Biochemistry & Molecular Biology

Critical assessment of methods of protein structure prediction (CASP)-Round XIV

Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, John Moult

Summary: CASP is a community experiment aimed at advancing methods for computing three-dimensional protein structure, including rigorous blind testing and evaluation by independent assessors. In the recent CASP14 experiment, deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. These results represent a solution to the classical protein-folding problem, at least for single proteins.

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2021)

Article Genetics & Heredity

Evaluation of AlphaFold structure-based protein stability prediction on missense variations in cancer

Hilal Keskin Karakoyun, Sirin K. Yuksel, Ilayda Amanoglu, Lara Naserikhojasteh, Ahmet Yesilyurt, Cengiz Yakicier, Emel Timucin, Cemaliye B. Akyerli

Summary: This study provides the first structural analysis of 26 hereditary cancer genes, showing that the thermodynamic stability predicted from AlphaFold2 (AF2) structures and the confidence score of AF2 can effectively predict the pathogenicity of variants.

FRONTIERS IN GENETICS (2023)

Article Biochemistry & Molecular Biology

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022

Matthew Thakur, Alex Bateman, Cath Brooksbank, Mallory Freeberg, Melissa Harrison, Matthew Hartley, Thomas Keane, Gerard Kleywegt, Andrew Leach, Mariia Levchenko, Sarah Morgan, Ellen M. McDonagh, Sandra Orchard, Irene Papatheodorou, Sameer Velankar, Juan Antonio Vizcaino, Rick Witham, Barbara Zdrazil, Johanna McEntyre

Summary: EMBL-EBI is one of the world's leading sources of public biomolecular data, offering sustainable, high-quality data that can serve as training sets for deep learning and artificial intelligence applications. The open availability of their extensive curated databases makes them ideal for research in the life sciences.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors

Mihaly Varadi, Nicola Bordin, Christine Orengo, Sameer Velankar

Summary: The function of proteins can be inferred from their three-dimensional structures. The advent of deep learning-based protein structure prediction tools in the early 2020s has had a significant impact on the field of life sciences. These tools offer new opportunities and challenges to the scientific community, and there are potential directions for the future of computational protein structure prediction.

CURRENT OPINION IN STRUCTURAL BIOLOGY (2023)

Article Biochemical Research Methods

The CCP4 suite: integrative software for macromolecular crystallography

Jon Agirre, Mihaela Atanasova, Haroldas Bagdonas, Charles B. Ballard, Arnaud Basle, James Beilsten-Edmands, Rafael J. Borges, David G. Brown, J. Javier Burgos-Marmol, John M. Berrisford, Paul S. Bond, Iracema Caballero, Lucrezia Catapano, Grzegorz Chojnowski, Atlanta G. Cook, Kevin D. Cowtan, Tristan I. Croll, Judit E. Debreczeni, Nicholas E. Devenish, Eleanor J. Dodson, Tarik R. Drevon, Paul Emsley, Gwyndaf Evans, Phil R. Evans, Maria Fando, James Foadi, Luis Fuentes-Montero, Elspeth F. Garman, Markus Gerstel, Richard J. Gildea, Kaushik Hatti, Maarten L. Hekkelman, Philipp Heuser, Soon Wen Hoh, Michael A. Hough, Huw T. Jenkins, Elisabet Jimenez, Robbie P. Joosten, Ronan M. Keegan, Nicholas Keep, Eugene B. Krissinel, Petr Kolenko, Oleg Kovalevskiy, Victor S. Lamzin, David M. Lawson, Andrey A. Lebedev, Andrew G. W. Leslie, Bernhard Lohkamp, Fei Long, Martin Maly, Airlie J. McCoy, Stuart J. McNicholas, Ana Medina, Claudia Millan, James W. Murray, Garib N. Murshudov, Robert A. Nicholls, Martin E. M. Noble, Robert Oeffner, Navraj S. Pannu, James M. Parkhurst, Nicholas Pearce, Joana Pereira, Anastassis Perrakis, Harold R. Powell, Randy J. Read, Daniel J. Rigden, William Rochira, Massimo Sammito, Filomeno Sanchez Rodriguez, George M. Sheldrick, Kathryn L. Shelley, Felix Simkovic, Adam J. Simpkin, Pavol Skubak, Egor Sobolev, Roberto A. Steiner, Kyle Stevenson, Ivo Tews, Jens M. H. Thomas, Andrea Thorn, Josep Trivino Valls, Ville Uski, Isabel Uson, Alexei Vagin, Sameer Velankar, Melanie Vollmar, Helen Walden, David Waterman, Keith S. Wilson, Martyn D. Winn, Graeme Winter, Marcin Wojdyr, Keitaro Yamashita

Summary: The Collaborative Computational Project No. 4 (CCP4) is an international collective led by the UK, dedicated to the development, testing, distribution, and promotion of software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs, unified by familiar execution routines, common libraries, and graphical interfaces. This article serves as a general literature citation for the use of the CCP4 software suite, providing an overview of its recent changes, new features, and future developments, while also highlighting the individual programs within the suite and providing up-to-date references for crystallographers worldwide.

ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY (2023)

Article Multidisciplinary Sciences

Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data

Preeti Choudhary, Stephen Anyango, John Berrisford, James Tolchard, Mihaly Varadi, Sameer Velankar

Summary: More than 61,000 proteins have correspondence between their amino acid sequence (UniProtKB) and 3D structures (PDB) through the SIFTS resource. SIFTS incorporates residue-level annotations from various biological resources and is maintained separately from the structure data in the PDB archive.

SCIENTIFIC DATA (2023)

Article Biology

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Nicola Bordin, Ian Sillitoe, Vamsi Nallapareddy, Clemens Rauer, Su Datt Lam, Vaishali P. Waman, Neeladri Sen, Michael Heinzinger, Maria Littmann, Stephanie Kim, Sameer Velankar, Martin Steinegger, Burkhard Rost, Christine Orengo

Summary: Deep-learning methods, such as DeepMind's AlphaFold2 (AF2), have significantly improved protein structure prediction. In this study, we used a new classification protocol (CATH-Assign) that utilizes novel DL methods for structural comparison and classification to analyze confident AF2 models from 21 model organisms. Our analysis revealed that 92% of the models could be assigned to existing superfamilies in the CATH domain superfamily classification, with the remaining models clustering into 2367 putative novel superfamilies. Detailed manual analysis of a subset of these models confirmed 25 novel superfamilies and identified remote homologies and unusual features. The expansion of CATH by AF2 domains is valuable for understanding structure-function relationships.

COMMUNICATIONS BIOLOGY (2023)

Article Biochemistry & Molecular Biology

ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models

Brinda Vallat, Gerardo Tauriello, Stefan Bienert, Juergen Haas, Benjamin M. Webb, Augustin Zidek, Wei Zheng, Ezra Peisach, Dennis W. Piehl, Ivan Anischanka, Ian Sillitoe, James Tolchard, Mihaly Varadi, David Baker, Christine Orengo, Yang Zhang, Jeffrey C. Hoch, Genji Kurisu, Ardan Patwardhan, Sameer Velankar, Stephen K. Burley, Andrej Sali, Torsten Schwede, Helen M. Berman, John D. Westbrook

Summary: ModelCIF is a data framework developed by computational structural biologists to deliver FAIR data of macromolecular structures worldwide, describing their attributes and metadata and providing a representation for deposition, archiving, and dissemination. It is an extension of the PDBx/mmCIF framework, the standard for experimentally-determined 3D structures, and is managed by the wwPDB partnership.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Biochemistry & Molecular Biology

Challenges in bridging the gap between protein structure prediction and functional interpretation

Mihaly Varadi, Maxim Tsenkov, Sameer Velankar

Summary: The rapid evolution of protein structure prediction tools has made protein structural data more accessible. However, it is important to note that these predicted models are not validated. Challenges still exist in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations and open data sharing are crucial in overcoming these obstacles.

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2023)

Letter Biochemical Research Methods

Announcing the launch of Protein Data Bank China as an Associate Member of the Worldwide Protein Data Bank Partnership

Wenqing Xu, Sameer Velankar, Ardan Patwardhan, Jeffrey C. Hoch, Stephen K. Burley, Genji Kurisu

Summary: The Protein Data Bank (PDB) is a global archive of atomic-level, three-dimensional structures of biological macromolecules. Recently, there has been an increase in new structure depositions from Asia. In 2022, Protein Data Bank China (PDBc) joined the Worldwide Protein Data Bank (wwPDB) as an Associate Member. This letter discusses the history of wwPDB, the mechanisms for adding new data centers, and the processes for incorporating PDBc into the partnership.

ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY (2023)

Article Chemistry, Multidisciplinary

PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank

Ibrahim Roshan Kunnakkattu, Preeti Choudhary, Lukas Pravda, Nurul Nadzirin, Oliver S. Smart, Qi Yuan, Stephen Anyango, Sreenath Nair, Mihaly Varadi, Sameer Velankar

Summary: PDBe CCDUtils is a versatile toolkit for processing and analyzing small molecules from the PDB. It provides convenient methods for computation and retrieval, catering to the needs of researchers in fields such as cheminformatics and structural biology.

JOURNAL OF CHEMINFORMATICS (2023)

Article Biochemistry & Molecular Biology

EMDB-the Electron Microscopy Data Bank

Jack Turner, Sanja Abbott, Neli Fonseca, Lucas Carrijo, Amudha Kumari Duraisamy, Osman Salih, Zhe Wang, Gerard J. Kleywegt, Kyle L. Morris, Ardan Patwardhan, Stephen K. Burley, Gregg Crichlow, Zukang Feng, Justin W. Flatt, Sutapa Ghosh, Brian P. Hudson, Catherine L. Lawson, Yuhe Liang, Ezra Peisach, Irina Persikova, Monica Sekharan, Chenghua Shao, Jasmine Young, Sameer Velankar, David Armstrong, Marcus Bage, Wesley Morellato Bueno, Genevieve Evans, Romana Gaborova, Sudakshina Ganguly, Deepti Gupta, Deborah Harrus, Ahsan Tanweer, Manju Bansal, Vetriselvi Rangannan, Genji Kurisu, Hasumi Cho, Yasuyo Ikegawa, Yumiko Kengaku, Ju Yaen Kim, Satomi Niwa, Junko Sato, Ayako Takuwa, Jian Yu, Jeffrey C. Hoch, Kumaran Baskaran, Wenqing Xu, Weizhe Zhang, Xiaodan Ma

Summary: This article provides an overview of the Electron Microscopy Data Bank (EMDB), its significance as a global public archive, and its current holdings, recent updates, and future plans.

NUCLEIC ACIDS RESEARCH (2023)
