Review

The impact of AlphaFold Protein Structure Database on the fields of life sciences

Journal

PROTEOMICS
Volume 23, Issue 17, Pages -

Publisher

WILEY
DOI: 10.1002/pmic.202200128

Keywords

AlphaFold; protein structure prediction

2020 was a year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving unprecedented accuracy. In 2021, the AlphaFold Protein Structure Database was developed to provide easy access to a large number of reliable protein structure predictions for the scientific community, impacting data services, bioinformatics, structural biology, and drug discovery.
Arguably, 2020 was the year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving previously unseen accuracy in the Critical Assessment of Protein Structure Prediction (CASP). In 2021, DeepMind and EMBL-EBI developed the AlphaFold Protein Structure Database to make an unprecedented number of reliable protein structure predictions easily accessible to the broad scientific community. We provide a brief overview and describe the latest developments in the AlphaFold database. We highlight how the fields of data services, bioinformatics, structural biology, and drug discovery are directly affected by the influx of protein structure data. We also show examples of cutting-edge research that took advantage of the AlphaFold database. It is apparent that connections between various fields through protein structures are now possible, but the amount of data poses new challenges. Finally, we give an outlook regarding the future direction of the database, both in terms of data sets and new functionalities.

Recommended

Article Biochemistry & Molecular Biology

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022

Matthew Thakur, Alex Bateman, Cath Brooksbank, Mallory Freeberg, Melissa Harrison, Matthew Hartley, Thomas Keane, Gerard Kleywegt, Andrew Leach, Mariia Levchenko, Sarah Morgan, Ellen M. McDonagh, Sandra Orchard, Irene Papatheodorou, Sameer Velankar, Juan Antonio Vizcaino, Rick Witham, Barbara Zdrazil, Johanna McEntyre

Summary: EMBL-EBI is one of the world's leading sources of public biomolecular data, offering sustainable, high-quality data that can serve as training sets for deep learning and artificial intelligence applications. The open availability of their extensive curated databases makes them ideal for research in the life sciences.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors

Mihaly Varadi, Nicola Bordin, Christine Orengo, Sameer Velankar

Summary: The function of proteins can be inferred from their three-dimensional structures. The advent of deep learning-based protein structure prediction tools in the early 2020s has had a significant impact on the field of life sciences. These tools offer new opportunities and challenges to the scientific community, and there are potential directions for the future of computational protein structure prediction.

CURRENT OPINION IN STRUCTURAL BIOLOGY (2023)

Article Biochemical Research Methods

The CCP4 suite: integrative software for macromolecular crystallography

Jon Agirre, Mihaela Atanasova, Haroldas Bagdonas, Charles B. Ballard, Arnaud Basle, James Beilsten-Edmands, Rafael J. Borges, David G. Brown, J. Javier Burgos-Marmol, John M. Berrisford, Paul S. Bond, Iracema Caballero, Lucrezia Catapano, Grzegorz Chojnowski, Atlanta G. Cook, Kevin D. Cowtan, Tristan I. Croll, Judit E. Debreczeni, Nicholas E. Devenish, Eleanor J. Dodson, Tarik R. Drevon, Paul Emsley, Gwyndaf Evans, Phil R. Evans, Maria Fando, James Foadi, Luis Fuentes-Montero, Elspeth F. Garman, Markus Gerstel, Richard J. Gildea, Kaushik Hatti, Maarten L. Hekkelman, Philipp Heuser, Soon Wen Hoh, Michael A. Hough, Huw T. Jenkins, Elisabet Jimenez, Robbie P. Joosten, Ronan M. Keegan, Nicholas Keep, Eugene B. Krissinel, Petr Kolenko, Oleg Kovalevskiy, Victor S. Lamzin, David M. Lawson, Andrey A. Lebedev, Andrew G. W. Leslie, Bernhard Lohkamp, Fei Long, Martin Maly, Airlie J. McCoy, Stuart J. McNicholas, Ana Medina, Claudia Millan, James W. Murray, Garib N. Murshudov, Robert A. Nicholls, Martin E. M. Noble, Robert Oeffner, Navraj S. Pannu, James M. Parkhurst, Nicholas Pearce, Joana Pereira, Anastassis Perrakis, Harold R. Powell, Randy J. Read, Daniel J. Rigden, William Rochira, Massimo Sammito, Filomeno Sanchez Rodriguez, George M. Sheldrick, Kathryn L. Shelley, Felix Simkovic, Adam J. Simpkin, Pavol Skubak, Egor Sobolev, Roberto A. Steiner, Kyle Stevenson, Ivo Tews, Jens M. H. Thomas, Andrea Thorn, Josep Trivino Valls, Ville Uski, Isabel Uson, Alexei Vagin, Sameer Velankar, Melanie Vollmar, Helen Walden, David Waterman, Keith S. Wilson, Martyn D. Winn, Graeme Winter, Marcin Wojdyr, Keitaro Yamashita

Summary: The Collaborative Computational Project No. 4 (CCP4) is an international collective led by the UK, dedicated to the development, testing, distribution, and promotion of software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs, unified by familiar execution routines, common libraries, and graphical interfaces. This article serves as a general literature citation for the use of the CCP4 software suite, providing an overview of its recent changes, new features, and future developments, while also highlighting the individual programs within the suite and providing up-to-date references for crystallographers worldwide.

ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY (2023)

Article Multidisciplinary Sciences

Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data

Preeti Choudhary, Stephen Anyango, John Berrisford, James Tolchard, Mihaly Varadi, Sameer Velankar

Summary: More than 61,000 proteins have their amino acid sequences (UniProtKB) mapped to 3D structures (PDB) through the SIFTS resource. SIFTS incorporates residue-level annotations from various biological resources and is maintained separately from the structure data in the PDB archive.

SCIENTIFIC DATA (2023)

Article Biology

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Nicola Bordin, Ian Sillitoe, Vamsi Nallapareddy, Clemens Rauer, Su Datt Lam, Vaishali P. Waman, Neeladri Sen, Michael Heinzinger, Maria Littmann, Stephanie Kim, Sameer Velankar, Martin Steinegger, Burkhard Rost, Christine Orengo

Summary: Deep-learning methods, such as DeepMind's AlphaFold2 (AF2), have significantly improved protein structure prediction. In this study, we used a new classification protocol (CATH-Assign) that utilizes novel DL methods for structural comparison and classification to analyze confident AF2 models from 21 model organisms. Our analysis revealed that 92% of the models could be assigned to existing superfamilies in the CATH domain superfamily classification, with the remaining models clustering into 2367 putative novel superfamilies. Detailed manual analysis of a subset of these models confirmed 25 novel superfamilies and identified remote homologies and unusual features. The expansion of CATH by AF2 domains is valuable for understanding structure-function relationships.

COMMUNICATIONS BIOLOGY (2023)

Article Biochemistry & Molecular Biology

ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models

Brinda Vallat, Gerardo Tauriello, Stefan Bienert, Juergen Haas, Benjamin M. Webb, Augustin Zidek, Wei Zheng, Ezra Peisach, Dennis W. Piehl, Ivan Anischanka, Ian Sillitoe, James Tolchard, Mihaly Varadi, David Baker, Christine Orengo, Yang Zhang, Jeffrey C. Hoch, Genji Kurisu, Ardan Patwardhan, Sameer Velankar, Stephen K. Burley, Andrej Sali, Torsten Schwede, Helen M. Berman, John D. Westbrook

Summary: ModelCIF is a data framework developed by computational structural biologists to deliver FAIR data of macromolecular structures worldwide, describing their attributes and metadata and providing a representation for deposition, archiving, and dissemination. It is an extension of the PDBx/mmCIF framework, the standard for experimentally-determined 3D structures, and is managed by the wwPDB partnership.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Biochemistry & Molecular Biology

Challenges in bridging the gap between protein structure prediction and functional interpretation

Mihaly Varadi, Maxim Tsenkov, Sameer Velankar

Summary: The rapid evolution of protein structure prediction tools has made protein structural data more accessible. However, it is important to note that these predicted models are not validated. Challenges still exist in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations and open data sharing are crucial in overcoming these obstacles.

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2023)

Letter Biochemical Research Methods

Announcing the launch of Protein Data Bank China as an Associate Member of the Worldwide Protein Data Bank Partnership

Wenqing Xu, Sameer Velankar, Ardan Patwardhan, Jeffrey C. Hoch, Stephen K. Burley, Genji Kurisu

Summary: The Protein Data Bank (PDB) is a global archive of atomic-level, three-dimensional structures of biological macromolecules. Recently, there has been an increase in new structure depositions from Asia. In 2022, Protein Data Bank China (PDBc) joined the Worldwide Protein Data Bank (wwPDB) as an Associate Member. This letter discusses the history of wwPDB, the mechanisms for adding new data centers, and the processes for incorporating PDBc into the partnership.

ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY (2023)

Article Chemistry, Multidisciplinary

PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank

Ibrahim Roshan Kunnakkattu, Preeti Choudhary, Lukas Pravda, Nurul Nadzirin, Oliver S. Smart, Qi Yuan, Stephen Anyango, Sreenath Nair, Mihaly Varadi, Sameer Velankar

Summary: PDBe CCDUtils is a versatile toolkit for processing and analyzing small molecules from the PDB. It provides convenient methods for computation and retrieval, catering to the needs of researchers in fields such as cheminformatics and structural biology.

JOURNAL OF CHEMINFORMATICS (2023)

Article Biochemistry & Molecular Biology

EMDB-the Electron Microscopy Data Bank

Jack Turner, Sanja Abbott, Neli Fonseca, Lucas Carrijo, Amudha Kumari Duraisamy, Osman Salih, Zhe Wang, Gerard J. Kleywegt, Kyle L. Morris, Ardan Patwardhan, Stephen K. Burley, Gregg Crichlow, Zukang Feng, Justin W. Flatt, Sutapa Ghosh, Brian P. Hudson, Catherine L. Lawson, Yuhe Liang, Ezra Peisach, Irina Persikova, Monica Sekharan, Chenghua Shao, Jasmine Young, Sameer Velankar, David Armstrong, Marcus Bage, Wesley Morellato Bueno, Genevieve Evans, Romana Gaborova, Sudakshina Ganguly, Deepti Gupta, Deborah Harrus, Ahsan Tanweer, Manju Bansal, Vetriselvi Rangannan, Genji Kurisu, Hasumi Cho, Yasuyo Ikegawa, Yumiko Kengaku, Ju Yaen Kim, Satomi Niwa, Junko Sato, Ayako Takuwa, Jian Yu, Jeffrey C. Hoch, Kumaran Baskaran, Wenqing Xu, Weizhe Zhang, Xiaodan Ma

Summary: This article provides an overview of the Electron Microscopy Data Bank (EMDB), its significance as a global public archive, and its current holdings, recent updates, and future plans.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences

Mihaly Varadi, Damian Bertoni, Paulyna Magana, Urmila Paramval, Ivanna Pidruchna, Malarvizhi Radhakrishnan, Maxim Tsenkov, Sreenath Nair, Milot Mirdita, Jingi Yeo, Oleg Kovalevskiy, Kathryn Tunyasuvunakool, Agata Laydon, Augustin Zidek, Hamish Tomlinson, Dhavanthi Hariharan, Josh Abrahamson, Tim Green, John Jumper, Ewan Birney, Martin Steinegger, Demis Hassabis, Sameer Velankar

Summary: The AlphaFold Protein Structure Database (AlphaFold DB) has expanded significantly since its initial release in 2021, now containing over 214 million predicted protein structures. Powered by the AlphaFold2 artificial intelligence (AI) system, the database has integrated its predictions into primary data resources such as PDB, UniProt, Ensembl, InterPro, and MobiDB. This manuscript details the enhancements made to data archiving, including the addition of model organisms, global health proteomes, Swiss-Prot integration, and curated protein datasets. The access mechanisms of AlphaFold DB, from direct file access to advanced queries using Google Cloud Public Datasets, are also discussed, along with improvements and added services since its release, such as enhancements to the Predicted Aligned Error viewer and the 3D viewer customization options.

NUCLEIC ACIDS RESEARCH (2023)
