Article
Biochemical Research Methods
Bo Chen, Ziwei Xie, Jiezhong Qiu, Zhaofeng Ye, Jinbo Xu, Jie Tang
Summary: ESMPair is a novel method that utilizes protein language models to identify interacting homologs of a complex, producing better results than the default multiple sequence alignment method used in AlphaFold-Multimer. It significantly improves complex structure prediction accuracy, especially for complexes with low confidence.
BRIEFINGS IN BIOINFORMATICS
(2023)
Article
Biochemistry & Molecular Biology
Mihaly Varadi, Damian Bertoni, Paulyna Magana, Urmila Paramval, Ivanna Pidruchna, Malarvizhi Radhakrishnan, Maxim Tsenkov, Sreenath Nair, Milot Mirdita, Jingi Yeo, Oleg Kovalevskiy, Kathryn Tunyasuvunakool, Agata Laydon, Augustin Zidek, Hamish Tomlinson, Dhavanthi Hariharan, Josh Abrahamson, Tim Green, John Jumper, Ewan Birney, Martin Steinegger, Demis Hassabis, Sameer Velankar
Summary: The AlphaFold Protein Structure Database (AlphaFold DB) has expanded significantly since its initial release in 2021, now containing over 214 million predicted protein structures. Powered by the AlphaFold2 artificial intelligence (AI) system, the database has integrated its predictions into primary data resources such as PDB, UniProt, Ensembl, InterPro, and MobiDB. This manuscript details the enhancements made to data archiving, including the addition of model organisms, global health proteomes, Swiss-Prot integration, and curated protein datasets. The access mechanisms of AlphaFold DB, from direct file access to advanced queries using Google Cloud Public Datasets, are also discussed, along with improvements and added services since its release, such as enhancements to the Predicted Aligned Error viewer and the 3D viewer customization options.
NUCLEIC ACIDS RESEARCH
(2023)
Article
Biochemistry & Molecular Biology
Fabio Hernan Gil Zuluaga, Nancy D'Arminio, Francesco Bardozzo, Roberto Tagliaferri, Anna Marabotti
Summary: In this study, a novel pipeline called AlphaMod was developed to improve the three-dimensional protein predictions of AlphaFold2. AlphaMod incorporates AlphaFold2 with MODELLER and enables comprehensive quality assessment of protein structures. The results showed that AlphaMod achieved higher accuracy compared to AlphaFold2 in both unsupervised and supervised setups.
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
(2023)
Article
Biochemistry & Molecular Biology
Amy O. Stevens, Yi He
Summary: The inhibition of protein-protein interactions is a growing strategy in drug development, and protein loop regions are potential drug targets. AlphaFold 2 performs well in predicting protein loop structures, especially for short loops. However, as the length of the loop increases, the accuracy of AlphaFold 2's prediction decreases.
Article
Biochemistry & Molecular Biology
Jian Yin, Junkun Lei, Jialin Yu, Weiren Cui, Alexander L. Satz, Yifan Zhou, Hua Feng, Jason Deng, Wenji Su, Letian Kuai
Summary: This study evaluates the reliability of AI-based models in reproducing the three-dimensional structures of protein-ligand complexes and finds that AI-predicted protein structures combined with molecular dynamics simulations offer a promising approach in small-molecule drug discovery.
Article
Multidisciplinary Sciences
Amir Motmaen, Justas Dauparas, Minkyung Baek, Mohamad H. Abedi, David Baker, Philip Bradley
Summary: This study develops a model for predicting peptide-binding proteins and peptide-MHC interactions by adding a classifier on top of the AlphaFold network. The model shows strong generalization and excellent performance.
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
(2023)
Article
Microbiology
Youngju Kim, Sang-Mok Lee, Linh Khanh Nong, Jaehyung Kim, Seung Bum Kim, Donghyuk Kim
Summary: Concerns about Klebsiella pneumoniae resistance to last-line antibiotics have led to a reconsideration of phage therapy in public health. This study sequenced, annotated, characterized, and compared two Klebsiella phages, KP1 and KP12, and found that they exhibit stable activity and a broad intraspecies host range. The phages are distantly related and contain phage lytic proteins that could be used for phage therapy against K. pneumoniae pathogens.
FRONTIERS IN MICROBIOLOGY
(2023)
Review
Microbiology
Serina L. Robinson
Summary: Researchers are using protein structural features to explore metagenomes from various environments and study antibiotic resistance, nutrient cycling, and host-drug-microbe interactions. They urge the scientific community to move beyond global sequence and structure alignments and instead focus on fine-grained descriptors to understand the microbiome better.
CURRENT OPINION IN MICROBIOLOGY
(2023)
Review
Biochemistry & Molecular Biology
Gaurav D. Diwan, Juan Carlos Gonzalez-Sanchez, Gordana Apic, Robert B. Russell
Summary: Interpreting genetic variants in terms of pathology or biological mechanism is an urgent need, and three-dimensional structures offer many insights into how genetic changes affect protein function. Precise methods for predicting structures from amino acid sequences, such as AlphaFold2, may greatly benefit those seeking to understand genetic changes. This paper examines the current state of protein structures known for human and other proteomes, as well as the potential impact of AlphaFold2 on variant interpretation efforts, suggesting that the available structural data for the human proteome may have a smaller impact on interpretation than anticipated. Additional structure prediction efforts that could aid the understanding of genetic variants are also discussed.
JOURNAL OF MOLECULAR BIOLOGY
(2021)
Article
Biochemistry & Molecular Biology
Sen Liu, Kan Wu, Cheng Chen
Summary: The computational models from AlphaFold2 and RoseTTAFold can provide information about protein foldability, as indicated by the correlation between RMSD values and protein foldability. This correlation is independent of secondary structures and protein functions.
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
(2022)
Article
Biochemistry & Molecular Biology
Zarifa Osmanli, Theo Falgarone, Turkan Samadova, Gudrun Aldrian, Jeremy Leclercq, Ilham Shahmuradov, Andrey Kajava
Summary: Alternative splicing is an important mechanism for generating protein diversity in cells. However, there is still a lack of structural data on alternative protein isoforms, as experimental studies typically focus on canonical proteins. In recent years, advances in bioinformatics tools and the development of the AlphaFold program have allowed for the modeling of high-confidence structures of isoforms. In this study, in silico analysis of 58 eukaryotic proteomes was performed, revealing differences in signal peptides, transmembrane regions, and tandem repeat regions between isoforms and canonical counterparts, potentially impacting protein function and cellular localization.
Review
Biochemical Research Methods
Luciano A. Abriata, Matteo Dal Peraro
Summary: Coupling residue coevolution estimations with machine learning methods is transforming protein structure prediction, particularly for proteins without clear homologous templates, as seen in the recent CASP competition. However, making these advances accessible to non-experts and ensuring correct interpretation of predicted models still requires further development of web resources and tools.
BRIEFINGS IN BIOINFORMATICS
(2021)
Review
Biotechnology & Applied Microbiology
Yoshiaki Maeda, Tsuyoshi Tanaka
Summary: This review article discusses the potential functions and structures of diatom LOXs, which are enzymes involved in the production of oxygenated fatty acids. Although the structures of the diatom LOXs have not been determined, computational tools based on deep learning technology were used to predict their structures and study their functions. It was found that the diatom LOXs have wide substrate-binding pockets. However, further research is needed to fully understand the enzymology of these LOXs.
MARINE BIOTECHNOLOGY
(2022)
Article
Biochemistry & Molecular Biology
Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, John Moult
Summary: CASP is a community experiment aimed at advancing methods for computing three-dimensional protein structure, including rigorous blind testing and evaluation by independent assessors. In the recent CASP14 experiment, deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. These results represent a solution to the classical protein-folding problem, at least for single proteins.
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
(2021)
Article
Genetics & Heredity
Hilal Keskin Karakoyun, Sirin K. Yuksel, Ilayda Amanoglu, Lara Naserikhojasteh, Ahmet Yesilyurt, Cengiz Yakicier, Emel Timucin, Cemaliye B. Akyerli
Summary: This study provides the first structural analysis of 26 hereditary cancer genes, showing that the thermodynamic stability predicted from AlphaFold2 (AF2) structures and the AF2 confidence score can effectively predict the pathogenicity of variants.
FRONTIERS IN GENETICS
(2023)
Article
Biochemistry & Molecular Biology
Matthew Thakur, Alex Bateman, Cath Brooksbank, Mallory Freeberg, Melissa Harrison, Matthew Hartley, Thomas Keane, Gerard Kleywegt, Andrew Leach, Mariia Levchenko, Sarah Morgan, Ellen M. McDonagh, Sandra Orchard, Irene Papatheodorou, Sameer Velankar, Juan Antonio Vizcaino, Rick Witham, Barbara Zdrazil, Johanna McEntyre
Summary: EMBL-EBI is one of the world's leading sources of public biomolecular data, offering sustainable, high-quality data that can serve as training sets for deep learning and artificial intelligence applications. The open availability of their extensive curated databases makes them ideal for research in the life sciences.
NUCLEIC ACIDS RESEARCH
(2023)
Article
Biochemistry & Molecular Biology
Mihaly Varadi, Nicola Bordin, Christine Orengo, Sameer Velankar
Summary: The function of proteins can be inferred from their three-dimensional structures. The advent of deep learning-based protein structure prediction tools in the early 2020s has had a significant impact on the field of life sciences. These tools offer new opportunities and challenges to the scientific community, and there are potential directions for the future of computational protein structure prediction.
CURRENT OPINION IN STRUCTURAL BIOLOGY
(2023)
Article
Biochemical Research Methods
Jon Agirre, Mihaela Atanasova, Haroldas Bagdonas, Charles B. Ballard, Arnaud Basle, James Beilsten-Edmands, Rafael J. Borges, David G. Brown, J. Javier Burgos-Marmol, John M. Berrisford, Paul S. Bond, Iracema Caballero, Lucrezia Catapano, Grzegorz Chojnowski, Atlanta G. Cook, Kevin D. Cowtan, Tristan I. Croll, Judit E. Debreczeni, Nicholas E. Devenish, Eleanor J. Dodson, Tarik R. Drevon, Paul Emsley, Gwyndaf Evans, Phil R. Evans, Maria Fando, James Foadi, Luis Fuentes-Montero, Elspeth F. Garman, Markus Gerstel, Richard J. Gildea, Kaushik Hatti, Maarten L. Hekkelman, Philipp Heuser, Soon Wen Hoh, Michael A. Hough, Huw T. Jenkins, Elisabet Jimenez, Robbie P. Joosten, Ronan M. Keegan, Nicholas Keep, Eugene B. Krissinel, Petr Kolenko, Oleg Kovalevskiy, Victor S. Lamzin, David M. Lawson, Andrey A. Lebedev, Andrew G. W. Leslie, Bernhard Lohkamp, Fei Long, Martin Maly, Airlie J. McCoy, Stuart J. McNicholas, Ana Medina, Claudia Millan, James W. Murray, Garib N. Murshudov, Robert A. Nicholls, Martin E. M. Noble, Robert Oeffner, Navraj S. Pannu, James M. Parkhurst, Nicholas Pearce, Joana Pereira, Anastassis Perrakis, Harold R. Powell, Randy J. Read, Daniel J. Rigden, William Rochira, Massimo Sammito, Filomeno Sanchez Rodriguez, George M. Sheldrick, Kathryn L. Shelley, Felix Simkovic, Adam J. Simpkin, Pavol Skubak, Egor Sobolev, Roberto A. Steiner, Kyle Stevenson, Ivo Tews, Jens M. H. Thomas, Andrea Thorn, Josep Trivino Valls, Ville Uski, Isabel Uson, Alexei Vagin, Sameer Velankar, Melanie Vollmar, Helen Walden, David Waterman, Keith S. Wilson, Martyn D. Winn, Graeme Winter, Marcin Wojdyr, Keitaro Yamashita
Summary: The Collaborative Computational Project No. 4 (CCP4) is an international collective led by the UK, dedicated to the development, testing, distribution, and promotion of software for macromolecular crystallography. The CCP4 suite is a multiplatform collection of programs, unified by familiar execution routines, common libraries, and graphical interfaces. This article serves as a general literature citation for the use of the CCP4 software suite, providing an overview of its recent changes, new features, and future developments, while also highlighting the individual programs within the suite and providing up-to-date references for crystallographers worldwide.
ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY
(2023)
Article
Multidisciplinary Sciences
Preeti Choudhary, Stephen Anyango, John Berrisford, James Tolchard, Mihaly Varadi, Sameer Velankar
Summary: The SIFTS resource maps amino acid sequences (UniProtKB) to 3D structures (PDB) for more than 61,000 proteins. SIFTS incorporates residue-level annotations from various biological resources and is maintained separately from the structure data in the PDB archive.
Article
Biology
Nicola Bordin, Ian Sillitoe, Vamsi Nallapareddy, Clemens Rauer, Su Datt Lam, Vaishali P. Waman, Neeladri Sen, Michael Heinzinger, Maria Littmann, Stephanie Kim, Sameer Velankar, Martin Steinegger, Burkhard Rost, Christine Orengo
Summary: Deep-learning methods, such as DeepMind's AlphaFold2 (AF2), have significantly improved protein structure prediction. In this study, we used a new classification protocol (CATH-Assign) that utilizes novel DL methods for structural comparison and classification to analyze confident AF2 models from 21 model organisms. Our analysis revealed that 92% of the models could be assigned to existing superfamilies in the CATH domain superfamily classification, with the remaining models clustering into 2367 putative novel superfamilies. Detailed manual analysis of a subset of these models confirmed 25 novel superfamilies and identified remote homologies and unusual features. The expansion of CATH by AF2 domains is valuable for understanding structure-function relationships.
COMMUNICATIONS BIOLOGY
(2023)
Article
Biochemistry & Molecular Biology
Brinda Vallat, Gerardo Tauriello, Stefan Bienert, Juergen Haas, Benjamin M. Webb, Augustin Zidek, Wei Zheng, Ezra Peisach, Dennis W. Piehl, Ivan Anischanka, Ian Sillitoe, James Tolchard, Mihaly Varadi, David Baker, Christine Orengo, Yang Zhang, Jeffrey C. Hoch, Genji Kurisu, Ardan Patwardhan, Sameer Velankar, Stephen K. Burley, Andrej Sali, Torsten Schwede, Helen M. Berman, John D. Westbrook
Summary: ModelCIF is a data framework developed by computational structural biologists to deliver FAIR data of macromolecular structures worldwide, describing their attributes and metadata and providing a representation for deposition, archiving, and dissemination. It is an extension of the PDBx/mmCIF framework, the standard for experimentally-determined 3D structures, and is managed by the wwPDB partnership.
JOURNAL OF MOLECULAR BIOLOGY
(2023)
Article
Biochemistry & Molecular Biology
Mihaly Varadi, Maxim Tsenkov, Sameer Velankar
Summary: The rapid evolution of protein structure prediction tools has made protein structural data more accessible. However, it is important to note that these predicted models are not validated. Challenges still exist in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations and open data sharing are crucial in overcoming these obstacles.
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
(2023)
Letter
Biochemical Research Methods
Wenqing Xu, Sameer Velankar, Ardan Patwardhan, Jeffrey C. Hoch, Stephen K. Burley, Genji Kurisu
Summary: The Protein Data Bank (PDB) is a global archive of atomic-level, three-dimensional structures of biological macromolecules. Recently, there has been an increase in new structure depositions from Asia. In 2022, Protein Data Bank China (PDBc) joined the Worldwide Protein Data Bank (wwPDB) as an Associate Member. This letter discusses the history of wwPDB, the mechanisms for adding new data centers, and the processes for incorporating PDBc into the partnership.
ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY
(2023)
Article
Chemistry, Multidisciplinary
Ibrahim Roshan Kunnakkattu, Preeti Choudhary, Lukas Pravda, Nurul Nadzirin, Oliver S. Smart, Qi Yuan, Stephen Anyango, Sreenath Nair, Mihaly Varadi, Sameer Velankar
Summary: PDBe CCDUtils is a versatile toolkit for processing and analyzing small molecules from the PDB. It provides convenient methods for computation and retrieval, catering to the needs of researchers in fields such as cheminformatics and structural biology.
JOURNAL OF CHEMINFORMATICS
(2023)
Article
Biochemistry & Molecular Biology
Jack Turner, Sanja Abbott, Neli Fonseca, Lucas Carrijo, Amudha Kumari Duraisamy, Osman Salih, Zhe Wang, Gerard J. Kleywegt, Kyle L. Morris, Ardan Patwardhan, Stephen K. Burley, Gregg Crichlow, Zukang Feng, Justin W. Flatt, Sutapa Ghosh, Brian P. Hudson, Catherine L. Lawson, Yuhe Liang, Ezra Peisach, Irina Persikova, Monica Sekharan, Chenghua Shao, Jasmine Young, Sameer Velankar, David Armstrong, Marcus Bage, Wesley Morellato Bueno, Genevieve Evans, Romana Gaborova, Sudakshina Ganguly, Deepti Gupta, Deborah Harrus, Ahsan Tanweer, Manju Bansal, Vetriselvi Rangannan, Genji Kurisu, Hasumi Cho, Yasuyo Ikegawa, Yumiko Kengaku, Ju Yaen Kim, Satomi Niwa, Junko Sato, Ayako Takuwa, Jian Yu, Jeffrey C. Hoch, Kumaran Baskaran, Wenqing Xu, Weizhe Zhang, Xiaodan Ma
Summary: This article provides an overview of the Electron Microscopy Data Bank (EMDB), its significance as a global public archive, and its current holdings, recent updates, and future plans.
NUCLEIC ACIDS RESEARCH
(2023)