Editorial Material

Curators of the world unite: the International Society of Biocuration

Recommended

Article Biotechnology & Applied Microbiology

Using deep learning to annotate the protein universe

Maxwell L. Bileschi, David Belanger, Drew Bryant, Theo Sanderson, Brandon Carter, D. Sculley, Alex Bateman, Mark A. DePristo, Lucy J. Colwell

Summary: This article describes a method that uses deep learning models to predict functional annotations for unaligned protein amino acid sequences. The models are evaluated on rigorous benchmark assessments and can accurately predict the function of sequences across different protein families. The results show that deep learning models can significantly improve remote homology detection and expand the coverage of existing annotation tools. These models are expected to be a core component of future protein annotation tools.

NATURE BIOTECHNOLOGY (2022)

Article Biochemistry & Molecular Biology

The European Bioinformatics Institute (EMBL-EBI) in 2021

Gaia Cantelli, Alex Bateman, Cath Brooksbank, Anton Petrov, Rahuman S. Malik-Sheriff, Michele Ide-Smith, Henning Hermjakob, Paul Flicek, Rolf Apweiler, Ewan Birney, Johanna McEntyre

Summary: The European Bioinformatics Institute (EMBL-EBI) offers a wide range of freely available molecular data resources, including new resources like the PGS Catalog and AlphaFold DB. It has also been involved in developing community-driven data standards, such as the Recommended Metadata for Biological Images and the BioModels Reproducibility Scorecard. Training is a core mission of EMBL-EBI, and this year's update includes improvements to its online training offerings.

NUCLEIC ACIDS RESEARCH (2022)

Article Microbiology

Large-Scale Discovery of Microbial Fibrillar Adhesins and Identification of Novel Members of Adhesive Domain Families

Vivian Monzon, Alex Bateman

Summary: In this study, a machine learning approach was developed to identify fibrillar adhesins and novel members of adhesive domain families. The method successfully predicted over 6,500 confident fibrillar adhesins and identified 15 clusters with structural similarity to known adhesive domains. This research contributes to our understanding of bacterium-host interactions and bacterial pathogenesis.

JOURNAL OF BACTERIOLOGY (2022)

Article Multidisciplinary Sciences

Bacterial retrons encode phage-defending tripartite toxin-antitoxin systems

Jacob Bobonis, Karin Mitosch, Andre Mateus, Nicolai Karcher, George Kritikos, Joel Selkrig, Matylda Zietek, Vivian Monzon, Birgit Pfalz, Sarela Garcia-Santamarina, Marco Galardini, Anna Sueki, Callie Kobayashi, Frank Stein, Alex Bateman, Georg Zeller, Mikhail M. Savitski, Johanna R. Elfenbein, Helene L. Andrews-Polymenis, Athanasios Typas

Summary: This study shows that Retron-Sen2 of Salmonella enterica serovar Typhimurium encodes an accessory toxin protein, RcaT, which is neutralized by the reverse transcriptase-msDNA antitoxin complex and becomes active upon perturbation of msDNA biosynthesis. The highly prevalent RcaT-containing retron family constitutes a new type of tripartite DNA-containing toxin-antitoxin system. The research also demonstrates that retron toxin-antitoxin systems act as abortive infection anti-phage defence systems.

NATURE (2022)

Article Biochemical Research Methods

DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets

Elena Tea Russo, Federico Barone, Alex Bateman, Stefano Cozzini, Marco Punta, Alessandro Laio

Summary: Automatic clustering with the Density Peak Clustering algorithm was applied to the UniRef50 protein database, resulting in the identification of thousands of protein clusters. The classification results were compared with existing resources and revealed some unannotated protein families.

PLOS COMPUTATIONAL BIOLOGY (2022)

Article Biochemistry & Molecular Biology

A structural biology community assessment of AlphaFold2 applications

Mehmet Akdel, Douglas E. Pires, Eduard Porta Pardo, Jurgen Janes, Arthur O. Zalevsky, Balint Meszaros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jerome Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan Croll, Pedro Beltrao

Summary: This study evaluates the performance of AlphaFold2 across structural biology applications and finds that it performs well and can partially replace experimentally determined structures.

NATURE STRUCTURAL & MOLECULAR BIOLOGY (2022)

Article Biochemistry & Molecular Biology

InterPro in 2022

Typhaine Paysan-Lafosse, Matthias Blum, Sara Chuguransky, Tiago Grego, Beatriz Lazaro Pinto, Gustavo A. Salazar, Maxwell L. Bileschi, Peer Bork, Alan Bridge, Lucy Colwell, Julian Gough, Daniel H. Haft, Ivica Letunic, Aron Marchler-Bauer, Huaiyu Mi, Darren A. Natale, Christine A. Orengo, Arun P. Pandurangan, Catherine Rivoire, Christian J. A. Sigrist, Ian Sillitoe, Narmada Thanki, Paul D. Thomas, Silvio C. E. Tosatto, Cathy H. Wu, Alex Bateman

Summary: The InterPro database has been updated with new data content and website features, providing more user-friendly access to protein sequence classification and functional domain identification. It has also integrated features from the retiring Pfam website and developed a card game to engage the non-scientific community. Furthermore, the article discusses the benefits and challenges of using artificial intelligence for protein structure prediction.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022

Matthew Thakur, Alex Bateman, Cath Brooksbank, Mallory Freeberg, Melissa Harrison, Matthew Hartley, Thomas Keane, Gerard Kleywegt, Andrew Leach, Mariia Levchenko, Sarah Morgan, Ellen M. McDonagh, Sandra Orchard, Irene Papatheodorou, Sameer Velankar, Juan Antonio Vizcaino, Rick Witham, Barbara Zdrazil, Johanna McEntyre

Summary: EMBL-EBI is one of the world's leading sources of public biomolecular data, offering sustainable, high-quality data that can serve as training sets for deep learning and artificial intelligence applications. The open availability of their extensive curated databases makes them ideal for research in the life sciences.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

UniProt: the Universal Protein Knowledgebase in 2023

Alex Bateman, Maria-Jesus Martin, Sandra Orchard, Michele Magrane, Shadab Ahmad, Emanuele Alpi, Emily H. Bowler-Barnett, Ramona Britto, Austra Cukura, Paul Denny, Tunca Dogan, ThankGod Ebenezer, Jun Fan, Penelope Garmiri, Leonardo Jose da Costa Gonzales, Emma Hatton-Ellis, Abdulrahman Hussein, Alexandr Ignatchenko, Giuseppe Insana, Rizwan Ishtiaq, Vishal Joshi, Dushyanth Jyothi, Swaathi Kandasaamy, Antonia Lock, Aurelien Luciani, Marija Lugaric, Jie Luo, Yvonne Lussi, Alistair MacDougall, Fabio Madeira, Mahdi Mahmoudy, Alok Mishra, Katie Moulang, Andrew Nightingale, Sangya Pundir, Guoying Qi, Shriya Raj, Pedro Raposo, Daniel L. Rice, Rabie Saidi, Rafael Santos, Elena Speretta, James Stephenson, Prabhat Totoo, Edward Turner, Nidhi Tyagi, Preethi Vasudev, Kate Warner, Xavier Watkins, Hermann Zellner, Alan J. Bridge, Lucila Aimo, Ghislaine Argoud-Puy, Andrea H. Auchincloss, Kristian B. Axelsen, Parit Bansal, Delphine Baratin, Teresa M. Batista Neto, Marie-Claude Blatter, Jerven T. Bolleman, Emmanuel Boutet, Lionel Breuza, Blanca Cabrera Gil, Cristina Casals-Casas, Kamal Chikh Echioukh, Elisabeth Coudert, Beatrice Cuche, Edouard de Castro, Anne Estreicher, Maria L. Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Pascale Gaudet, Sebastien Gehant, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Arnaud Kerhornou, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Anne Morgat, Venkatesh Muthukrishnan, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Lucille Pourcel, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Christian J. A. Sigrist, Karin Sonesson, Cecilia N. Arighi, Leslie Arminski, Chuming Chen, Yongxing Chen, Hongzhan Huang, Kati Laiho, Peter McGarvey, Darren A. Natale, Karen Ross, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Jian Zhang, Hema Bye-A-Jee, Rossana Zaru, Shyamala Sundaram, Cathy H. Wu

Summary: The UniProt Knowledgebase aims to provide comprehensive, high-quality, and freely accessible protein sequences annotated with functional information. The database has expanded its data processing pipeline and website to accommodate the increasing information content, with over 227 million sequences and plans to include a reference proteome for each taxonomic group. Detailed annotations are extracted from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations from automated systems. The new website, https://www.uniprot.org/, offers enhanced user experience and easy access to data, including AlphaFold structures and improved protein subcellular localization visualizations.

NUCLEIC ACIDS RESEARCH (2023)

Article Biology

EROS is a selective chaperone regulating the phagocyte NADPH oxidase and purinergic signalling

Lyra O. Randzavola, Paige M. Mortimer, Emma Garside, Elizabeth R. Dufficy, Andrea Schejtman, Georgia Roumelioti, Lu Yu, Mercedes Pardo, Kerstin Spirohn, Charlotte Tolley, Cordelia Brandt, Katherine Harcourt, Esme Nichols, Mike Nahorski, Geoff Woods, James C. Williamson, Shreehari Suresh, John M. Sowerby, Misaki Matsumoto, Celio X. C. Santos, Cher Shen Kiar, Subhankar Mukhopadhyay, William M. Rae, Gordon J. Dougan, John Grainger, Paul J. Lehner, Michael A. Calderwood, Jyoti Choudhary, Simon Clare, Anneliese Speak, Giorgia Santilli, Alex Bateman, Kenneth G. C. Smith, Francesca Magnani, David C. Thomas

Summary: EROS protein plays a crucial role in immune response by regulating the maturation of gp91phox and the function of purine receptors P2X7 and P2X1. Its deficiency results in severe immunodeficiency and chronic granulomatous disease, while also enhancing resistance to influenza infection.

ELIFE (2022)

Article Multidisciplinary Sciences

Uncovering new families and folds in the natural protein universe

Janani Durairaj, Andrew M. Waterhouse, Toomas Mets, Tetiana Brodiazhenko, Minhal Abdullah, Gabriel Studer, Gerardo Tauriello, Mehmet Akdel, Antonina Andreeva, Alex Bateman, Tanel Tenson, Vasili Hauryliuk, Torsten Schwede, Joana Pereira

Summary: The AlphaFold database provides millions of predicted protein structures, covering almost all known proteins. By utilizing deep learning techniques, these structures can be accurately predicted, shedding light on the functions and roles of these proteins in biology. This study further explores the discovery of new protein families through these predicted structures.

NATURE (2023)

Review Biochemistry & Molecular Biology

When will RNA get its AlphaFold moment?

Bohdan Schneider, Blake Alexander Sweeney, Alex Bateman, Jiri Cerny, Tomasz Zok, Marta Szachniuk

Summary: Due to limited data and quality issues, it is challenging to predict the 3D structure of RNA using deep learning methods like AlphaFold in the short term. However, by addressing data quality and volume issues, utilizing more data, and developing new machine learning methods, an accurate RNA structure prediction method can be created.

NUCLEIC ACIDS RESEARCH (2023)

Article Multidisciplinary Sciences

Expanding the repertoire of human tandem repeat RNA-binding proteins

Agustin Ormazabal, Matias Sebastian Carletti, Tadeo Enrique Saldano, Martin Gonzalez Buitron, Julia Marchetti, Nicolas Palopoli, Alex Bateman

Summary: This study presents a large-scale analysis of human proteins, identifying tandem repeat proteins that bind RNA molecules. The combination of sequence and structural methods was found to be more effective in discovering these proteins than using either method alone. Differences were observed in the characteristics of repeat regions predicted by sequence-based or structure-based methods.

PLOS ONE (2023)

Article Biochemical Research Methods

Annotation of biologically relevant ligands in UniProtKB using ChEBI

Elisabeth Coudert, Sebastien Gehant, Edouard de Castro, Monica Pozzato, Delphine Baratin, Teresa Neto, Christian J. A. Sigrist, Nicole Redaschi, Alan Bridge

Summary: This study aims to provide high-quality annotations of binding sites for biologically relevant ligands in UniProtKB using the ChEBI chemical ontology. The researchers developed improved search and query facilities for these binding sites and used stable unique identifiers from ChEBI as reference vocabulary for the annotations. The annotations are freely available for querying and downloading through the UniProt website, REST API, SPARQL endpoint, and FTP site.

BIOINFORMATICS (2023)

Article Mathematical & Computational Biology

A roadmap for the functional annotation of protein families: a community perspective

Valerie de Crecy-Lagard, Rocio Amorin de Hegedus, Cecilia Arighi, Jill Babor, Alex Bateman, Ian Blaby, Crysten Blaby-Haas, Alan J. Bridge, Stephen K. Burley, Stacey Cleveland, Lucy J. Colwell, Ana Conesa, Christian Dallago, Antoine Danchin, Anita de Waard, Adam Deutschbauer, Raquel Dias, Yousong Ding, Gang Fang, Iddo Friedberg, John Gerlt, Joshua Goldford, Mark Gorelik, Benjamin M. Gyori, Christopher Henry, Geoffrey Hutinet, Marshall Jaroch, Peter D. Karp, Liudmyla Kondratova, Zhiyong Lu, Aron Marchler-Bauer, Maria-Jesus Martin, Claire McWhite, Gaurav D. Moghe, Paul Monaghan, Anne Morgat, Christopher J. Mungall, Darren A. Natale, William C. Nelson, Sean O'Donoghue, Christine Orengo, Katherine H. O'Toole, Predrag Radivojac, Colbie Reed, Richard J. Roberts, Dmitri Rodionov, Irina A. Rodionova, Jeffrey D. Rudolf, Lana Saleh, Gloria Sheynkman, Francoise Thibaud-Nissen, Paul D. Thomas, Peter Uetz, David Vallenet, Erica Watson Carter, Peter R. Weigele, Valerie Wood, Elisha M. Wood-Charlson, Jin Xu

Summary: In the past 25 years, biology has entered the genomic era and has become a science of 'big data'. However, accurate functional annotations of the proteins encoded by sequenced genomes are lacking, with only about half of the predicted proteins having accurate annotations. This gap in knowledge hampers the progress of biological research. To address this issue, a brainstorming meeting funded by the National Science Foundation was held in February 2022, bringing together data scientists, biocurators, computational biologists, and experimentalists to comprehensively assess the current state of functional annotations of protein families and propose solutions to move forward.

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2022)