Article

Published and Perished? The Influence of the Searched Protein Database on the Long-Term Storage of Proteomics Data

Journal

MOLECULAR & CELLULAR PROTEOMICS
Volume 10, Issue 9, Pages 1-11

Publisher

AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC
DOI: 10.1074/mcp.M111.008490

Keywords

-

Funding

  1. Wellcome Trust [WT085949MA]
  2. EU [226073, 202272, 260558]
  3. Austrian Science Fund, FWF [L 670-B13]
  4. Austrian Science Fund (FWF) [L 670] Funding Source: researchfish

In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created independently by each protein database and can change, or even be deleted, over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data, we analyzed the changes in reported protein identifiers across all public experiments in the Proteomics Identifications (PRIDE) database as of November 2010. To map each submitted protein identifier to a currently active entry, two distinct approaches were used. The first used the Protein Identifier Cross Referencing (PICR) service at the EBI, which maps protein identifiers based on 100% sequence identity. The second (the "logical mapping algorithm") queried the source databases and retrieved the current status of each reported identifier. Our analysis revealed differences in identifier stability between the main protein databases: the International Protein Index (IPI), the UniProt Knowledgebase (UniProtKB), the National Center for Biotechnology Information nr database (NCBI nr), and Ensembl. For example, whereas 20% of submitted IPI entries had been deleted after two years, virtually all UniProtKB entries remained either active or replaced. Furthermore, the two mapping algorithms produced markedly different results: the PICR service, for instance, reported 10% more IPI entries as deleted than the logical mapping algorithm did. We found several cases where experiments already contained more than 10% deleted identifiers at the time of publication. We also assessed the proportion of peptide identifications in these data sets that still matched the originally identified protein sequences. Finally, we performed the same overall analysis on all records from IPI, Ensembl, and UniProtKB, using two releases per year starting from 2005.
This analysis showed, for the first time, the true effect of changing protein identifiers on proteomics data. Based on these findings, UniProtKB appears to be the best database for applications that rely on the long-term storage of proteomics data. Molecular & Cellular Proteomics 10: 10.1074/mcp.M111.008490, 1-11, 2011.
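The two mapping strategies contrasted in the abstract can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation and not the PICR service API: the accessions (OLD1, NEW1, OLD2), the sequences, and the HistoryRecord status model are hypothetical toy data, and real databases record identifier history in their own formats. It shows how a 100%-sequence-identity mapping can count an entry as deleted after a single residue change, while a status-based ("logical") mapping follows the replacement record to an active entry, and how the share of peptides that still match a protein sequence might be computed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    ACTIVE = "active"
    REPLACED = "replaced"
    DELETED = "deleted"


@dataclass
class HistoryRecord:
    # Hypothetical status model: one record per accession in the source database.
    status: Status
    replaced_by: Optional[str] = None  # successor accession if status is REPLACED


def map_by_sequence(submitted_seq: str, current_db: Dict[str, str]) -> Optional[str]:
    """Sequence-based mapping: return a current accession whose sequence is
    100% identical to the submitted one, or None (counted as deleted)."""
    for accession, seq in current_db.items():
        if seq == submitted_seq:
            return accession
    return None


def map_by_history(accession: str, history: Dict[str, HistoryRecord],
                   max_hops: int = 10) -> Optional[str]:
    """Status-based ("logical") mapping: follow the source database's own
    active/replaced/deleted records until an active accession is reached."""
    current: Optional[str] = accession
    for _ in range(max_hops):
        record = history.get(current) if current is not None else None
        if record is None or record.status is Status.DELETED:
            return None
        if record.status is Status.ACTIVE:
            return current
        current = record.replaced_by  # follow the replacement chain
    return None  # chain too long or cyclic: treat as unresolved


def deleted_fraction(mapped: List[Optional[str]]) -> float:
    """Share of submitted identifiers that could not be mapped to an active entry."""
    return sum(m is None for m in mapped) / len(mapped)


def peptide_fit_fraction(peptides: List[str], protein_seq: str) -> float:
    """Share of identified peptides still found verbatim in a protein sequence."""
    return sum(p in protein_seq for p in peptides) / len(peptides)


# Hypothetical toy data (not taken from PRIDE or any real database release).
submitted = {"OLD1": "MKTAYIAKQR", "OLD2": "MSDNELLK"}
current_db = {"NEW1": "MKTAYIAKQW", "OLD2": "MSDNELLK"}  # NEW1 differs by one residue
history = {
    "OLD1": HistoryRecord(Status.REPLACED, replaced_by="NEW1"),
    "NEW1": HistoryRecord(Status.ACTIVE),
    "OLD2": HistoryRecord(Status.ACTIVE),
}

by_seq = [map_by_sequence(seq, current_db) for seq in submitted.values()]
by_hist = [map_by_history(acc, history) for acc in submitted]
print(by_seq, deleted_fraction(by_seq))    # [None, 'OLD2'] 0.5
print(by_hist, deleted_fraction(by_hist))  # ['NEW1', 'OLD2'] 0.0
print(peptide_fit_fraction(["MKTAYIAK", "AYIAKQR"], current_db["NEW1"]))  # 0.5
```

In this toy case the sequence-based mapping reports half of the submitted entries as deleted while the status-based mapping reports none, which mirrors, in miniature, why the two approaches can disagree on the same data set.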

Recommended

Article Biochemical Research Methods

Is DIA proteomics data FAIR? Current data sharing practices, available bioinformatics infrastructure and recommendations for the future

Andrew R. Jones, Eric W. Deutsch, Juan Antonio Vizcaino

Summary: DIA proteomics techniques have made significant progress in recent years, but there is still room for improvement in terms of FAIR data principles. To enhance the current situation for DIA data, recommendations include developing an open data standard for spectral libraries, mandating the availability of spectral libraries in ProteomeXchange resources, improving support for DIA data in data standards, and enhancing support for DIA datasets in ProteomeXchange resources.

PROTEOMICS (2023)

Article Biochemical Research Methods

Integrated View of Baseline Protein Expression in Human Tissues

Ananth Prakash, David Garcia-Seisdedos, Shengbo Wang, Deepti Jaiswal Kundu, Andrew Collins, Nancy George, Pablo Moreno, Irene Papatheodorou, Andrew R. Jones, Juan Antonio Vizcaino

Summary: The availability of proteomics datasets, especially in the PRIDE database, has significantly increased in recent years, providing an opportunity for combined analyses of datasets to obtain organism-wide protein abundance data. In this study, we reanalyzed 24 public proteomics datasets to assess baseline protein abundance in 31 organs of healthy individuals. We compared protein abundances between organs, studied protein distribution, and performed gene ontology and pathway-enrichment analyses. The results are integrated into the Expression Atlas resource to enhance accessibility for life scientists.

JOURNAL OF PROTEOME RESEARCH (2023)

Article Biochemical Research Methods

Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work

Eric W. Deutsch, Juan Antonio Vizcaino, Andrew R. Jones, Pierre-Alain Binz, Henry Lam, Joshua Klein, Wout Bittremieux, Yasset Perez-Riverol, David L. Tabb, Mathias Walzer, Sylvie Ricard-Blum, Henning Hermjakob, Steffen Neumann, Tytus D. Mak, Shin Kawano, Luis Mendoza, Tim Van Den Bossche, Ralf Gabriels, Nuno Bandeira, Jeremy Carver, Benjamin Pullman, Zhi Sun, Nils Hoffmann, Jim Shofstahl, Yunping Zhu, Luana Licata, Federica Quaglia, Silvio C. E. Tosatto, Sandra E. Orchard

Summary: The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been developing guidelines and standards for the proteomics community for 20 years. The authors describe the organization's operation, the current state of its existing standards, and the proposals currently under development. They emphasize the importance of community participation and of collaboration with other organizations to promote data sharing and accelerate progress in proteomics.

JOURNAL OF PROTEOME RESEARCH (2023)

Article Biochemical Research Methods

ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics

Tobias G. Rehfeldt, Ralf Gabriels, Robbin Bouwmeester, Siegfried Gessulat, Benjamin A. Neely, Magnus Palmblad, Yasset Perez-Riverol, Tobias Schmidt, Juan Antonio Vizcaino, Eric W. Deutsch

Summary: Data set acquisition and curation are challenging in machine learning, particularly for proteomics-based LC-MS data sets due to data reduction. ProteomicsML is introduced as an online resource for accessing proteomics-based data sets and tutorials. It simplifies data access and provides tutorials for interacting with advanced algorithms. ProteomicsML enables comparison of machine learning algorithms and offers introductory material for newcomers in the field. The platform is freely available at https://www.proteomicsml.org/, and contributions are welcome at https://github.com/ProteomicsML/ProteomicsML.

JOURNAL OF PROTEOME RESEARCH (2023)

Article Biochemistry & Molecular Biology

EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022

Matthew Thakur, Alex Bateman, Cath Brooksbank, Mallory Freeberg, Melissa Harrison, Matthew Hartley, Thomas Keane, Gerard Kleywegt, Andrew Leach, Mariia Levchenko, Sarah Morgan, Ellen M. McDonagh, Sandra Orchard, Irene Papatheodorou, Sameer Velankar, Juan Antonio Vizcaino, Rick Witham, Barbara Zdrazil, Johanna McEntyre

Summary: EMBL-EBI is one of the world's leading sources of public biomolecular data, offering sustainable, high-quality data that can serve as training sets for deep learning and artificial intelligence applications. The open availability of its extensive curated databases makes them ideal for research in the life sciences.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

The ProteomeXchange consortium at 10 years: 2023 update

Eric W. Deutsch, Nuno Bandeira, Yasset Perez-Riverol, Vagisha Sharma, Jeremy J. Carver, Luis Mendoza, Deepti J. Kundu, Shengbo Wang, Chakradhar Bandla, Selvakumar Kamatchinathan, Suresh Hewapathirana, Benjamin S. Pullman, Julie Wertz, Zhi Sun, Shin Kawano, Shujiro Okuda, Yu Watanabe, Brendan MacLean, Michael J. MacCoss, Yunping Zhu, Yasushi Ishihama, Juan Antonio Vizcaino

Summary: This article describes the recent developments in the ProteomeXchange (PX) consortium, which aims to standardize data submission and dissemination of MS proteomics data. The article highlights the increase in the number of datasets submitted to PX resources and the growing data re-use activities.

NUCLEIC ACIDS RESEARCH (2023)

Review Biochemistry & Molecular Biology

Exploring the Potential of Metal-Based Candidate Drugs as Modulators of the Cytoskeleton

Yasmin Borutzki, Lukas Skos, Christopher Gerner, Samuel M. Meier-Menches

Summary: In recent years, metal-based candidate drugs have shown promise as modulators of cytoskeletal and cytoskeleton-associated proteins. Actin, vimentin, and plectin have been identified as targets of ruthenium(II) and platinum(II)-based modulators. However, there is limited structural information available on molecular interactions. This article compiles scattered reports on metal-based candidate molecules influencing the cytoskeleton and its associated proteins, exploring their potential in cancer-related processes.

CHEMBIOCHEM (2023)

Article Biochemical Research Methods

Toward an Integrated Machine Learning Model of a Proteomics Experiment

Benjamin A. Neely, Viktoria Dorfer, Lennart Martens, Isabell Bludau, Robbin Bouwmeester, Sven Degroeve, Eric W. Deutsch, Siegfried Gessulat, Lukas Kaell, Pawel Palczynski, Samuel H. Payne, Tobias Greisager Rehfeldt, Tobias Schmidt, Veit Schwaemmle, Julian Uszkoreit, Juan Antonio Vizcaino, Mathias Wilhelm, Magnus Palmblad

Summary: In recent years, machine learning has made significant progress in modeling mass spectrometry data for proteomics analysis. A workshop was conducted to evaluate and explore machine learning applications in multidimensional mass spectrometry-based proteomics analysis. The workshop helped identify knowledge gaps, define needs, and discuss the possibilities, challenges, and future opportunities. The summary of the discussions conveys excitement about the potential of machine learning in proteomics and aims to inspire future research.

JOURNAL OF PROTEOME RESEARCH (2023)

Article Cell Biology

A Practical and Analytical Comparative Study of Gel-Based Top-Down and Gel-Free Bottom-Up Proteomics Including Unbiased Proteoform Detection

Huriye Ercan, Ulrike Resch, Felicia Hsu, Goran Mitulovic, Andrea Bileck, Christopher Gerner, Jae-Won Yang, Margarethe Geiger, Ingrid Miller, Maria Zellner

Summary: Proteomics is an essential analytical technique for studying the proteins of biological systems. The study compared the qualitative and quantitative performance of two commonly used proteomics techniques, label-free shotgun and 2D-DIGE, using six technical and three biological replicates of the human prostate carcinoma cell line DU145. The results showed that label-free shotgun quickly provides an annotated proteome but with reduced robustness compared with 2D-DIGE, which offers qualitative and quantitative information on proteoforms and post-translational modifications. However, the 2D-DIGE technique requires more time and manual work. Ultimately, this work highlights the different outputs and applications of these two techniques for biological research.

Article Biochemistry & Molecular Biology

Impact of Bariatric Surgery on the Stability of the Genetic Material, Oxidation, and Repair of DNA and Telomere Lengths

Franziska Ferk, Miroslav Misik, Benjamin Ernst, Gerhard Prager, Christoph Bichler, Doris Mejri, Christopher Gerner, Andrea Bileck, Michael Kundi, Sabine Langie, Klaus Holzmann, Siegfried Knasmueller

Summary: Obesity causes genetic instability, which is a key factor in the development of cancer and aging. This study investigated the effects of bariatric surgery on DNA repair, oxidative DNA damage, telomere lengths, antioxidant enzymes, and inflammation-related proteins. The results showed that after 6 months, bariatric surgery led to weight reduction, decreased DNA damage and oxidized DNA bases, lower levels of malondialdehyde, increased DNA repair and telomere lengths, and downregulation of inflammation-related proteins. These findings suggest that bariatric surgery can reduce DNA damage and inflammation, resulting in long-term health benefits.

ANTIOXIDANTS (2023)

Article Biochemical Research Methods

TopDownApp: An open and modular platform for analysis and visualisation of top-down proteomics data

Mathias Walzer, Kyowon Jeong, David L. Tabb, Juan Antonio Vizcaino

Summary: Although top-down (TD) proteomics techniques are gaining popularity in intact protein and proteoform analysis, efforts are required to promote their adoption at different levels. Open science practices, including data sharing and open data analysis workflows, need to be improved and implemented.

PROTEOMICS (2023)

Article Cell Biology

Primary and hTERT-Transduced Mesothelioma-Associated Fibroblasts but Not Primary or hTERT-Transduced Mesothelial Cells Stimulate Growth of Human Mesothelioma Cells

Alexander Ries, Astrid Slany, Christine Pirker, Johanna C. C. Mader, Doris Mejri, Thomas Mohr, Karin Schelch, Daniela Flehberger, Nadine Maach, Muhammad Hashim, Mir Alireza Hoda, Balazs Dome, Georg Krupitza, Walter Berger, Christopher Gerner, Klaus Holzmann, Michael Grusch

Summary: In this study, novel hTERT-transduced mesothelial cell and mesothelioma-associated fibroblast (Meso-CAF) models were generated and characterized, and their impact on the growth of pleural mesothelioma (PM) cells was investigated.

Article Biochemistry & Molecular Biology

Expression Atlas update: insights from sequencing data at both bulk and single cell level

Nancy George, Silvie Fexova, Alfonso Munoz Fuentes, Pedro Madrigal, Yalan Bi, Haider Iqbal, Upendra Kumbham, Nadja Francesca Nolte, Lingyun Zhao, Anil S. Thanki, Iris D. Yu, Jose C. Marugan Calles, Karoly Erdos, Liora Vilmovsky, Sandeep R. Kurri, Anna Vathrakokoili-Pournara, David Osumi-Sutherland, Ananth Prakash, Shengbo Wang, Marcela K. Tello-Ruiz, Sunita Kumari, Doreen Ware, Damien Goutte-Gattat, Yanhui Hu, Nick Brown, Norbert Perrimon, Juan Antonio Vizcaino, Tony Burdett, Sarah Teichmann, Alvis Brazma, Irene Papatheodorou

Summary: Expression Atlas and Single Cell Expression Atlas are knowledgebases for gene and protein expression and localisation, covering bulk and single cell levels of data respectively. Users can search genes or metadata across species and explore data through dimensionality reduction plots and heatmaps to understand the expression patterns.

NUCLEIC ACIDS RESEARCH (2023)

Article Mathematical & Computational Biology

The landscape of microRNA interaction annotation: analysis of three rare disorders as a case study

Panni Simona, Kalpana Panneerselvam, Pablo Porras, Margaret Duesbury, Livia Perfetto, Luana Licata, Henning Hermjakob, Sandra Orchard

Summary: In this paper, the authors present a method of annotating microRNA-mRNA interactions from the scientific literature. They focus on microRNAs that regulate genes associated with rare diseases and provide a detailed description of cell types and experimental conditions to enhance the information about the interactions. The authors also highlight the importance of mapping the binding sites of microRNAs on target genes' mRNA transcripts.

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2023)

暂无数据