Article

Value, but high costs in post-deposition data curation

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/database/bav126

Funding

  1. European Molecular Biology Laboratory (EMBL)
  2. UK Biotechnology and Biological Sciences Research Council (BBSRC) under Metagenomics Portal [BB/I02612X/1] (funding sources: researchfish; UKRI)

Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in the improvement of contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize findings of this effort aimed at increasing usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third-party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach.
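
The abstract ties discoverability to the richness of contextual information in sample records. One simple way to make that idea concrete is a completeness score: the fraction of contextual attributes that hold a real value rather than being absent or filled with a placeholder. This is an illustrative sketch only. The field names, the placeholder list, and the `contextual_richness` function are hypothetical and do not reproduce the annotation procedure or metrics used in the paper.

```python
from typing import Mapping

# Hypothetical contextual attributes; illustrative names, not taken from the
# paper or from any specific ENA checklist.
CONTEXTUAL_FIELDS = (
    "collection_date",
    "geographic_location",
    "latitude",
    "longitude",
    "environment_biome",
    "environment_material",
)

# Assumed set of values that stand in for missing metadata (case-insensitive).
PLACEHOLDERS = {"", "na", "n/a", "missing", "not collected", "unknown"}


def contextual_richness(sample: Mapping[str, str],
                        fields: tuple[str, ...] = CONTEXTUAL_FIELDS) -> float:
    """Return the fraction of `fields` that carry a non-placeholder value."""
    if not fields:
        return 0.0
    filled = sum(
        1
        for name in fields
        # Absent fields default to "" and therefore count as placeholders.
        if str(sample.get(name, "")).strip().lower() not in PLACEHOLDERS
    )
    return filled / len(fields)


# Usage: a sparse record before curation and the same record after annotation.
before = {"collection_date": "2010-06-01", "geographic_location": "missing"}
after = dict(before, geographic_location="Atlantic Ocean",
             latitude="50.1", longitude="-4.2")
print(round(contextual_richness(before), 2))  # 0.17
print(round(contextual_richness(after), 2))   # 0.67
```

A fraction-based score treats every attribute as equally important; a real curation effort would more likely weight attributes or validate their values against controlled vocabularies, which this sketch does not attempt.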

Reviews

Primary Rating

4.5
Not enough ratings

Recommended

Article Microbiology

Microbial diversity through an oceanographic lens: refining the concept of ocean provinces through trophic-level analysis and productivity-specific length scales

Cora Hoerstmann, Pier Luigi Buttigieg, Uwe John, Eric J. Raes, Dieter Wolf-Gladrow, Astrid Bracher, Anya M. Waite

Summary: This study investigated microbial diversity and primary productivity in the Atlantic Ocean between 50°S and 50°N, revealing distinct diversity patterns among different provinces. Sample-wise productivity-specific length scales were calculated to provide key context for further analysis, linking diversity patterns to oceanographic transport through primary production.

ENVIRONMENTAL MICROBIOLOGY (2022)

Article Biochemistry & Molecular Biology

The European Nucleotide Archive in 2021

Carla Cummins, Alisha Ahamed, Raheela Aslam, Josephine Burgin, Rajkumar Devraj, Ossama Edbali, Dipayan Gupta, Peter W. Harrison, Muhammad Haseeb, Sam Holt, Talal Ibrahim, Eugene Ivanov, Suran Jayathilaka, Vishnukumar Kadhirvelu, Simon Kay, Manish Kumar, Ankur Lathi, Rasko Leinonen, Fabio Madeira, Nandana Madhusoodanan, Milena Mansurova, Colman O'Cathail, Matt Pearce, Stephane Pesant, Nadim Rahman, Jeena Rajan, Gabriele Rinck, Sandeep Selvakumar, Alexey Sokolov, Swati Suman, Ross Thorne, Prabhat Totoo, Senthilnathan Vijayaraja, Zahra Waheed, Ahmad Zyoud, Rodrigo Lopez, Tony Burdett, Guy Cochrane

Summary: The European Nucleotide Archive, maintained at EMBL-EBI, offers free services for deposition and access to open nucleotide sequencing data, playing a crucial role in advancing scientific research.

NUCLEIC ACIDS RESEARCH (2022)

Article Multidisciplinary Sciences

Multilateral benefit-sharing from digital sequence information will support both science and biodiversity conservation

Amber Hartman Scholz, Jens Freitag, Christopher H. C. Lyal, Rodrigo Sara, Martha Lucia Cepeda, Ibon Cancio, Scarlett Sett, Andrew Lee Hufton, Yemisrach Abebaw, Kailash Bansal, Halima Benbouza, Hamadi Iddi Boga, Sylvain Brisse, Michael W. Bruford, Hayley Clissold, Guy Cochrane, Jonathan A. Coddington, Anne-Caroline Deletoille, Felipe Garcia-Cardona, Michelle Hamer, Raquel Hurtado-Ortiz, Douglas W. Miano, David Nicholson, Guilherme Oliveira, Carlos Ospina Bravo, Fabian Rohden, Ole Seberg, Gernot Segelbacher, Yogesh Shouche, Alejandra Sierra, Ilene Karsch-Mizrachi, Jessica da Silva, Desiree M. Hautea, Manuela da Silva, Mutsuaki Suzuki, Kassahun Tesfaye, Christian Keambou Tiambo, Krystal A. Tolley, Rajeev Varshney, Maria Mercedes Zambrano, Joerg Overmann

Summary: Open access to sequence data is crucial for biology and biodiversity research, but it has caused tension under the United Nations Convention on Biological Diversity (CBD). Finding a practical solution to ensure international benefit-sharing without jeopardising open sharing is a major challenge for the CBD and other UN negotiations.

NATURE COMMUNICATIONS (2022)

Article Multidisciplinary Sciences

Cryptic and abundant marine viruses at the evolutionary origins of Earth's RNA virome

Ahmed A. Zayed, James M. Wainaina, Guillermo Dominguez-Huerta, Eric Pelletier, Jiarong Guo, Mohamed Mohssen, Funing Tian, Akbar Adjie Pratama, Benjamin Bolduc, Olivier Zablocki, Dylan Cronin, Lindsey Solden, Erwan Delage, Adriana Alberti, Jean-Marc Aury, Quentin Carradec, Corinne da Silva, Karine Labadie, Julie Poulain, Hans-Joachim Ruscheweyh, Guillem Salazar, Elan Shatoff, Ralf Bundschuh, Kurt Fredrick, Laura S. Kubatko, Samuel Chaffron, Alexander Culley, Shinichi Sunagawa, Jens H. Kuhn, Patrick Wincker, Matthew B. Sullivan

Summary: This study expands Earth's RNA virus catalogs and their taxonomy, investigates their evolutionary origins and marine biogeography, and reveals the need for substantive revisions of taxonomy for RNA viruses. The efforts provide foundational knowledge critical to integrating RNA viruses into ecological and epidemiological models.

SCIENCE (2022)

Article Biology

Unifying the known and unknown microbial coding sequence space

Chiara Vanni, Matthew S. Schechter, Silvia G. Acinas, Albert Barberan, Pier Luigi Buttigieg, Emilio O. Casamayor, Tom O. Delmont, Carlos M. Duarte, A. Murat Eren, Robert D. Finn, Renzo Kottmann, Alex Mitchell, Pablo Sanchez, Kimmo Siren, Martin Steinegger, Frank Oliver Gloeckner, Antonio Fernandez-Guerra

Summary: Genes of unknown function pose a major challenge in molecular biology, especially in microbial systems. This study presents a computational framework to bridge the gap between known and unknown genes, and provides valuable insights into the diversity and relevance of the unknown fraction. The findings highlight the importance of investigating unknown genes and their potential implications in various organisms and environments.

ELIFE (2022)

Article Biochemistry & Molecular Biology

The Ocean Gene Atlas v2.0: online exploration of the biogeography and phylogeny of plankton genes

Caroline Vernette, Julien Lecubin, Pablo Sanchez, Shinichi Sunagawa, Tom O. Delmont, Silvia G. Acinas, Eric Pelletier, Pascal Hingamp, Magali Lescot

Summary: Testing hypotheses about the biogeography of genes requires significant hardware resources and programming skills. The new release of 'Ocean Gene Atlas' (OGA2) is a freely available online service to mine large and complex marine environmental genomic databases.

NUCLEIC ACIDS RESEARCH (2022)

Article Biology

Genomic evidence for global ocean plankton biogeography shaped by large-scale current systems

Daniel J. Richter, Romain Watteaux, Thomas Vannier, Jade Leconte, Paul Fremont, Gabriel Reygondeau, Nicolas Maillet, Nicolas Henry, Gaetan Benoit, Ophelie Da Silva, Tom O. Delmont, Antonio Fernandez-Guerra, Samir Suweis, Romain Narci, Cedric Berney, Damien Eveillard, Frederick Gavory, Lionel Guidi, Karine Labadie, Eric Mahieu, Julie Poulain, Sarah Romac, Simon Roux, Celine Dimier, Stefanie Kandels, Marc Picheral, Sarah Searson, Tara Oceans Coordinators, Stephane Pesant, Jean-Marc Aury, Jennifer R. Brum, Claire Lemaitre, Eric Pelletier, Peer Bork, Shinichi Sunagawa, Fabien Lombard, Lee Karp-Boss, Chris Bowler, Matthew B. Sullivan, Eric Karsenti, Mahendra Mariadassou, Ian Probert, Pierre Peterlongo, Patrick Wincker, Colomban de Vargas, Maurizio Ribera D'Alcala, Daniele Iudicone, Olivier Jaillon

Summary: This study assesses the global structure of plankton geography by analyzing metagenomes of plankton communities sampled from oceans worldwide. The findings demonstrate the influence of ocean currents on plankton biogeography and reveal characteristic timescales of community dynamics.

ELIFE (2022)

Editorial Material Biochemistry & Molecular Biology

Two-eyed seeing: Embracing the power of Indigenous knowledge for a healthy and sustainable Ocean

Kelsey Leonard, Pier Luigi Buttigieg, Maui Hudson, Kenneth Paul, Jay Pearlman, S. Kim Juniper

Summary: Indigenous knowledge is often overlooked, resulting in missed opportunities for positive change. A two-eyed seeing approach that combines Indigenous and western knowledge systems can empower coastal Indigenous Peoples and bring advancements in protecting the Ocean.

PLOS BIOLOGY (2022)

Article Biochemistry & Molecular Biology

The European Nucleotide Archive in 2022

Josephine Burgin, Alisha Ahamed, Carla Cummins, Rajkumar Devraj, Khadim Gueye, Dipayan Gupta, Vikas Gupta, Muhammad Haseeb, Maira Ihsan, Eugene Ivanov, Suran Jayathilaka, Vishnukumar Balavenkataraman Kadhirvelu, Manish Kumar, Ankur Lathi, Rasko Leinonen, Milena Mansurova, Jasmine McKinnon, Colman O'Cathail, Joana Pauperio, Stephane Pesant, Nadim Rahman, Gabriele Rinck, Sandeep Selvakumar, Swati Suman, Senthilnathan Vijayaraja, Zahra Waheed, Peter Woollard, David Yuan, Ahmad Zyoud, Tony Burdett, Guy Cochrane

Summary: The European Nucleotide Archive (ENA) is an open and supported platform for data management, archiving, publication, and dissemination. It provides comprehensive data sets and tools for data discovery and retrieval. Recent updates have focused on improving connectivity, reusability, and interoperability of ENA data and metadata.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

MGnify: the microbiome sequence data analysis resource in 2023

Lorna Richardson, Ben Allen, Germana Baldi, Martin Beracochea, Maxwell L. Bileschi, Tony Burdett, Josephine Burgin, Juan Caballero-Perez, Guy Cochrane, Lucy J. Colwell, Tom Curtis, Alejandra Escobar-Zepeda, Tatiana A. Gurbich, Varsha Kale, Anton Korobeynikov, Shriya Raj, Alexander B. Rogers, Ekaterina Sakharova, Santiago Sanchez, Darren J. Wilkinson, Robert D. Finn

Summary: The MGnify platform is a resource for analyzing and storing microbiome-derived nucleic acid sequences. It offers access to taxonomic assignments and functional annotations for a large number of datasets derived from different environments. The platform has expanded in terms of dataset quantity and analysis capabilities over the past three years, and includes a relational database for understanding the genomic context of proteins. Deep learning-based annotation methods have also been implemented to enhance functional annotations. Additionally, the platform's technology has been upgraded, and a Jupyter Lab environment has been introduced for downstream analysis of the data.

NUCLEIC ACIDS RESEARCH (2023)

Article Biochemistry & Molecular Biology

MGnify Genomes: A Resource for Biome-specific Microbial Genome Catalogues

Tatiana A. Gurbich, Alexandre Almeida, Martin Beracochea, Tony Burdett, Josephine Burgin, Guy Cochrane, Shriya Raj, Lorna Richardson, Alexander B. Rogers, Ekaterina Sakharova, Gustavo A. Salazar, Robert D. Finn

Summary: An increasing number of shotgun metagenomic datasets now yield metagenome-assembled genomes (MAGs), but the lack of standardization in their generation, annotation, and storage hinders the discovery and comparison of MAG collections. To address this, MGnify Genomes offers a growing collection of biome-specific non-redundant microbial genome catalogues generated using MAGs and publicly available isolate genomes. Users can access visualized species representative sequences and annotations on the MGnify website and download the full catalogue and associated analysis outputs from MGnify servers. Currently, there are seven available biomes with over 300,000 genomes representing 11,048 non-redundant species and including 36 taxonomic classes not represented by cultured genomes. MGnify Genomes is accessible at https://www.ebi.ac.uk/metagenomics/browse/genomes/.

JOURNAL OF MOLECULAR BIOLOGY (2023)

Article Geosciences, Multidisciplinary

A database of marine macronutrient, temperature and salinity measurements made around the highly productive island of South Georgia, the Scotia Sea and the Antarctic Peninsula between 1980 and 2009

Michael J. Whitehouse, Katharine R. Hendry, Geraint A. Tarling, Sally E. Thorpe, Petra Ten Hoopen

Summary: We have created a database of macronutrient data obtained from 20 oceanographic cruises conducted primarily around South Georgia and the Scotia Sea. The database includes measurements of nutrients such as silicate, phosphate, nitrate, ammonium, and nitrite, along with temperature and salinity data. This comprehensive dataset provides valuable information for studying the ecology of the Southern Ocean and its surrounding regions.

EARTH SYSTEM SCIENCE DATA (2023)

Article Biochemistry & Molecular Biology

The European Nucleotide Archive in 2023

David Yuan, Alisha Ahamed, Josephine Burgin, Carla Cummins, Rajkumar Devraj, Khadim Gueye, Dipayan Gupta, Vikas Gupta, Muhammad Haseeb, Maira Ihsan, Eugene Ivanov, Suran Jayathilaka, Vishnukumar Balavenkataraman Kadhirvelu, Manish Kumar, Ankur Lathi, Rasko Leinonen, Jasmine McKinnon, Lili Meszaros, Colman O'Cathail, Dennis Ouma, Joana Pauperio, Stephane Pesant, Nadim Rahman, Gabriele Rinck, Sandeep Selvakumar, Swati Suman, Yanisa Sunthornyotin, Marianna Ventouratou, Senthilnathan Vijayaraja, Zahra Waheed, Peter Woollard, Ahmad Zyoud, Tony Burdett, Guy Cochrane

Summary: The European Nucleotide Archive (ENA) is a database maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) that provides services for the submission, processing, archiving, and dissemination of sequence data. Recent progress and improvements to ENA services include enhancing the FAIRness of data, focusing on pandemic preparedness and foundational technology, and supporting genomic surveillance efforts.

NUCLEIC ACIDS RESEARCH (2023)

Article Computer Science, Artificial Intelligence

Polar biodiversity data: From a national marine platform to a global data portal

Petra Ten Hoopen, Helen J. Peat, Peter Ward, Geraint A. Tarling

Summary: Global access to accurate biodiversity data is crucial for understanding biodiversity dynamics. However, the challenge remains in guiding data systematically from its source to end users. This paper describes the biodiversity data flow from a polar ship to a national data repository and a global data portal. The flexible workflow can be adapted for other data types and repositories.

PATTERNS (2022)

Article Engineering, Ocean

The Ocean Biomolecular Observing Network (OBON)

Margaret Leinen, Francisco Chavez, Raissa Meyer, Pier Luigi Buttigieg, Neil Davies, Raffaella Casotti, Astrid Fischer

Summary: This article introduces the development of the Ocean Biomolecular Observing Network (OBON) and its importance in understanding and protecting marine ecosystems. OBON aims to transform the way we sense, utilize, protect, and manage ocean life using molecular techniques, and contributes to the detection of biological hazards and next-generation ocean observing systems.

MARINE TECHNOLOGY SOCIETY JOURNAL (2022)
