The NMDC metadata standards
Enhancing and linking existing community metadata standards.
What are metadata?
Metadata are the who, what, when, where, and why of experimental data. Metadata are critical for the sharing and reuse of scientific data.
Learn what the NMDC is doing for metadata
Interested in how the NMDC is using community standards? Want to start applying standards to your sample metadata?
The development and adoption of standards are key to the success of the NMDC and to integrating and searching across data derived from different studies and stored in distributed data resources. Each database represents samples and associated information differently, so mapping the relevant parts of these database schemas to the relevant standards is crucial. This work focuses on selecting and contributing to existing community-driven standards and resources (namely the Genomic Standards Consortium (GSC) Minimum Information about any (x) Sequence (MIxS), the Environment Ontology (EnvO), and the Joint Genome Institute (JGI) Genomes Online Database (GOLD)), creating a framework for using these standards, and mapping existing systems to these standards.
Evaluate and establish metadata standards for NMDC data sets, by enhancing and reusing existing community standards.
The NMDC Metadata Standards Tasks
The NMDC team will implement the activities outlined here with the aim of improving adoption and interoperability of metadata standards.
Enrich sample metadata with environmental data drawn from existing climate and environment databases for a given location and time range, using the Oak Ridge National Laboratory Identify tool.
Prototype a microbiome metadata submission portal with the ability to capture complex sample relationships and full provenance.
Evaluate compliance with FAIR principles using the FAIRness evaluation framework, and create FAIRness Maturity Indicators (MIs) and Compliance Tests for microbiome data.
Community collaborations to enhance existing standards
- The MIxS templates were previously managed and distributed as Excel files, which limited automated compliance checks
- Contributed terms to EnvO (available in the latest EnvO release)
- Extension of the GOLD schema to include EnvO
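To illustrate the kind of automated compliance check that spreadsheet-only templates make difficult, here is a minimal sketch in Python. It assumes a CSV export of sample metadata and treats a small set of MIxS-style field names as mandatory. The choice of required fields, the function names, and the example rows are assumptions for illustration; they are not the NMDC or GSC implementation, and the mandatory fields in practice depend on the MIxS checklist being applied.

```python
import csv
import io

# Assumption for illustration: a small subset of MIxS-style field names
# treated as mandatory. The real set depends on the checklist in use.
REQUIRED_FIELDS = (
    "collection_date",
    "lat_lon",
    "env_broad_scale",
    "env_local_scale",
    "env_medium",
)


def find_missing_fields(row, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in one sample row."""
    return [name for name in required if not (row.get(name) or "").strip()]


def check_template(csv_text):
    """Map each incomplete sample's name to its list of missing required fields."""
    report = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        missing = find_missing_fields(row)
        if missing:
            # Fall back to a row label when the sample has no name.
            report[row.get("sample_name") or f"row {index}"] = missing
    return report


# Hypothetical rows with placeholder values, not real NMDC samples.
EXAMPLE_CSV = """sample_name,collection_date,lat_lon,env_broad_scale,env_local_scale,env_medium
sample_a,2019-07-01,68.6 -149.6,placeholder biome,placeholder feature,placeholder material
sample_b,2019-07-02,,placeholder biome,,placeholder material
"""

if __name__ == "__main__":
    print(check_template(EXAMPLE_CSV))
```

A check like this only confirms that fields are filled in. Verifying that a value is a valid EnvO term would need a lookup against the ontology itself.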
Creation of a FAIR NMDC schema that leverages these standards
Integration of metadata from sequencing and other omics projects
- Mapped and linked existing sample metadata between GOLD and EnvO terms for 963 BioSamples with multi-omics data from EMSL and JGI.
Example: terms from the EnvO triad and the GOLD hierarchy used together to describe a permafrost soil sample.
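The pairing of an EnvO triad with a GOLD hierarchy for a single sample can be pictured as a small record. The sketch below, in Python, uses the three MIxS environmental context fields (`env_broad_scale`, `env_local_scale`, `env_medium`) and GOLD's five-level ecosystem classification. The class names, the sample identifier, and all term values are illustrative assumptions and are not taken from the original example; real records would use terms and identifiers looked up in EnvO and GOLD.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EnvTriad:
    """The three MIxS environmental context fields, filled with EnvO terms."""
    env_broad_scale: str   # broad environmental context, e.g. a biome
    env_local_scale: str   # local environmental feature
    env_medium: str        # material immediately surrounding the sample


@dataclass(frozen=True)
class GoldPath:
    """GOLD's five-level ecosystem classification, from broad to specific."""
    ecosystem: str
    ecosystem_category: str
    ecosystem_type: str
    ecosystem_subtype: str
    specific_ecosystem: str

    def as_path(self):
        """Render the classification as a single readable path."""
        return " > ".join(asdict(self).values())


@dataclass(frozen=True)
class SampleDescription:
    """One sample described with both vocabularies side by side."""
    sample_id: str
    triad: EnvTriad
    gold: GoldPath


# Illustrative values only: labels are placeholders chosen for readability,
# not verified EnvO or GOLD entries for any real sample.
EXAMPLE = SampleDescription(
    sample_id="example-sample-1",  # hypothetical identifier
    triad=EnvTriad(
        env_broad_scale="tundra biome",
        env_local_scale="permafrost feature (placeholder)",
        env_medium="soil",
    ),
    gold=GoldPath(
        ecosystem="Environmental",
        ecosystem_category="Terrestrial",
        ecosystem_type="Soil",
        ecosystem_subtype="Unclassified",
        specific_ecosystem="Permafrost",
    ),
)

if __name__ == "__main__":
    print(EXAMPLE.gold.as_path())
    print(asdict(EXAMPLE.triad))
```

Keeping the triad and the GOLD path as separate fields on the same record is one way to make the mapping between the two vocabularies explicit for each sample.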
Innovative methods for enhancement and enrichment of metadata
- Assessed layer mapping to EnvO
- Defined use cases for metadata enrichment API