The NMDC Metadata Standards

Linking and enhancing community-driven metadata standards

Motivation

The development and adoption of standards is key to the success of the NMDC and to integrating and searching across data derived from different studies and stored in distributed data resources. Each database represents samples and associated information differently, so mapping the relevant parts of these database schemas to the relevant standards is crucial. This work focuses on selecting and contributing to existing community-driven standards (e.g., the Genomic Standards Consortium (GSC) Minimum Information about any (x) Sequence (MIxS) and the Environment Ontology (EnvO)), creating a framework for using these standards, and mapping existing systems to them.

Establish standards

Evaluate and establish metadata standards for NMDC data sets by enhancing and reusing existing community standards.

Link standards

Map EnvO to related vocabularies and terminologies (Genomes Online Database (GOLD), Environmental Molecular Sciences Laboratory (EMSL)).

The NMDC Metadata Standards Tasks

The NMDC team will implement the activities outlined here with the aim of improving adoption and interoperability of metadata standards.

Automated enrichment

Enrich sample metadata with environmental metadata drawn from existing climate and environment databases for a given location and time range, using the Oak Ridge National Laboratory Identify tool.

Coordinated development

Management, coordination, and versioning of different metadata schemas and templates within NMDC and with collaborators (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE), National Ecological Observatory Network (NEON)).

Submission portal

Prototype a microbiome metadata submission portal with the ability to capture complex sample relationships and full provenance.

FAIR Compliance

Evaluate compliance with the FAIR principles using the FAIRness evaluation framework, and create FAIRness Maturity Indicators (MIs) and Compliance Tests for microbiome data.

Accomplishments

Community collaborations to enhance existing standards

  • Developed a machine-readable GSC MIxS format in JavaScript Object Notation (JSON) schema in collaboration with the GSC Compliance and Interoperability Group (CIG)
    • The MIxS templates were previously managed and distributed as Excel files, which limited automated compliance checks
  • Contributed terms to EnvO (available in the latest EnvO release)
    • Extension of the GOLD schema to include EnvO
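
To illustrate why a machine-readable format helps with compliance checks, here is a minimal sketch in Python. It assumes a heavily simplified, hypothetical schema fragment written in JSON Schema style. The three `env_*` field names follow the MIxS environmental triad, while the `depth` field, the `SCHEMA` contents, and the `check_record` helper are illustrative assumptions and not the schema developed with the GSC CIG. A real deployment would more likely use a full JSON Schema validator than this hand-rolled check.

```python
import json

# Hypothetical, simplified schema fragment in JSON Schema style.
# It is NOT the actual MIxS JSON schema; it only mimics its general shape.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["env_broad_scale", "env_local_scale", "env_medium"],
  "properties": {
    "env_broad_scale": {"type": "string"},
    "env_local_scale": {"type": "string"},
    "env_medium": {"type": "string"},
    "depth": {"type": "number"}
  }
}
""")

# Map the two JSON Schema type names used above to Python types.
TYPE_MAP = {"string": str, "number": (int, float)}


def check_record(record, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means compliant."""
    errors = []
    # Every required field must be present.
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    # Every declared field that is present must have the declared type.
    for name, rule in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, TYPE_MAP[rule["type"]]):
            errors.append(f"wrong type for field: {name}")
    return errors


if __name__ == "__main__":
    sample = {"env_broad_scale": "tundra biome", "env_medium": "soil", "depth": "deep"}
    for problem in check_record(sample):
        print(problem)
```

A check of this kind can run automatically on every submitted record, which is difficult to do reliably when the template exists only as a spreadsheet.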

Creation of a FAIR NMDC schema that leverages these standards

Integration of metadata from sequencing and other omics projects

  • Mapped and linked existing sample metadata between GOLD and EnvO terms for 963 BioSamples with multi-omics data from EMSL and JGI.


Example: How terms from the EnvO triad and the GOLD hierarchy are used to describe a permafrost soil sample.
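
The same idea can be sketched in code. The snippet below assumes that a GOLD classification is a five-level path (ecosystem, ecosystem category, ecosystem type, ecosystem subtype, specific ecosystem) and that the EnvO triad is stored in the MIxS fields `env_broad_scale`, `env_local_scale`, and `env_medium`. The specific path, the EnvO labels chosen for it, and the `GOLD_TO_ENVO` table and `add_envo_triad` helper are illustrative assumptions, not the mapping produced by the NMDC team.

```python
# Names of the five GOLD classification levels, in order.
GOLD_LEVELS = (
    "ecosystem",
    "ecosystem_category",
    "ecosystem_type",
    "ecosystem_subtype",
    "specific_ecosystem",
)

# Hypothetical lookup table: GOLD path -> EnvO triad labels.
# The values are placeholders chosen for illustration only.
GOLD_TO_ENVO = {
    ("Environmental", "Terrestrial", "Soil", "Unclassified", "Permafrost"): {
        "env_broad_scale": "tundra biome",
        "env_local_scale": "permafrost",
        "env_medium": "soil",
    },
}


def add_envo_triad(sample, table=GOLD_TO_ENVO):
    """Return a copy of `sample` with the EnvO triad added when a mapping exists.

    Samples whose GOLD path is not in the table are returned unchanged,
    so unmapped records can be flagged for manual curation.
    """
    path = tuple(sample.get(level) for level in GOLD_LEVELS)
    enriched = dict(sample)
    enriched.update(table.get(path, {}))
    return enriched


if __name__ == "__main__":
    permafrost_sample = {
        "ecosystem": "Environmental",
        "ecosystem_category": "Terrestrial",
        "ecosystem_type": "Soil",
        "ecosystem_subtype": "Unclassified",
        "specific_ecosystem": "Permafrost",
    }
    print(add_envo_triad(permafrost_sample))
```

Keeping the mapping in a plain lookup table, rather than in code, makes it easier for curators to review and version alongside the schemas.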

Innovative methods for enhancement and enrichment of metadata

  • Assessed layer mapping to EnvO
  • Defined use cases for metadata enrichment API