Overview of the NMDC Metadata Standards
All data integrated into the NMDC data portal must meet appropriate metadata standards so that it can be indexed and displayed properly and so that searches return accurate results. The NMDC uses sample metadata and data processing metadata to drive search and discovery of data.
This page provides an overview of standards and ontologies used by the NMDC. Please refer to the full NMDC Metadata Documentation for more information.
The NMDC sample metadata leverages existing community-driven standards developed by the Genomic Standards Consortium (GSC), the Joint Genome Institute (JGI) Genomes Online Database (GOLD), and OBO Foundry’s Environmental Ontology (EnvO). In collaboration with these organizations, the NMDC has created an interoperable framework that maps these standards to one another and can be expanded to include additional standards and ontologies in the future.
GSC Minimum Information about any (x) Sequence (MIxS)
The GSC has developed standards for describing genomic and metagenomic sequences, and the environment from which a biological sample originates. These “Minimum Information about any (x) Sequence” (MIxS) packages provide standardized sample descriptors (e.g., location, environment, elevation, altitude, and depth) for 17 different sample environments.
Genomes Online Database (GOLD)
GOLD is an open-access repository of genome, metagenome, and metatranscriptome sequencing projects with their associated metadata. Biosamples (defined as the physical material collected from an environment) are described using a five-level ecosystem classification path that goes from ecosystem down to the type of environmental material that describes the sample.
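The five-level classification path described above can be pictured as a small fixed-shape record. The following is a minimal sketch only: the class name, the field names (modeled on GOLD's level names as commonly presented), and the example values are assumptions for illustration, not taken from the NMDC schema or from a real biosample.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class GoldEcosystemPath:
    """Hypothetical container for a five-level GOLD classification path."""
    ecosystem: str            # broadest level
    ecosystem_category: str
    ecosystem_type: str
    ecosystem_subtype: str
    specific_ecosystem: str   # most specific level (environmental material)

    def as_path(self, sep: str = " > ") -> str:
        # Join the five levels from broadest to most specific.
        return sep.join(astuple(self))


# Illustrative values only; not copied from an actual GOLD record.
example = GoldEcosystemPath(
    "Environmental", "Terrestrial", "Soil", "Unclassified", "Forest soil"
)
print(example.as_path())
```

Keeping the levels as separate fields, rather than as one delimited string, makes it straightforward to filter or group biosamples at any level of the path.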
Environmental Ontology (EnvO)
EnvO is a community-led ontology that represents environmental entities such as biomes, environmental features, and environmental materials. Several of the mandatory terms in the MIxS packages, often referred to as the “MIxS triad”, take EnvO entities as their recommended values.
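As a rough illustration of how the triad might be checked in a sample record, the sketch below looks for the three MIxS environmental context terms (env_broad_scale, env_local_scale, env_medium) and reports any that lack an EnvO identifier. The function name, the "label [ENVO:ID]" value format, the identifier pattern, and the sample values are assumptions made for this example; identifiers should be verified against EnvO itself, and this is not the NMDC's own validation logic.

```python
import re

# The three MIxS environmental context terms commonly called the "MIxS triad".
MIXS_TRIAD = ("env_broad_scale", "env_local_scale", "env_medium")

# Assumed identifier shape: the prefix "ENVO:" followed by digits.
ENVO_CURIE = re.compile(r"ENVO:\d+")


def triad_terms_without_envo(sample: dict) -> list:
    """Return the triad terms that are missing or carry no EnvO identifier."""
    problems = []
    for term in MIXS_TRIAD:
        value = sample.get(term, "")
        if not ENVO_CURIE.search(value):
            problems.append(term)
    return problems


# Illustrative sample record; the values are examples, not real submissions.
sample = {
    "env_broad_scale": "terrestrial biome [ENVO:00000446]",
    "env_local_scale": "agricultural field [ENVO:00000114]",
    "env_medium": "soil",  # label only, no identifier
}
print(triad_terms_without_envo(sample))
```

In this example only env_medium is reported, because its value gives a label without an EnvO identifier.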
Data Processing Metadata
In addition, the NMDC is adopting the MIxS standards for sequence data types (e.g., sequencing method, PCR primers and conditions) and is building on previous efforts by the Proteomics Standards Initiative and Metabolomics Standards Initiative to develop standards and controlled vocabularies for mass spectrometry data types (e.g., ionization mode, mass resolution, scan rate). Additional details on the processing metadata are coming soon.