Fig. Tracking and linking a source material sample from the WHONDRS project (Stegen and Goldman 2018) based on the iSamples relational data model, which links related samples through entities such as project, sampling site, and subsamples, along with other related links (sample collections, datasets, journal publications, and analytical results for specific data types). Credit: Joan Damerow
Physical samples are often used to enhance models and predictions of ecological processes and biogeochemical responses to contamination, warming, and other disturbances. A common workflow is to collect a physical sample (such as soil or water) and then send it out for a combination of microbial metagenomic analyses and a variety of physical and chemical analyses. Many interdisciplinary projects funded by the DOE Office of Science Biological & Environmental Research (BER) program have struggled to track samples, and the resulting data, once they are sent to numerous collaborators and labs. How can you keep track of these samples after they are subjected to a variety of analyses and then compiled, analyzed, and published in numerous files and data systems?
To meet this challenge, representatives across the BER data systems that deal with environmental and biological samples (NMDC, ESS-DIVE, JGI, KBase, and EMSL) have coordinated to test a standard approach, the iSamples relational data model (Davies et al. 2021), for linking related samples and their associated metadata, sample sets, workflows, datasets, and project(s). The purpose of this work was to outline standard identifiers, metadata, and tools that would enable automated cross-linking across systems and the exchange of relevant information to improve interdisciplinary sample data discovery, synthesis, and reuse.
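As a rough illustration of the linking idea, the sketch below stores each sample with a pointer to its source sample and a set of typed links, then gathers everything attached to a source sample and its subsamples. It is not the iSamples schema: the class, field names, link types, and all identifiers are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SampleRecord:
    """One physical sample and its links (hypothetical fields, not the iSamples schema)."""
    sample_id: str                    # would be a persistent ID such as an IGSN
    project: str
    site: str
    parent_id: Optional[str] = None   # source sample this subsample was split from
    related: dict = field(default_factory=dict)  # link type -> list of identifiers


def descendants(records, root_id):
    """Return IDs of all subsamples derived, directly or indirectly, from root_id."""
    children = {}
    for rec in records:
        if rec.parent_id is not None:
            children.setdefault(rec.parent_id, []).append(rec.sample_id)
    found, stack = [], [root_id]
    while stack:
        for child in children.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


def linked_outputs(records, root_id):
    """Merge the related identifiers of a source sample and all of its subsamples."""
    wanted = {root_id, *descendants(records, root_id)}
    merged = {}
    for rec in records:
        if rec.sample_id in wanted:
            for link_type, ids in rec.related.items():
                merged.setdefault(link_type, set()).update(ids)
    return {link_type: sorted(ids) for link_type, ids in merged.items()}


# Placeholder data: one source sample split into two subsamples sent to different labs.
records = [
    SampleRecord("SOURCE-001", "Example Project", "Site A",
                 related={"dataset": ["dataset-1"]}),
    SampleRecord("SUB-001a", "Example Project", "Site A", parent_id="SOURCE-001",
                 related={"analysis": ["metagenome-1"]}),
    SampleRecord("SUB-001b", "Example Project", "Site A", parent_id="SOURCE-001",
                 related={"analysis": ["chemistry-1"], "publication": ["paper-1"]}),
]

outputs = linked_outputs(records, "SOURCE-001")
```

Here `outputs` gathers the dataset, analyses, and publication reachable from the source sample, which is the kind of lookup that shared identifiers across systems are meant to make possible.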
Team members at ESS-DIVE are collaborating with the NMDC to test the sample metadata validation tool used by NMDC (Data Harmonizer), with the goal of incorporating the ESS-DIVE sample ID and metadata reporting format into the NMDC sample submission portal. This would make it easier to submit Environmental System Science samples with International Generic Sample Numbers (IGSNs) to NMDC, JGI, and/or EMSL.
Coordination also included adding links on relevant landing pages across these systems and some others outside of BER. For example, on the ESS-DIVE dataset landing page, ESS-DIVE team members added the original source sample IGSNs from the System for Earth Sample Registration (SESAR), in the dataset methods section; JGI award DOIs; the NMDC Study page; and associated journal publications and related datasets, both in the related references section. Additionally, on the NMDC Study pages, NMDC team members added the ESS-DIVE dataset, JGI award DOIs, and sequencing projects.
This use case work will continue to inform proposals, publications, and our implementation of new tools to efficiently integrate across BER systems.
The ESS-DIVE Open Data Workshop is coming up on November 15-16, and NMDC team members are participating in a special session and collaborative paper opportunity to outline new science use cases and needs for linking and integrating related Environmental System Science project data across BER data systems. The session and paper will broaden the use case work on samples to include the cross-linking and metadata exchange needed across a variety of datasets and other research outputs in the BER data systems that our scientists are using (e.g., NMDC, ESS-DIVE, JGI, KBase, ARM, ESGF).
References