Oct. 16, 2023 | News

Coordinating across data systems to improve sample tracking for interdisciplinary biological and environmental research

Fig. Tracking and linking a source material sample from the WHONDRS project (Stegen and Goldman 2018) based on the iSamples relational data model, which links related samples through entities such as project, sampling site, and subsamples, and through other related links (sample collections, datasets, journal publications, and analytical results for specific data types). Credit: Joan Damerow

Physical samples are often used to improve models and predictions of ecological processes and biogeochemical responses to contamination, warming, and other disturbances. A common workflow is to collect a physical sample (such as soil or water) and then send it out for a combination of microbial metagenomic analyses and a variety of physical and chemical analyses. Many interdisciplinary projects funded through the DOE's Office of Science Biological & Environmental Research (BER) program have struggled to track samples, and the resulting data, once they are sent to numerous collaborators and labs. How can you keep track of these samples after they are subjected to a variety of analyses, and the results are compiled, analyzed, and published in numerous files and data systems?

To meet this challenge, representatives across BER data systems dealing with environmental and biological samples (NMDC, ESS-DIVE, JGI, KBase, and EMSL) have coordinated to test a standard approach (the iSamples relational data model; Davies et al. 2021) for linking related samples and their associated metadata, sample sets, workflows, datasets, and project(s). The purpose of this work was to outline standard identifiers, metadata, and tools that would enable automated cross-linking across systems and the exchange of relevant information, improving interdisciplinary sample data discovery, synthesis, and reuse.
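To make the linking idea concrete, here is a minimal sketch in Python of how a source sample, its subsamples, and their related outputs might be tied together. It is an illustration only, not the iSamples schema or any BER system's implementation: the `Sample` class, its field names, the two helper functions, and every identifier except the WHONDRS project name are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """A hypothetical sample record; field names are assumptions, not a real schema."""
    sample_id: str                      # placeholder ID (a real record might use an IGSN)
    project: str
    site: str
    parent_id: Optional[str] = None     # source sample this subsample was derived from
    links: Dict[str, List[str]] = field(default_factory=dict)  # link type -> identifiers


def related_samples(samples: List[Sample], sample_id: str) -> List[str]:
    """Return IDs of all samples that share a source sample with sample_id."""
    by_id = {s.sample_id: s for s in samples}

    def root(sid: str) -> str:
        # Walk up the parent chain until the original source sample is reached.
        while by_id[sid].parent_id is not None:
            sid = by_id[sid].parent_id
        return sid

    target = root(sample_id)
    return sorted(s.sample_id for s in samples if root(s.sample_id) == target)


def collect_links(samples: List[Sample], sample_id: str) -> Dict[str, List[str]]:
    """Merge the links of every sample related to sample_id, grouped by link type."""
    related = set(related_samples(samples, sample_id))
    merged: Dict[str, set] = {}
    for s in samples:
        if s.sample_id in related:
            for kind, ids in s.links.items():
                merged.setdefault(kind, set()).update(ids)
    return {kind: sorted(ids) for kind, ids in merged.items()}


# Placeholder data: one source sample split into two subsamples, plus an unrelated sample.
samples = [
    Sample("SRC-001", "WHONDRS", "site-1", links={"dataset": ["dataset-A"]}),
    Sample("SRC-001-MG", "WHONDRS", "site-1", parent_id="SRC-001",
           links={"analysis": ["metagenome-run-1"]}),
    Sample("SRC-001-CH", "WHONDRS", "site-1", parent_id="SRC-001",
           links={"analysis": ["chemistry-run-1"], "dataset": ["dataset-A"]}),
    Sample("SRC-002", "WHONDRS", "site-2"),
]

print(related_samples(samples, "SRC-001-MG"))
print(collect_links(samples, "SRC-001-MG"))
```

Starting from either subsample, the lookup walks back to the shared source sample and then gathers the datasets and analyses attached to any sample in that family, which is the kind of cross-system discovery the shared identifiers are meant to support.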

Team members at ESS-DIVE are collaborating with the NMDC to test the sample metadata validation tool used by NMDC (Data Harmonizer), with the goal of incorporating the ESS-DIVE sample ID and metadata reporting format into the NMDC sample submission portal. This would make it easier to submit Environmental System Science samples with International Generic Sample Numbers (IGSNs) to NMDC, JGI, and/or EMSL.

Coordination also included adding links on relevant landing pages across these systems and some others outside of BER. For example, on the ESS-DIVE dataset landing page, ESS-DIVE team members added original source sample IGSNs from the System for Earth Sample Registration (SESAR), in the dataset methods section; JGI award DOIs; the NMDC Study page; associated journal publications (related references section); and related datasets (related references section). Additionally, on the NMDC Study pages, NMDC team members added the ESS-DIVE dataset, JGI award DOIs, and sequencing projects.

This use case work will continue to inform proposals, publications, and our implementation of new tools to efficiently integrate across BER systems. 

The ESS-DIVE Open Data Workshop is coming up on November 15-16, and NMDC team members are participating in a special session and collaborative paper opportunity to outline new science use cases and needs for linking and integrating related Environmental System Science project data across BER data systems. The session and paper will broaden the use case work on samples to include the cross-linking and metadata exchange needed across a variety of datasets and other research outputs in the BER data systems that our scientists are using (e.g., NMDC, ESS-DIVE, JGI, KBase, ARM, ESGF).

References

Damerow, Joan E., Charuleka Varadharajan, Kristin Boye, Eoin L. Brodie, Madison Burrus, K. Dana Chadwick, Robert Crystal-Ornelas, et al. 2021. “Sample Identifiers and Metadata to Support Data Management and Reuse in Multidisciplinary Ecosystem Sciences.” Data Science Journal 20 (1): 11.

Davies, Neil, John Deck, Eric C. Kansa, Sarah Whitcher Kansa, John Kunze, Christopher Meyer, Thomas Orrell, et al. 2021. “Internet of Samples (iSamples): Toward an Interdisciplinary Cyberinfrastructure for Material Samples.” GigaScience 10 (5). https://doi.org/10.1093/gigascience/giab028.

Stegen, James C., and Amy E. Goldman. 2018. “WHONDRS: A Community Resource for Studying Dynamic River Corridors.” mSystems 3 (5): e00151–18.
