Standardized Bioinformatics Workflows

The NMDC supports standardized workflows for processing microbiome omics data.

Access through DockerHub

Containerized software to run multi-omics workflows.

Access through GitHub

Workflow code available in Workflow Description Language (WDL).

Microbiome datasets are often processed with different tools and pipelines, which makes reuse and cross-study comparison difficult. To address these challenges, the NMDC integrates production-quality, open-source bioinformatics tools into accessible, standardized workflows for processing omics data (e.g., metagenome, metatranscriptome, metaproteome, and metabolome data) to produce interoperable and reusable annotated data products. These workflows further the NMDC’s commitment to the FAIR data principles.

The NMDC workflows include bioinformatics tools developed by the Joint Genome Institute (JGI) and the Environmental Molecular Sciences Laboratory (EMSL), among others. The workflows developed at JGI and EMSL run in production, process thousands of datasets annually, and have been extensively benchmarked to ensure the generation of high-quality data.

The NMDC workflows are publicly available through GitHub and DockerHub as standalone, containerized workflows, so any institute or individual can obtain, install, and run them in their own environment.

Metagenome workflow

The NMDC metagenome pipeline includes workflows for reads QC, read-based taxonomy classification, metagenome assembly, metagenome annotation, and metagenome-assembled genomes (MAGs).

Metatranscriptome workflow

Takes in raw metatranscriptome data, filters reads for quality, removes rRNA reads, then assembles and annotates transcripts.
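As a rough illustration of the quality-filtering step described above, the sketch below keeps only reads whose mean Phred score meets a threshold. It is a simplified stand-in, not the filtering the NMDC workflow itself performs: the function names, the Phred+33 encoding, and the threshold of 20 are all assumptions chosen for the example.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records, min_mean_quality=20.0):
    """Keep (name, sequence, quality) records whose mean quality meets the threshold.

    The threshold is a hypothetical value for illustration only.
    """
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if mean_phred(qual) >= min_mean_quality
    ]


if __name__ == "__main__":
    reads = [
        ("read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
        ("read2", "ACGT", "!!!!"),  # '!' encodes Phred 0
    ]
    for name, _, _ in filter_reads(reads):
        print(name)
```

Production read QC typically also trims adapters and removes contaminant sequences, which this sketch does not attempt.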

The NMDC Standardized Bioinformatics Workflows

To date, use of the standardized production-quality workflows developed by the JGI and EMSL has largely been limited to the facilities for which they were developed. We provide access to these standardized workflows so that others can process multi-omics data and produce interoperable annotated data.

Metabolome workflow

The GC-MS-based workflow performs signal noise reduction, m/z-based chromatogram peak deconvolution, abundance threshold calculation, peak picking, spectral similarity calculation and molecular search, similarity score calculation, and confidence filtering.
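The spectral similarity and confidence filtering steps listed above can be sketched with a cosine score between a query spectrum and library spectra. This is a minimal example under stated assumptions, not the workflow's actual scoring: spectra are represented as dictionaries keyed by nominal (integer) m/z, and the function names and the 0.7 cutoff are hypothetical.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_matches(query, library, min_score=0.7):
    """Score the query against every library spectrum and keep confident hits.

    Returns (name, score) pairs sorted from best to worst. The cutoff is a
    hypothetical value for illustration.
    """
    scored = [(name, cosine_similarity(query, ref)) for name, ref in library.items()]
    kept = [(name, score) for name, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = {73: 100.0, 147: 50.0}
    library = {
        "compound_a": {73: 100.0, 147: 50.0},
        "compound_b": {41: 10.0},
    }
    for name, score in filter_matches(query, library):
        print(name, round(score, 3))
```

Real pipelines usually combine several similarity measures with retention information before assigning confidence; a single cosine score is only the simplest case.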

Metaproteome workflow

An end-to-end data processing workflow for protein identification and characterization using MS/MS data.

Natural Organic Matter workflow

Takes FT-ICR mass spectrometry data collected from organic extracts and determines the molecular formulas of natural organic biomolecules in the sample.
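To illustrate the idea behind molecular formula assignment, the sketch below compares an observed mass with the theoretical monoisotopic mass of candidate formulas and accepts the closest one within a parts-per-million tolerance. It is not the NMDC implementation. It assumes the observed value has already been converted to a neutral monoisotopic mass, that candidates contain only C, H, N, and O, and that a 1 ppm tolerance is appropriate; the function names are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}


def formula_mass(formula):
    """Theoretical neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def assign_formula(observed_mass, candidates, tolerance_ppm=1.0):
    """Return (formula, ppm_error) for the closest candidate within tolerance, else None."""
    best = None
    for formula in candidates:
        error = ppm_error(observed_mass, formula_mass(formula))
        if abs(error) <= tolerance_ppm and (best is None or abs(error) < abs(best[1])):
            best = (formula, error)
    return best


if __name__ == "__main__":
    candidates = [
        {"C": 7, "H": 16, "O": 5},
        {"C": 6, "H": 12, "O": 6},
    ]
    print(assign_formula(180.0634, candidates))
```

In practice the candidate list is generated from elemental constraints rather than supplied by hand, and the ion type must be accounted for before comparing masses.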

Lipidomics workflow

A liquid chromatography-mass spectrometry (LC-MS)-based lipidomics workflow that provides annotated lipid information.

Documentation

The NMDC documentation provides additional information on each workflow, including its standardized parameters, associated databases, versions, and tools.

Feedback

Our team highly values user input and works to incorporate as much feedback as possible. Beta testing opportunities are announced through the NMDC newsletter, “The Microbiome Standard”, on the NMDC User Research webpage, and through communication channels such as the NMDC Community Slack. To participate, sign up here and the team will reach out when a beta testing round opens.