Standardized Bioinformatics Workflows

The NMDC provides accessible standardized workflows for processing microbiome omics data

 

Run the NMDC workflows in NMDC EDGE

The workflows are also available through DockerHub and GitHub

Motivation

Microbiome datasets are often processed with different tools and pipelines, which makes reuse and cross-study comparison difficult. To address these challenges, the NMDC integrates production-quality, open-source bioinformatics tools into accessible standardized workflows for processing omics data (e.g., metagenome, metatranscriptome, metaproteome, and metabolome data) to produce interoperable and reusable annotated data products. These workflows further the NMDC’s commitment to the FAIR data principles.

The NMDC workflows include bioinformatics tools developed by the Joint Genome Institute (JGI), Environmental Molecular Sciences Laboratory (EMSL), and Los Alamos National Laboratory (LANL), among others. The workflows developed and used in production at JGI and EMSL process thousands of datasets annually and have been extensively benchmarked to ensure that they generate high-quality data.

The NMDC offers a user-friendly web interface, NMDC EDGE, where users can input their microbiome data, run the standardized NMDC workflows, and obtain summaries, output files, and visualizations of their results. NMDC EDGE runs on NMDC computing resources and is free and open for researchers to use for processing their omics data.

The NMDC workflows are also publicly available through GitHub and DockerHub as standalone, containerized workflows, so any institute or individual can obtain, install, and run them in their own environment.

 

      Metagenome workflow

      The NMDC metagenome pipeline includes workflows for read QC, read-based taxonomy classification, metagenome assembly, metagenome annotation, and metagenome-assembled genomes (MAGs).

      Metatranscriptome workflow

      Takes in raw metatranscriptome data, filters reads for quality, removes rRNA reads, then assembles and annotates transcripts.

      Metabolome workflow

      The GC-MS-based workflow performs signal noise reduction, m/z-based chromatogram peak deconvolution, abundance threshold calculation, peak picking, spectral similarity calculation and molecular search, similarity score calculation, and confidence filtering.

      Metaproteome workflow

      An end-to-end data processing workflow for protein identification and characterization using MS/MS data.

      Natural Organic Matter workflow

      Takes FT-ICR mass spectrometry data collected from organic extracts and determines the molecular formulas of natural organic biomolecules in the sample.

      Viruses & Plasmids workflow

      Takes in assembly files, generates a list of the viruses and plasmids that were detected, and provides quality and confidence information.

      Documentation

      The NMDC documentation provides additional information on each workflow, including its standardized parameters, associated databases, versions, and tools.

      Tutorials

      Tutorial videos and user guides are available in the ‘Tutorials’ section of NMDC EDGE. User guides are available in English, Spanish, and French. They walk users through running each of the NMDC workflows in NMDC EDGE and explain the required input file formats and the outputs that are generated.

      Feedback

      For troubleshooting and questions about the workflows, users can email the team directly at nmdc-edge@lanl.gov with their issue and project name. To provide comments about the workflows that do not need follow-up from the team, a Google form is linked on the home page of the NMDC EDGE website. Our team highly values user input and suggestions and works to incorporate as much user feedback as possible. Beta testing opportunities are also announced through the NMDC newsletter, “The Microbiome Standard”, on the NMDC User Research webpage, and through communication channels such as the NMDC Community Slack. To participate, sign up here and the team will reach out when a beta testing round opens.
