Overview of the NMDC bioinformatic workflows

Part of the NMDC effort is to provide standardized bioinformatic workflows, so that results from diverse data sets are more directly comparable.

The NMDC workflows incorporate tools to process metagenomic, metatranscriptomic, metaproteomic, metabolomic, and natural organic matter data:

Reads QC performs quality assessment and quality control on raw Illumina metagenome reads and produces “clean” reads.
Read-based Taxonomy Classification takes single- or paired-end sequencing data and profiles them using multiple taxonomy classification tools.
Metagenome Assembly takes paired-end Illumina data and runs error correction, assembly, and assembly validation.
Metagenome Annotation takes assembled metagenomes and generates structural and functional annotations.
Metagenome Assembled Genomes takes assembled metagenomes and classifies contigs into bins; the bins are evaluated for quality and lineages are assigned to high- and medium-quality bins.
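
The quality tiers mentioned for Metagenome Assembled Genomes can be illustrated with a minimal sketch. It assumes the commonly used MIMAG-style cutoffs on estimated completeness and contamination (in percent) and ignores the rRNA/tRNA criteria that the full standard also applies to high-quality bins. The function name `bin_quality` and the exact thresholds are illustrative assumptions, not taken from the NMDC workflow itself.

```python
def bin_quality(completeness: float, contamination: float) -> str:
    """Classify a genome bin by estimated completeness and contamination (percent).

    Assumed, simplified MIMAG-style thresholds (hypothetical for this sketch):
      high:   completeness > 90 and contamination < 5
      medium: completeness >= 50 and contamination < 10
      low:    everything else
    """
    if completeness > 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    return "low"


# Example bins as (completeness, contamination) pairs; the values are made up.
bins = {"bin_1": (95.2, 1.3), "bin_2": (62.0, 7.5), "bin_3": (41.0, 2.0)}
for name, (comp, cont) in bins.items():
    print(name, bin_quality(comp, cont))
```

Under these assumptions, only bins labeled "high" or "medium" would go on to lineage assignment.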

Metatranscriptomics takes transcriptomic data, removes rRNA reads, and assembles the transcripts. The non-rRNA reads, assembled transcripts, and functional annotation file are then used to compute RPKM (reads per kilobase per million mapped reads) values for each feature in the annotation file.
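
As a rough illustration of the RPKM values described for Metatranscriptomics, the sketch below applies the standard definition: reads mapped to a feature, divided by the feature length in kilobases and by the total mapped reads in millions. The function name `rpkm` and the example counts are hypothetical; the workflow's own implementation may differ in how it counts reads.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Made-up example: feature -> (mapped reads, feature length in bp).
features = {"gene_a": (500, 1000), "gene_b": (250, 2000)}
total_mapped = 1_000_000

for name, (reads, length) in features.items():
    print(name, rpkm(reads, length, total_mapped))
```

Because the value is normalized by both feature length and sequencing depth, features of different lengths and samples of different sizes can be compared on a common scale.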

Metaproteomics processes MS/MS (tandem mass spectrometry) data to identify and characterize proteins from the metagenome.

Metabolomics processes GC-MS (gas chromatography-mass spectrometry) data to identify and characterize metabolites from the metagenome.

NOM (Natural Organic Matter) processes DI FT-MS (direct infusion Fourier transform mass spectrometry) data to identify molecular formulas for compounds in the sample, with a confidence score assigned to each formula.

Run the NMDC workflows

NMDC EDGE provides the workflows with a user-friendly GUI.
- All Metagenomic and Natural Organic Matter workflows are currently available.
- Other workflows are in preparation and will be available in a future release.
NMDC workflows are being incorporated into KBase.

To run the workflows natively, download:
- the WDL files for each workflow from GitHub.
- the images with the required third-party tools for each workflow from Docker Hub.
- the required databases for each workflow.

See the full NMDC Workflows documentation for more information, including required databases and resources needed for native installation.