Overview of the NMDC bioinformatic workflows
Part of the NMDC effort is to provide standardized bioinformatic workflows, so that results from diverse data sets are more directly comparable.
The NMDC workflows incorporate tools to process several types of omics data:
- Reads QC provides quality assessment/quality control for raw Illumina data for metagenomes and produces “clean” reads.
- Read-based Analysis takes single- or paired-end sequencing data and profiles them using multiple taxonomy classification tools.
- Metagenome Assembly takes paired-end Illumina data and runs error correction, assembly, and assembly validation.
- Metagenome Annotation takes assembled metagenomes and generates structural and functional annotations.
- Metagenome Assembled Genomes takes assembled metagenomes and classifies contigs into bins; the bins are evaluated for quality and lineages are assigned to high- and medium-quality bins.
- Metatranscriptomics takes transcriptomic data, removes rRNA reads, and assembles the transcripts. The non-rRNA reads, assembled transcripts, and functional annotation file are then used to calculate RPKM (reads per kilobase per million mapped reads) for each feature in the annotation file.
- Metaproteomics processes MS/MS (tandem mass spectrometry) data to identify and characterize proteins from the metagenome.
- Metabolomics processes GC-MS (gas chromatography-mass spectrometry) data to identify and characterize metabolites from the metagenome.
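The RPKM value mentioned for the Metatranscriptomics workflow can be illustrated with a minimal Python sketch of the standard formula. This is not the NMDC implementation: the function name, the feature names, and the counts are hypothetical, and the choice of total mapped reads as the normalizing denominator is an assumption that may differ from what the workflow uses.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical features: name -> (reads mapped to the feature, feature length in bp).
features = {"gene_a": (500, 1000), "gene_b": (250, 2000)}
total_mapped_reads = 1_000_000  # assumed total for this made-up sample

rpkm_values = {
    name: rpkm(count, length, total_mapped_reads)
    for name, (count, length) in features.items()
}
print(rpkm_values)
```

Because the value is normalized by both feature length and sequencing depth, a feature twice as long with half as many reads gets a quarter of the RPKM.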
Run the NMDC workflows
To run the workflows natively on an institutional system, download and install:
- the WDL files for each workflow from GitHub.
- the images with the required third-party tools for each workflow from Docker Hub.
- the required databases for each workflow.
See the full NMDC Workflows documentation for more information, including required databases and resources needed for native installation.
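As a rough illustration of how the three downloaded pieces fit together, the Python sketch below only builds the command lines for pulling an image and launching a WDL file; it does not run anything. It assumes Cromwell as the WDL runner, which this overview does not specify. The file names, the image name, and the jar path are hypothetical placeholders, so consult the full documentation for the actual values and for how each workflow locates its databases.

```python
from pathlib import Path


def build_commands(wdl_file, inputs_json, image, cromwell_jar="cromwell.jar"):
    """Return (docker pull command, Cromwell run command) as argument lists."""
    # Fetch the container image that bundles the third-party tools.
    pull_cmd = ["docker", "pull", image]
    # Assumption: Cromwell is the WDL runner; other WDL engines use other commands.
    run_cmd = [
        "java", "-jar", str(cromwell_jar),
        "run", str(wdl_file),
        "--inputs", str(inputs_json),
    ]
    return pull_cmd, run_cmd


# All names below are placeholders, not real NMDC file or image names.
pull_cmd, run_cmd = build_commands(
    Path("workflow.wdl"),
    Path("inputs.json"),
    "example-org/example-image:latest",
)
print(" ".join(pull_cmd))
print(" ".join(run_cmd))
```

In this sketch the database locations would be supplied through the inputs JSON file, which is why they do not appear on the command line.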