Image caption: geNomad identifies and classifies mobile genetic elements (MGEs) based on their gene content and genetic sequences. (Image: Kent Leech for Berkeley Lab.)
The Science
Mobile genetic elements (MGEs) are genetic entities that replicate themselves and spread from cell to cell. Two of the most common forms of MGEs are viruses and plasmids, which are found in virtually all of Earth's ecosystems. geNomad, a software tool recently described in Nature Biotechnology, identifies and classifies MGEs based on their gene content and genetic sequences. The software was created by researchers under the direction of Microbiome Data Science Group Lead Nikos Kyrpides at the U.S. Department of Energy Joint Genome Institute (JGI), a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory. geNomad is now accessible through the National Microbiome Data Collaborative's (NMDC) EDGE platform.
The Impact
To gain more insight into Earth's ecosystems, researchers must examine the interactions among tiny organisms (microbes) in soils and water. MGEs such as viruses and plasmids drive microbial processes and evolution because they can affect a microbe's ability to cycle nutrients or produce new chemicals. They can also reshape ecosystems by killing cells. Over time, MGEs can help microbes gain a competitive edge within an ecosystem and influence the biogeochemical cycles around them. By understanding more about the genomics of MGEs and their evolution at the cellular level, scientists can better understand ecosystem-wide processes.
Summary
geNomad is an annotation and classification framework that combines and builds on two standard techniques for identifying viruses and plasmids. Until now, most tools have focused on identifying only specific groups of plasmids or viruses. geNomad, by contrast, has a broad scope, targeting all known groups of viruses and plasmids. It is also optimized for speed, so it can quickly identify millions of new viruses and plasmids, even in massive datasets.
geNomad uses two distinct approaches to identify both viruses and plasmids: one based on marker genes and one based on a neural network. The tool was used to build version 4 of the JGI's IMG Virus Resource (IMG/VR), which now contains more than 15 million viral genomes. It was also key to developing the first version of the IMG Plasmid Resource (IMG/PR), which currently holds more than 700,000 plasmids from genomes, metagenomes, and metatranscriptomes.
To lower the barriers to accessing and using geNomad, the JGI partnered with the NMDC to integrate the tool into the NMDC EDGE platform. NMDC EDGE's easy-to-use interface lets both beginners and experienced users import their data and run geNomad for their research.
Because it is designed to reduce the effects of taxonomic representation biases during marker selection, geNomad identifies plasmids and viruses from underrepresented groups more accurately. Additionally, because it can process large datasets, geNomad is poised to become an essential tool for studying global viral diversity. geNomad has already been downloaded thousands of times and has received excellent feedback from the broader research community. It can be downloaded through the National Energy Research Scientific Computing Center (NERSC).
Contacts
BER Contact
Ramana Madupu
Program Manager, Biological Systems Sciences Division
Office of Biological and Environmental Research
Department of Energy Office of Science
JGI Contact
Nikos Kyrpides
Microbiome Data Science Group Lead
DOE Joint Genome Institute
Funding
The U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337) and the National Energy Research Scientific Computing Center (NERSC) (https://ror.org/05v3mvq14) are DOE Office of Science User Facilities operated under Contract No. DE-AC02-05CH11231. This work also received support from the Genomic Science Program in the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER) (89233218CNA000001 to LANL, DE-AC05-00OR22725 to ORNL, DE-AC05-76RL01830 to PNNL), and used computational resources from the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
The work conducted by the National Microbiome Data Collaborative (https://ror.org/05cwx3318) is supported by the Genomic Science Program in the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER) under contract numbers DE-AC02-05CH11231 (LBNL), 89233218CNA000001 (LANL), and DE-AC05-76RL01830 (PNNL).
Publication
Camargo, A.P., et al. “Identification of mobile genetic elements with geNomad,” Nature Biotechnology. (2023). doi: 10.1038/s41587-023-01953-y