mOTU Profiling

To profile a metagenomic sample, download the tool and follow the tutorial.

If you use this tool in a published work, please cite us.


Phylogenetic marker genes are suitable for reconstructing the evolutionary history of organisms and for profiling the taxonomic composition of environmental samples. For this purpose, a set of 40 protein-coding phylogenetic marker genes (MGs) has been identified (Ciccarelli et al., Science, 2006; Sorek et al., Science, 2007; von Mering et al., Science, 2007). In the vast majority of known organisms, these 40 MGs occur in single copy, and they have recently been used to delineate prokaryotic organisms at the species level (Mende et al., 2013). Because of these properties, they can be used to detect and accurately quantify not only known species, but also those that still lack genomic information. Based on a subset of these MGs that are suitable for shotgun sequencing data, we developed a method for taxonomic composition profiling of environmental samples (Sunagawa et al., 2013).

Here, we provide a tool that is available as standalone software and is also implemented in MOCAT.


Species-level profiles are generated by mapping shotgun DNA sequencing reads from metagenomes to a pre-compiled database (mOTU.v1.padded). This database consists of 10 MGs and represents 3,445 prokaryotic reference genomes as well as unknown species reconstructed from 263 publicly available metagenomes (from the MetaHIT and HMP projects). Profiles can also be generated for several NCBI-compliant taxonomic levels and for specI clusters (see Mende et al., 2013) using a database that contains MGs only from reference genomes (RefMG.v1.padded). The tutorial describes how to generate such profiles.
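
The idea of turning mapped reads into a profile can be illustrated with a short Python sketch. This is not the implementation used by the tool: the normalization (reads per base of gene length), the use of the median across the marker genes of a cluster, and every name in the code (relative_abundance, the gene and cluster identifiers) are assumptions made for illustration only. The sketch relies on the single-copy property of the MGs, which means that the coverage of each gene approximates the abundance of the organism that carries it.

```python
from statistics import median


def relative_abundance(read_counts, gene_lengths, gene_to_cluster):
    """Estimate relative abundance per cluster from per-gene read counts.

    read_counts:     dict gene_id -> number of reads mapped to that gene
    gene_lengths:    dict gene_id -> gene length in bases
    gene_to_cluster: dict gene_id -> cluster (species-level group) label

    Illustrative sketch only; not the tool's actual algorithm.
    """
    coverage_by_cluster = {}
    for gene, count in read_counts.items():
        # Normalize by gene length so long genes are not over-counted.
        coverage = count / gene_lengths[gene]
        coverage_by_cluster.setdefault(gene_to_cluster[gene], []).append(coverage)

    # Summarize each cluster by the median coverage of its marker genes,
    # which is robust to a single over- or under-covered gene.
    cluster_coverage = {c: median(v) for c, v in coverage_by_cluster.items()}

    total = sum(cluster_coverage.values())
    if total == 0:
        return {c: 0.0 for c in cluster_coverage}
    return {c: cov / total for c, cov in cluster_coverage.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two clusters, three marker genes.
    counts = {"geneA1": 100, "geneA2": 200, "geneB1": 50}
    lengths = {"geneA1": 1000, "geneA2": 2000, "geneB1": 1000}
    clusters = {"geneA1": "cluster_A", "geneA2": "cluster_A", "geneB1": "cluster_B"}
    for cluster, fraction in relative_abundance(counts, lengths, clusters).items():
        print(f"{cluster}\t{fraction:.3f}")
```

In the toy data, both genes of cluster_A have a coverage of 0.1 reads per base and the single gene of cluster_B has 0.05, so cluster_A accounts for two thirds of the profile and cluster_B for one third.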

Extraction of MGs from sequence data

To generate these databases, we developed a small tool named fetchMG, which extracts the 40 MGs from genomes and metagenomes. It uses Hidden Markov Models trained on protein alignments of the 40 MGs (available from the eggNOG database) together with a calibrated cutoff for each of the 40 MGs. fetchMG is available as a standalone tool and is also implemented in MOCAT.
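
The role of the per-gene cutoffs can be sketched in Python as follows. This does not reproduce fetchMG itself: the input format (tuples of protein identifier, marker gene identifier, and bit score), the function name assign_marker_genes, the identifiers MG_A and MG_B, and the cutoff values are all hypothetical. The sketch only shows the general principle of keeping a hit when its score reaches the cutoff calibrated for that marker gene, and of assigning each protein to its best-scoring marker gene.

```python
def assign_marker_genes(hits, cutoffs):
    """Assign proteins to marker genes using per-gene score cutoffs.

    hits:    iterable of (protein_id, mg_id, bitscore) tuples, for example
             parsed from an HMM search result table
    cutoffs: dict mg_id -> minimum bit score accepted for that marker gene

    Returns a dict protein_id -> mg_id. Illustrative sketch only.
    """
    best = {}
    for protein_id, mg_id, bitscore in hits:
        # Discard hits to unknown models or below the gene-specific cutoff.
        if mg_id not in cutoffs or bitscore < cutoffs[mg_id]:
            continue
        # Keep only the highest-scoring accepted hit per protein.
        if protein_id not in best or bitscore > best[protein_id][1]:
            best[protein_id] = (mg_id, bitscore)
    return {protein: mg for protein, (mg, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical hits and cutoffs, invented for this example.
    example_hits = [
        ("protein1", "MG_A", 250.0),
        ("protein1", "MG_B", 120.0),
        ("protein2", "MG_B", 90.0),
        ("protein3", "MG_A", 400.0),
    ]
    example_cutoffs = {"MG_A": 200.0, "MG_B": 100.0}
    print(assign_marker_genes(example_hits, example_cutoffs))
```

With these invented values, protein1 and protein3 are assigned to MG_A, while protein2 is dropped because its only hit scores below the MG_B cutoff.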