Taxonomic profiling using mOTUs

Tutorials

Generate taxonomic profiles using MGs

Simple (single sample) execution

We will analyse a sample consisting of a single read file sample.fq.gz.

1. Download the mOTUs.pl script for your platform.

2. Expand the tarball (this step, naturally, only needs to be performed once):

tar xzf mOTUs.Linux64bits.tar.gz

3. This will expand to a single file, mOTUs.pl, a Perl script that can now be applied to your sample:

perl mOTUs.pl sample.fq.gz

Note: The first time you run the mOTUs.pl script, it will create a directory, motus_data, into which it will expand all the needed dependencies.
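Since the tarball only needs to be expanded once, steps 2 and 3 can be wrapped so that extraction is skipped on later runs. The following is a minimal sketch, not part of the mOTUs distribution: the function name ensure_motus is hypothetical, and the default tarball name is the Linux example used above.

```shell
# Extract the mOTUs tarball only if mOTUs.pl is not already present.
# ensure_motus is a hypothetical helper; the default tarball name is the
# Linux example from this tutorial, so adjust it for your platform.
ensure_motus() {
    tarball=${1:-mOTUs.Linux64bits.tar.gz}
    if [ ! -f mOTUs.pl ]; then
        tar xzf "$tarball" || return 1
    fi
    # Succeed only if the script is now available.
    [ -f mOTUs.pl ]
}

# Example (run from the directory holding the tarball and sample.fq.gz):
#   ensure_motus && perl mOTUs.pl sample.fq.gz
```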

The mOTU profiles are saved in the following folder:

motu.profiles/mOTU.v1.padded/

Easy-to-use links are also saved in the RESULTS folder:

RESULTS/annotated.species.clusters.abundances.tab.gz
RESULTS/species.clusters.abundances.tab.gz

Inside the motu.profiles folder, the files ending in .insert.mm.dist.among.unique.scaled.mOTU.gz contain the number of inserts matching the different COGs.

From these, .tab files are generated. These files contain the number of inserts mapping to annotated species clusters and to species clusters. The 'fractions' files give these abundances as fractions of the total sum. The fractions files are the ones available (as symbolic links) inside the RESULTS folder.

The taxonomic profiles are saved in the following folder:

taxonomic.profiles/Ref10.v1.padded/

An easy-to-use link is also saved in the RESULTS folder:

RESULTS/NCBI.species.abundances.tab.gz
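The result files are gzip-compressed, so they need to be decompressed before they can be read. A small helper along these lines (the name preview_profile is hypothetical) prints the first few lines of any of them. The column layout is not described here, so the sketch simply shows the raw lines.

```shell
# Print the first N lines (default 10) of a gzip-compressed profile table.
# preview_profile is a hypothetical helper, not part of mOTUs.
preview_profile() {
    gzip -dc "$1" | head -n "${2:-10}"
}

# Example, after a run has finished:
#   preview_profile RESULTS/NCBI.species.abundances.tab.gz 5
```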

Paired-end data

If your data is paired-end, consisting of files sample.1.fq.gz and sample.2.fq.gz, you can simply run:

perl mOTUs.pl sample.1.fq.gz sample.2.fq.gz

The results will have exactly the same form as the single-end example above.
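Before a paired-end run, it can be worth confirming that both files hold the same number of reads, since a mismatch usually points to a truncated or mismatched file. This check is a general precaution rather than something mOTUs.pl is documented to require. The sketch below assumes the usual four-lines-per-record FASTQ layout, and the helper names count_reads and check_pairs are hypothetical.

```shell
# Count reads in a gzip-compressed FASTQ file, assuming four lines per
# record (no wrapped sequence or quality lines).
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Succeed only if both mate files hold the same number of reads.
check_pairs() {
    [ "$(count_reads "$1")" = "$(count_reads "$2")" ]
}

# Example:
#   check_pairs sample.1.fq.gz sample.2.fq.gz \
#       && perl mOTUs.pl sample.1.fq.gz sample.2.fq.gz
```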

Multiple sample mode

To process multiple samples, organize them in the following way:

1. Place each sample in its own directory.
2. Create a text file listing the directory names (one directory per line).
3. Pass the name of that text file to the mOTUs.pl script, replacing <SAMPLE_FILE> with the name of the file you created in step 2:

perl mOTUs.pl --sample-file <SAMPLE_FILE>

Note: This is the same organization used by the MOCAT pipeline.
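The text file listing the sample directories can be generated rather than typed by hand. The sketch below is one way to do this under stated assumptions: it is run from the directory that contains the sample directories, a directory counts as a sample if it holds at least one file ending in .fq.gz (the suffix used in this tutorial), and the names make_sample_file and samples.txt are hypothetical.

```shell
# Write the names of all subdirectories of the current directory that
# contain at least one *.fq.gz file to the given file, one name per line.
# ASSUMPTION: samples are recognised by the .fq.gz suffix.
make_sample_file() {
    out=${1:-samples.txt}
    : > "$out"                      # create or empty the output file
    for dir in */; do
        dir=${dir%/}                # strip the trailing slash
        # ls exits non-zero when the pattern matches no file
        if ls "$dir"/*.fq.gz >/dev/null 2>&1; then
            printf '%s\n' "$dir" >> "$out"
        fi
    done
}

# Example:
#   make_sample_file samples.txt
#   perl mOTUs.pl --sample-file samples.txt
```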

Options

The script accepts the following command line arguments (you can also access this information by running the script without any arguments: perl mOTUs.pl):

processors: This should be an integer and defines the number of processors that the script will use.
length-cutoff: The minimum length per read (after quality-based trimming).
fastq-format: The format of the input files. Must be one of 'auto' (the default), 'sanger', or 'illumina'. Note that new Illumina machines actually use the 'sanger' format. The auto-detection should generally work well.
output-directory: Where to place the final results file (by default it uses a directory named RESULTS).
identity-cutoff: Minimum percentage identity in the alignment (default: 97).
quality-cutoff: Base pair quality cutoff (default: 20).
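If the automatic detection for fastq-format is in doubt, the quality encoding can be estimated by hand from the range of quality characters. The sketch below is a common heuristic and not the method mOTUs.pl itself uses: characters below ASCII 59 occur only in 'sanger' (Phred+33) data, while characters above ASCII 74 suggest the older 'illumina' (Phred+64) encoding. The function name guess_fastq_format and the 1000-read sample size are assumptions.

```shell
# Guess the quality encoding of a gzip-compressed FASTQ file from its
# first 1000 reads (4000 lines), assuming four lines per record.
# Heuristic only: prints "sanger", "illumina" or "undetermined".
guess_fastq_format() {
    gzip -dc "$1" | head -n 4000 | awk '
        BEGIN {
            # Lookup table from printable character to ASCII code.
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
            min = 127; max = 0
        }
        NR % 4 == 0 {               # every fourth line is a quality line
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if (!(ch in ord)) continue
                if (ord[ch] < min) min = ord[ch]
                if (ord[ch] > max) max = ord[ch]
            }
        }
        END {
            if (min < 59) print "sanger"
            else if (max > 74) print "illumina"
            else print "undetermined"
        }'
}

# Example:
#   guess_fastq_format sample.fq.gz
```

An "undetermined" result means every quality character seen falls in the range shared by both encodings, in which case the default 'auto' setting is the safer choice.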