Tutorials

Generate taxonomic profiles using marker genes (MGs)

Simple (single sample) execution

We will analyse a sample consisting of a single read file sample.fq.gz.

1. Download the mOTUs.pl tarball for your platform (for example, mOTUs.Linux64bits.tar.gz).

2. Expand the tarball (this step, naturally, only needs to be performed once):

tar xzf mOTUs.Linux64bits.tar.gz

3. This expands to a single file, mOTUs.pl, a Perl script that you can now run on your sample:

perl mOTUs.pl sample.fq.gz

Note: The first time you run the mOTUs.pl script, it creates a directory, motus_data, into which it expands all the needed dependencies.

The mOTU profiles are saved in the following folder:

motu.profiles/mOTU.v1.padded/

Easy-to-use links are also saved in the RESULTS folder:

RESULTS/annotated.species.clusters.abundances.tab.gz
RESULTS/species.clusters.abundances.tab.gz

Inside the motu.profiles folder, the files ending with .insert.mm.dist.among.unique.scaled.mOTU.gz contain the number of inserts matching the different COGs.

From these, .tab files are generated. These files contain the number of inserts mapping to annotated species clusters and to species clusters. The 'fractions' files contain the same abundances expressed as fractions of the total sum; these fractions files are the ones available (as symbolic links) inside the RESULTS folder.
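The fractions files are each abundance divided by the total sum over all clusters. A minimal Python sketch of that normalization, assuming the counts are already loaded into a dictionary keyed by cluster name (the cluster names and the handling of an all-zero sample are illustrative choices, not taken from mOTUs.pl):

```python
def to_fractions(counts):
    """Convert per-cluster insert counts into fractions of the total sum."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: with no mapped inserts, report every cluster as 0.0
        # instead of dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Hypothetical counts for three species clusters.
counts = {"cluster_A": 30, "cluster_B": 10, "cluster_C": 0}
print(to_fractions(counts))  # {'cluster_A': 0.75, 'cluster_B': 0.25, 'cluster_C': 0.0}
```

The fractions of any sample with at least one mapped insert sum to 1.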

The taxonomic profiles are saved in the following folder:

taxonomic.profiles/Ref10.v1.padded/

An easy-to-use link is also saved in the RESULTS folder:

RESULTS/NCBI.species.abundances.tab.gz
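To inspect one of these tables outside the pipeline, something like the following Python sketch can be used. It assumes, without confirmation from this tutorial, that each file is gzipped tab-separated text with the cluster or species name in the first column and the abundance in the last column; comment lines starting with '#', blank lines, and rows whose last column is not numeric (such as a header) are skipped. The function name read_abundance_table is hypothetical.

```python
import csv
import gzip


def read_abundance_table(path):
    """Read a gzipped, tab-separated abundance table into {name: abundance}.

    Assumed layout (not confirmed by the tutorial): name in the first
    column, numeric abundance in the last column.
    """
    abundances = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank and comment lines
            try:
                abundances[row[0]] = float(row[-1])
            except ValueError:
                continue  # skip non-numeric rows, e.g. a column header
    return abundances
```

For example, read_abundance_table("RESULTS/NCBI.species.abundances.tab.gz") would return a dictionary from species name to abundance, provided the layout assumption above holds.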

Paired-end data

If your data is paired-end, consisting of files sample.1.fq.gz and sample.2.fq.gz, you can simply run:

perl mOTUs.pl sample.1.fq.gz sample.2.fq.gz

The results will have exactly the same form as the single-end example above.
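If you script many runs, the single-end and paired-end invocations above differ only in the number of read files. The following Python sketch builds the argument list for either case, pairing files by the sample.1.fq.gz / sample.2.fq.gz naming used in this tutorial. The helper name motus_command and the naming check are assumptions for illustration, not part of mOTUs.pl.

```python
import os
import re

# Naming convention from this tutorial: <prefix>.1.fq.gz and <prefix>.2.fq.gz
PAIR_PATTERN = re.compile(r"^(?P<prefix>.+)\.(?P<mate>[12])\.fq\.gz$")


def motus_command(read_files, script="mOTUs.pl"):
    """Return the argument list for a single-end (1 file) or paired-end (2 files) run."""
    files = sorted(read_files, key=os.path.basename)  # puts mate 1 before mate 2
    if len(files) == 1:
        return ["perl", script, files[0]]
    if len(files) == 2:
        first, second = (PAIR_PATTERN.match(os.path.basename(f)) for f in files)
        is_pair = (
            first is not None
            and second is not None
            and first["prefix"] == second["prefix"]
            and (first["mate"], second["mate"]) == ("1", "2")
        )
        if not is_pair:
            raise ValueError(f"not a .1/.2 pair: {files}")
        return ["perl", script, *files]
    raise ValueError("expected one (single-end) or two (paired-end) read files")
```

The list can be passed to subprocess.run, e.g. subprocess.run(motus_command(["sample.2.fq.gz", "sample.1.fq.gz"]), check=True), which runs perl mOTUs.pl sample.1.fq.gz sample.2.fq.gz.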

Multiple sample mode

To process multiple samples, organize them in the following way:

  1. One sample per directory

  2. Create a text file with the directory names (one directory per line)

  3. Pass the name of that text file to the mOTUs.pl script with the --sample-file option.

Now run (replacing <SAMPLE_FILE> with the name of the file you created in step 2 above):

perl mOTUs.pl --sample-file <SAMPLE_FILE>

Note: This is the same organization used by the MOCAT pipeline.
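Steps 1 and 2 can be automated. This Python sketch assumes that every subdirectory of a project folder is one sample, skips the folders this tutorial says mOTUs.pl creates or writes to, and writes the sample file into the project folder. It also assumes the directory names are resolved relative to the folder you run mOTUs.pl from. The helper name write_sample_file and the file name samples.txt are hypothetical.

```python
from pathlib import Path

# Folders mOTUs.pl may create in the working directory (per this tutorial); not samples.
NON_SAMPLE_DIRS = {"motus_data", "RESULTS", "motu.profiles", "taxonomic.profiles"}


def write_sample_file(project_dir, sample_file_name="samples.txt", exclude=NON_SAMPLE_DIRS):
    """Write one sample directory name per line and return the names written."""
    project = Path(project_dir)
    samples = sorted(
        p.name for p in project.iterdir() if p.is_dir() and p.name not in exclude
    )
    (project / sample_file_name).write_text("".join(f"{name}\n" for name in samples))
    return samples
```

After write_sample_file("my_project"), you would run perl mOTUs.pl --sample-file samples.txt from inside my_project.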

Options

The script accepts the following command-line arguments (you can also see this list by running the script without any arguments: perl mOTUs.pl):

processors
The number of processors (an integer) that the script will use.
length-cutoff
The minimum read length (after quality-based trimming).
fastq-format
The quality encoding of the input files. Must be one of 'auto' (the default), 'sanger', or 'illumina'. Note that newer Illumina machines (Illumina 1.8 and later) actually use the 'sanger' (Phred+33) encoding. The auto-detection should generally work well.
output-directory
Where to place the final results files (by default, a directory named RESULTS).
identity-cutoff
Minimum percentage identity of an alignment (default: 97)
quality-cutoff
Base quality cutoff (default: 20)
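To make the read-level options concrete, the following Python sketch shows one common way such cutoffs are applied: guessing the quality encoding, trimming low-quality bases from the 3' end, discarding reads shorter than the length cutoff, and checking an alignment's percentage identity. It is a generic illustration under those assumptions, not the trimming or alignment logic mOTUs.pl actually uses. The function names are hypothetical, and length_cutoff has no default here because none is documented above.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset: 33 ('sanger') or 64 ('illumina').

    Heuristic (an assumption, not mOTUs.pl's documented logic): Illumina 1.3+
    Phred+64 data never uses characters below '@' (ASCII 64), so any such
    character implies Phred+33.
    """
    lowest = min((ord(ch) for qual in quality_strings for ch in qual), default=None)
    if lowest is None:
        return 33  # nothing to inspect: fall back to the modern 'sanger' encoding
    return 33 if lowest < 64 else 64


def trim_3prime(seq, qual, offset=33, quality_cutoff=20):
    """Trim bases from the 3' end while their quality is below quality_cutoff."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < quality_cutoff:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq, qual, length_cutoff, offset=33, quality_cutoff=20):
    """Trim a read, then return it only if it is still at least length_cutoff long."""
    seq, qual = trim_3prime(seq, qual, offset, quality_cutoff)
    return (seq, qual) if len(seq) >= length_cutoff else None


def passes_identity(matches, alignment_length, identity_cutoff=97.0):
    """True if matches / alignment_length, as a percentage, reaches the cutoff."""
    return alignment_length > 0 and 100.0 * matches / alignment_length >= identity_cutoff
```

For example, with Phred+33 qualities, keep_read("ACGTACG", "IIIII##", length_cutoff=5) trims the two Q2 bases ('#') and returns ("ACGTA", "IIIII"), while the same read with length_cutoff=6 is discarded.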