Tutorials
Generate taxonomic profiles using MGs
Simple (single sample) execution
We will analyse a sample consisting of a single read file
sample.fq.gz.
1. Download the mOTUs.pl script for your
platform.
2. Expand the tarball (this step, naturally, only needs to be performed once):
tar xzf mOTUs.Linux64bits.tar.gz
3. This will expand to a single file, mOTUs.pl, a Perl script that can now be
applied to your sample:
perl mOTUs.pl sample.fq.gz
Note: The first time you run the mOTUs.pl script, it will create a directory,
motus_data, into which it will expand all the needed dependencies.
The mOTU profiles are saved in the following folder:
motu.profiles/mOTU.v1.padded/
In addition, easy-to-use links are saved in the RESULTS folder:
RESULTS/annotated.species.clusters.abundances.tab.gz
RESULTS/species.clusters.abundances.tab.gz
Inside the motu.profiles folder, the files ending with
.insert.mm.dist.among.unique.scaled.mOTU.gz contain the number of inserts
matching the different COGs.
From these, .tab files are generated. These files contain the number of
inserts mapping to annotated species clusters and to species clusters.
The 'fractions' files give these abundances as fractions of the total sum. The
fractions files are the ones available (as symbolic links) inside the RESULTS
folder.
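To make the relationship between insert counts and fractions concrete, here is a minimal sketch of the arithmetic (each count divided by the sum of all counts). It is not the code mOTUs.pl uses. The helper name to_fractions and the two-column "name<TAB>count" input without header lines are assumptions made for illustration; the real .tab files may be laid out differently.

```shell
# to_fractions (hypothetical helper): read "name<TAB>count" lines on stdin
# and print "name<TAB>fraction", where fraction = count / sum of all counts.
# Assumes a plain two-column table with no header or comment lines.
to_fractions() {
  awk -F '\t' '
    { name[NR] = $1; count[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s\t%.4f\n", name[i], (total > 0 ? count[i] / total : 0)
    }'
}
```

For example, piping the two lines "clusterA<TAB>30" and "clusterB<TAB>10" into to_fractions gives 0.7500 for clusterA and 0.2500 for clusterB, since the counts sum to 40.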
The taxonomic profiles are saved in the following folder:
taxonomic.profiles/Ref10.v1.padded/
In addition, an easy-to-use link is saved in the RESULTS folder:
RESULTS/NCBI.species.abundances.tab.gz
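Because the result files are gzip-compressed, they have to be decompressed before they can be read. The sketch below shows one way to look at the first lines of such a file without unpacking it on disk. The helper name peek_profile is hypothetical and is not part of mOTUs.pl.

```shell
# peek_profile FILE [N] (hypothetical helper): print the first N lines
# (default 10) of a gzip-compressed table, leaving the file itself untouched.
peek_profile() {
  gzip -dc "$1" | head -n "${2:-10}"
}
```

After a run has finished, peek_profile RESULTS/NCBI.species.abundances.tab.gz 5 would show the first five lines of the taxonomic profile.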
Paired-end data
If your data is paired-end, consisting of files sample.1.fq.gz and
sample.2.fq.gz, you can simply run:
perl mOTUs.pl sample.1.fq.gz sample.2.fq.gz
The results will have exactly the same form as the single-end example
above.
Multiple sample mode
To process multiple samples, organize them in the following way:
1. Place each sample in its own directory (one sample per directory).
2. Create a text file with the directory names (one directory per line).
3. Pass the name of that text file to the mOTUs.pl script (replace
<SAMPLE_FILE> with the name of the file you created in step 2):
perl mOTUs.pl --sample-file <SAMPLE_FILE>
Note: This is the same organization used by the MOCAT pipeline.
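The text file of directory names can be written by hand, or generated. The following is a minimal sketch that lists every subdirectory of a parent directory, one name per line. The helper name write_sample_file is hypothetical. The sketch writes bare directory names, on the assumption that the script is then run from the parent directory; the text above does not say whether names or full paths are expected. Any non-sample directories in the parent (for example motus_data or RESULTS) would also be listed and should be removed from the file.

```shell
# write_sample_file PARENT_DIR OUTPUT_FILE (hypothetical helper):
# write the name of every subdirectory of PARENT_DIR to OUTPUT_FILE,
# one per line, in the shell's sorted glob order.
write_sample_file() {
  parent=$1
  out=$2
  : > "$out"                      # create or truncate the output file
  for dir in "$parent"/*/; do
    [ -d "$dir" ] || continue     # skip the literal pattern if nothing matched
    basename "$dir" >> "$out"     # keep the directory name only, not the path
  done
}
```

For example, write_sample_file . samples.txt followed by perl mOTUs.pl --sample-file samples.txt would process every sample directory found in the current directory.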
Options
The script accepts the following command line arguments (you can also access this information by running the script without any arguments: perl mOTUs.pl):
- processors
  This should be an integer and defines the number of processors that the
  script will use.
- length-cutoff
  The minimum size per read (after quality-based trimming).
- fastq-format
  The format of the input files. Must be one of 'auto' (the default),
  'sanger', or 'illumina'. Note that new Illumina machines actually use the
  'sanger' format. The auto-detection should generally work well.
- output-directory
  Where to place the final results file (by default it uses a directory named
  RESULTS).
- identity-cutoff
  Minimum percentage identity in alignment (default: 97).
- quality-cutoff
  Basepair quality cutoff (default: 20).
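If you need to set fastq-format by hand, the two formats can usually be told apart by their quality strings: the 'sanger' encoding stores qualities with an ASCII offset of 33, while the older 'illumina' encoding uses an offset of 64, so any quality character below ';' (ASCII 59) can only come from a 'sanger' file. The sketch below applies that rule. It is a heuristic for illustration, not the auto-detection that mOTUs.pl implements; the helper name guess_fastq_format is hypothetical, and the sketch assumes gzip-compressed input with four lines per read.

```shell
# guess_fastq_format FILE.fq.gz (hypothetical helper): print "sanger" if any
# of the first 1000 quality lines contains a character in the ASCII range
# '!'..':' (33-58), which only the offset-33 encoding can produce;
# otherwise print "illumina". Assumes four-line FASTQ records.
guess_fastq_format() {
  if gzip -dc "$1" | awk 'NR % 4 == 0' | head -n 1000 \
      | LC_ALL=C grep -q '[!-:]'; then
    echo sanger
  else
    echo illumina
  fi
}
```

Note that a 'sanger' file in which every inspected base has a quality of 26 or higher contains no such low character and would be reported as 'illumina', so the result is a guess rather than a guarantee.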