Name

doAssembly.pl - Smash driver script for metagenome sequence assembly

Synopsis

        doAssembly.pl --metagenome=<name> --assembler=<program> \
            --genome_size=<size> [options]

Options

--metagenome (required): name of the metagenome
--assembler (required): sequence assembly program (Arachne|Celera|Forge)
--genome_size (required): estimated genome size for assembly
--cluster: name of cluster (required for Forge)
--single_genome: specify a single genome assembly (default: false)
--superiterative: do superiterative assembly, applicable to Arachne for now (default: false)
--finish: finish post-processing of an assembly that was started independently (default: false)
--assembly: assembly id to finish (used only with --finish)
--pkg_dir: location where the assembler <program> is installed
--extra_options: extra options to pass to the assembler. The assembler object must know how to handle this option. See "Extra Options" for more details.
--help: Prints this manual.

Description

doAssembly.pl is a wrapper script to perform sequence assembly on a given metagenome. It looks up the data and data type for the given metagenome, sets things up for the given assembler, and runs the assembly.

Currently it supports three assemblers: Arachne, Celera and Forge. Where this script finishes and what should be done next depends on which assembler is used.

Arachne: To assemble a metagenome with Arachne, run doAssembly.pl with --assembler=Arachne. The script finishes the assembly and copies all the necessary files from its temporary workspace to the Smash data repository. (See the User Manual for more information on these locations.) The next step is to run loadAssembly.pl. Please check Smash::Analyses::Assembler::Arachne for information on Smash's Arachne assembly process.
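As a rough sketch of the Arachne case, the command can be put together from the three required options. The metagenome name MC20.MG2 and the genome size below are placeholder assumptions rather than values prescribed by this manual, and the snippet only prints the command instead of running it.

```shell
#!/bin/sh
# Placeholder values (assumptions for illustration only).
METAGENOME="MC20.MG2"
GENOME_SIZE=4000000

# Build the command from the three required options.
CMD="doAssembly.pl --metagenome=$METAGENOME --assembler=Arachne --genome_size=$GENOME_SIZE"

# Print instead of executing; run the command directly once the values are right.
echo "$CMD"
```

Printing first makes it easy to check the option values before committing to a long-running assembly.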
Celera: Celera assembler is more of an assembly pipeline with several steps, so it is normally run on a Sun Grid Engine cluster. To do so, you must have configured a cluster with Smash::Utils::Cluster. You can then run the assembly by calling doAssembly.pl with --assembler=Celera. Make sure to capture the output somewhere, since it contains useful information: the steps below refer to its numbered bullets.

Smash uses a two-step procedure for Celera assembly of metagenomes. The first step is the normal assembly, and the second step is called "repeat toggling". Celera assembler lets you run both steps automatically. However, Smash uses slightly different parameters in each step, which Celera assembler does not support. Therefore these two steps have to be managed by Smash externally.

In this example, the Celera assembler pipeline will run as job 264970, but it will submit more jobs for all its steps. All these job names will contain "MC20.MG2.AS3". After all these steps are done, the only remaining job with "MC20.MG2.AS3" in its name is job 264971. At this stage, check the first few lines of the first qc file in bullet (2) to see if everything looks good. If it does, release the hold on the second job (264971 in this example). This will start step 2 of the assembly and submit more jobs to the cluster.

When there are no more jobs in the cluster with "MC20.MG2.AS3" in their names, check the qc file in bullet (5). If it looks good, finish this assembly by running doAssembly.pl with --finish, passing the assembly id via --assembly. You can then load the assembly into Smash using loadAssembly.pl. Please check Smash::Analyses::Assembler::Celera for information on Smash's Celera assembly process.
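The hold-release step described above might look like the following sketch. On Sun Grid Engine a held job is normally released with qrls; that this is the right command here is an assumption (it presumes a user hold that the submitting user may release). The job id 264971 is the one from the example in the text, and the snippet prints the command rather than running it.

```shell
#!/bin/sh
# Job id of the held second step, taken from the example in the text.
HELD_JOB=264971

# qrls is the Sun Grid Engine command that releases a hold on a job.
# Assumption: the hold is a user hold that the submitting user can release.
RELEASE_CMD="qrls $HELD_JOB"

# Print instead of executing, so the sketch is safe to run without a cluster.
echo "$RELEASE_CMD"
```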
Forge: Since Forge uses MPI, this script does not wait until the MPI processes finish. Instead, it starts the MPI processes and quits. The user must check whether the processes are done and then run a finishing step before loading the assembly into the database. The finishing step must be run with the exact same options as the original run, plus an extra --finish, for this to work correctly.
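One way to keep the finishing run in step with the original run is to hold the options in a single variable and append --finish. In the sketch below, the metagenome name, genome size, and cluster name are hypothetical placeholders; only the option names come from this manual. Both commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical option values; only the option names come from the manual.
OPTS="--metagenome=MC20.MG2 --assembler=Forge --genome_size=4000000 --cluster=mycluster"

# Initial run: starts the MPI processes and returns.
START_CMD="doAssembly.pl $OPTS"

# Finishing run: exactly the same options, plus --finish.
FINISH_CMD="doAssembly.pl $OPTS --finish"

echo "$START_CMD"
echo "$FINISH_CMD"
```

Sharing one option string between the two commands avoids the two runs drifting apart if an option is edited later.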

Extra Options

Smash has in-built default options for each of the assemblers. You can override these options using the --extra_options parameter. Currently Arachne and Celera understand this option. Please check Smash::Analyses::Assembler::Arachne for default parameters for Arachne and Smash::Analyses::Assembler::Celera for default parameters for Celera.
