Name

doGenePrediction.pl - Make gene predictions on a sequence assembly in Smash

Synopsis

        doGenePrediction.pl [options]

Options

--predictor: name of the gene predictor (GeneMark|MetaGene) (required)
--version: run the specified version of the program, if available
--assembly: assembly id (either --assembly or --genepred must be specified)
--genepred: gene prediction id (either --assembly or --genepred must be specified)
--fasta_file: fasta file
--output_dir: directory where the output files should be stored
--label: label for trained parameters (used with --self_train)
--self_train: train parameters using sequences (default: false)
--parallelize: parallelize gene prediction by breaking the input into smaller files
--cluster: cluster to run the parallel jobs for prediction
--pkg_dir: location where the gene predictor <program> is installed
--help: print this manual

Either --assembly or --genepred must be specified.

Description

doGenePrediction.pl is a wrapper script to run gene prediction on a given metagenome assembly.

A typical invocation of this script is:

        doGenePrediction.pl --assembly=MC20.MG1.AS1 --predictor=GeneMark \
            --version=2.6r --self_train

When you parallelize the run with --parallelize, the script generates two shell scripts that must be run separately: the predictor script and the loader script. Run the predictor script first, optionally on a cluster, where each line can be sent to a different host and all lines can run simultaneously. When all of them have finished, run the loader script.
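If no cluster is available, the same two-phase order can be reproduced on a single machine. The POSIX shell function below is a minimal sketch, not part of Smash: the name run_pred_then_load is hypothetical, and it assumes that every non-blank, non-comment line of the predictor script is an independent command (as the per-host description above implies). It starts all prediction lines at once with no concurrency limit, waits for every one of them, and runs the loader script only if all of them succeeded.

```shell
# run_pred_then_load: hypothetical helper, not shipped with Smash.
# Usage: run_pred_then_load PREDICTOR_SCRIPT LOADER_SCRIPT
run_pred_then_load() {
    pred=$1 load=$2 pids= rc=0
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        case $cmd in
            ''|'#'*) continue ;;          # skip blank lines, comments and a shebang
        esac
        sh -c "$cmd" </dev/null &         # each line runs as its own background job
        pids="$pids $!"
    done < "$pred"
    # Barrier: wait for every prediction job and remember any failure
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    if [ "$rc" -ne 0 ]; then
        echo "run_pred_then_load: a prediction job failed; not loading" >&2
        return 1
    fi
    sh "$load"                            # load only after all predictions finished
}
```

For example: `run_pred_then_load MC20.MG1.AS1.GP1.pred.sh MC20.MG1.AS1.GP1.load.sh`. Launching every line at once is the simplest choice; on a machine with few cores you would want to cap the number of simultaneous jobs.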

For example, if you ran:

        doGenePrediction.pl --assembly=MC20.MG1.AS1 --predictor=GeneMark \
            --version=2.6r --self_train --parallelize

it could generate two shell script files: MC20.MG1.AS1.GP1.pred.sh and MC20.MG1.AS1.GP1.load.sh. If you have a script, qsub_lines, that submits each line of a file to qsub as a separate job, you would run:

        qsub_lines MC20.MG1.AS1.GP1.pred.sh

and when all the jobs finish, you would run:

        qsub_lines MC20.MG1.AS1.GP1.load.sh
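Smash does not provide qsub_lines; it is a helper you are assumed to have. A minimal sketch of such a helper as a POSIX shell function follows. It assumes an SGE-style qsub that reads the job script from standard input when no script file is given; -cwd (run in the submission directory) and -V (export the current environment) are standard SGE options. The QSUB override variable is a hypothetical addition so the function can be dry-run with a stub.

```shell
# qsub_lines: hypothetical helper, not part of Smash.
# Submits every non-blank, non-comment line of a file as its own qsub job.
qsub_lines() {
    file=$1
    [ -r "$file" ] || { echo "qsub_lines: cannot read '$file'" >&2; return 1; }
    n=0
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        case $cmd in
            ''|'#'*) continue ;;   # skip blank lines, comments and a shebang
        esac
        # The command is piped to qsub, which reads it as the job script
        printf '%s\n' "$cmd" | ${QSUB:-qsub} -cwd -V || return 1
        n=$((n + 1))
    done < "$file"
    echo "qsub_lines: submitted $n job(s) from $file" >&2
}
```

Note that this helper does not wait for the submitted jobs, so you still have to confirm that all prediction jobs have finished before submitting the loader script.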

If you want Smash to manage the whole process, you can specify the name of a cluster to which these jobs should be sent. For example, assuming you have an SGE grid to which you can submit jobs,

        doGenePrediction.pl --assembly=MC20.MG1.AS1 --predictor=GeneMark \
            --version=2.6r --self_train --parallelize --cluster=SGE

will submit the jobs to the default SGE queue for the execution host. Two jobs are submitted: one for gene prediction and one for loading the gene predictions. The loader job starts only after gene prediction has finished.
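The ordering between the two phases is the kind of dependency SGE expresses with qsub's -hold_jid option. The sketch below is a hypothetical manual equivalent, not how Smash implements --cluster=SGE: it submits each predictor line under a shared job name (-N), then submits the loader script as one job held until every job with that name has completed. The function name, the default job name smash_pred, and the QSUB override are assumptions for illustration.

```shell
# submit_pred_and_load: hypothetical helper, not part of Smash.
# Usage: submit_pred_and_load PREDICTOR_SCRIPT LOADER_SCRIPT [JOB_NAME]
submit_pred_and_load() {
    pred=$1 load=$2 name=${3:-smash_pred}
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        case $cmd in
            ''|'#'*) continue ;;   # skip blank lines, comments and a shebang
        esac
        # All prediction jobs share one name so the hold below covers every one
        printf '%s\n' "$cmd" | ${QSUB:-qsub} -cwd -V -N "$name" || return 1
    done < "$pred"
    # -hold_jid NAME keeps the loader job pending until all jobs named NAME finish
    ${QSUB:-qsub} -cwd -V -hold_jid "$name" "$load"
}
```

Because the hold is by name, any older jobs of yours still queued under the same name would also delay the loader, so a name unique to the run (for example one derived from the gene prediction id) is the safer choice.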
