NAME

Smash::Analyses::GenePredictor::GeneMark - Implementation of GeneMark gene prediction software pipeline

SYNOPSIS

        my $instance = Smash::Analyses::GenePredictor::GeneMark->new
                                (
                                        'NAME'        => $label,
                                        'OUTPUT_DIR'  => $output_dir,
                                        'PREDICTOR'   => 'GeneMark',
                                        'FASTA_FILE'  => $fasta,
                                        'SELF_TRAIN'  => 0,
                                        'VERSION'     => "2.6p"
                                );
        $instance->run();

DESCRIPTION

Smash::Analyses::GenePredictor::GeneMark implements wrapper modules for the GeneMark and MetaGeneMark gene prediction software. GeneMark and MetaGeneMark are developed at Georgia Tech (http://exon.gatech.edu/GeneMark/). You can download your own copy of the software and the license from their website.

FUNCTIONS

Mandatory and conditionally mandatory functions

is_trainable

Returns 1, indicating that this predictor can be self-trained.

train_min

returns 200000.

get_self_trained_settings

Trains a prokaryotic gene model using the input fasta file and returns the command line option to use that model.

get_generic_settings

Chooses a heuristic model that matches the translation table and the GC content of the input sequence, and returns the command line option to use that model.
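
As a rough illustration of the selection described above, the following sketch computes the GC content of a sequence and maps it, together with the translation table, to a model file name. The sub names gc_percent and pick_heuristic_model, the heu_<table>_<gc>.mod naming pattern and the 30 to 70 clamp range are assumptions made for this example, not the module's actual rule.

```perl
use strict;
use warnings;

# Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence.
sub gc_percent {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GCgc//);   # tr with an empty replacement only counts
    my $at = ($seq =~ tr/ATat//);
    my $total = $gc + $at;
    return 0 if $total == 0;
    return 100 * $gc / $total;
}

# Hypothetical mapping from translation table and GC content to a model file.
# The naming pattern and the 30..70 clamp are assumptions for illustration.
sub pick_heuristic_model {
    my ($seq, $table) = @_;
    my $gc = int(gc_percent($seq) + 0.5);   # round to the nearest integer
    $gc = 30 if $gc < 30;
    $gc = 70 if $gc > 70;
    return sprintf("heu_%d_%d.mod", $table, $gc);
}

print pick_heuristic_model("ATGCGCGCTTAA", 11), "\n";   # prints "heu_11_50.mod"
```

Ambiguous bases such as N are left out of the denominator here so that masked regions do not pull the estimate down; a real implementation may treat them differently.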

get_command_line

Makes a command line with the model passed.

parameter_settings

Describes the rule-based creation of the parameter set for the command line.

create_gene_gff

Suggested functions

min_length

returns 41.

Local functions

These functions are not known outside of the scope of GeneMark.

check_license_key

Checks for the GeneMark software license key and copies it to the user's home directory. This is required for GeneMark to run properly. The license key file should be available as pkg_dir/gm_key. See "DESCRIPTION" for more details.
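
The check-and-copy step described above could look roughly like the sketch below. The sub name install_license_key, the destination file name .gm_key and the choice not to overwrite an existing key are assumptions made for illustration; only the source location pkg_dir/gm_key comes from the description above.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;

# Copy the license key from the package directory into the given home
# directory unless a key is already there. Returns the installed path.
# The destination name ".gm_key" is an assumption for this sketch.
sub install_license_key {
    my ($pkg_dir, $home) = @_;
    my $src = File::Spec->catfile($pkg_dir, "gm_key");
    my $dst = File::Spec->catfile($home, ".gm_key");
    die "License key not found: $src\n" unless -f $src;
    if (!-f $dst) {
        copy($src, $dst) or die "Cannot copy $src to $dst: $!\n";
    }
    return $dst;
}
```

A caller would typically pass the package directory and $ENV{HOME}, for example install_license_key($pkg_dir, $ENV{HOME}).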

paramdir

Directory where the parameter files reside under pkg_dir for GeneMark.

LOCAL CLASSES

This module also contains the following two local classes.

NAME

Smash::Analyses::GenePredictor::MetaGeneMark - Subclass of Analyses::GenePredictor::GeneMark implementing MetaGeneMark gene prediction software pipeline.

SYNOPSIS

        my $instance = Smash::Analyses::GenePredictor::GeneMark->new
                                (
                                        'NAME'        => $label,
                                        'OUTPUT_DIR'  => $output_dir,
                                        'PREDICTOR'   => 'MetaGeneMark',
                                        'FASTA_FILE'  => $fasta,
                                        'SELF_TRAIN'  => 0,
                                        'VERSION'     => "current"
                                );
        $instance->run();

DESCRIPTION

NAME

Smash::Analyses::GenePredictor::GeneMarkParser - Parser for GeneMark/MetaGeneMark output

SYNOPSIS

        my $parser = Smash::Analyses::GenePredictor::GeneMarkParser->new(
                        FH => \*STDIN,
                        ASSEMBLY => "MC1.MG1.AS1",
                        GENEPRED => "MC1.MG1.AS1.GP1"
                        );
        while (my $genes = $parser->parseNextSeq()) {
                foreach my $feature (@$genes) {

                        # print the gene feature as GFF.

                        $feature->print_feature_gff(\*STDOUT);

                        # print the protein sequence as fasta

                        print ">".$feature->name."\n";
                        print Smash::Core->pretty_fasta($feature->get_property("prot_seq"));

                        # print the DNA sequence as fasta

                        print ">".$feature->name."\n";
                        print Smash::Core->pretty_fasta($feature->get_property("dna_seq"));
                }
        }

FUNCTIONS

parseNextSeq

Parses the predicted genes for the next sequence and returns a reference to an array of Utils::GFF::Feature objects. Each feature carries two special properties, prot_seq and dna_seq, which contain the protein and DNA sequences of the predicted gene.

moveToNextSeq()

Moves to the next sequence when parsing a prediction made on a multi-FASTA file, or to the end of the input when the prediction was made on a single-sequence FASTA file.

getGenes()

returns an array containing Utils::GFF::Feature objects, where each feature is an actual predicted gene.

getSequences()

Parses the protein and nucleotide sequences of the predicted genes. The caller must make sure that the file pointer is at the right position; otherwise parsing will fail.

getNextLine()

reads the next line and stores it in LAST_LINE.
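
The one-line buffer described above can be sketched as follows. LineReader is a hypothetical stand-in class and not part of Smash; only the method name getNextLine and the LAST_LINE field are taken from the description. Whether the real method strips the trailing newline is not stated; this sketch does.

```perl
use strict;
use warnings;

package LineReader;   # hypothetical class, not part of Smash

sub new {
    my ($class, %args) = @_;
    my $self = { FH => $args{FH}, LAST_LINE => undef };
    return bless $self, $class;
}

# Read one line, strip the newline, remember it in LAST_LINE and return it.
# Returns undef (and sets LAST_LINE to undef) at end of input.
sub getNextLine {
    my ($self) = @_;
    my $fh = $self->{FH};
    my $line = <$fh>;
    chomp $line if defined $line;
    $self->{LAST_LINE} = $line;
    return $line;
}

package main;

# Read from an in-memory string so the example needs no input file.
my $text = "first\nsecond\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $reader = LineReader->new(FH => $fh);
$reader->getNextLine();
print $reader->{LAST_LINE}, "\n";   # prints "first"
```

Keeping the most recent line in the object lets other methods inspect it without reading again, which is useful when one method stops at a line that the next method has to interpret.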
