Name

addRefGenomeSequences.pl - Wrapper script that parses GenBank/EMBL files containing reference genomes, and optionally their annotations, and loads the information into the Smash refgenome database.

Synopsis

        addRefGenomeSequences.pl [options]

Options

--input (required)

GenBank file to be parsed, or a directory containing multiple GenBank files to be parsed (see --directory).

--directory

Treat --input as a directory rather than a single GenBank file. The script then looks in that directory for files with the extension .gbff (or the extension given by --extension).

--format

Format of the input file (supported: genbank, embl).

--extension

File name extension to look for in the input directory when --directory is given (default: gbff).
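The interplay of --input, --directory and --extension can be sketched as follows. This is only an illustration in Python, not the script's own code: the function name find_input_files is hypothetical, and the non-recursive, case-sensitive matching is an assumption, since the manual does not say how the directory is searched.

```python
from pathlib import Path


def find_input_files(input_path, directory=False, extension="gbff"):
    """Return the files a run would parse (hypothetical helper, illustration only)."""
    path = Path(input_path)
    if not directory:
        # Without --directory, --input names a single file.
        return [path]
    # With --directory, collect files whose names end in ".<extension>".
    # Assumption: the search is non-recursive and case-sensitive.
    suffix = "." + extension.lstrip(".")
    return sorted(
        p for p in path.iterdir() if p.is_file() and p.name.endswith(suffix)
    )
```

Under these assumptions, find_input_files("genomes", directory=True) would pick up genomes/a.gbff but skip genomes/notes.txt, and passing extension="gbk" would switch the match to .gbk files.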

--source (required)

Source of the GenBank file (e.g., NCBI, HMP).

--ignore

File containing a list of project IDs to ignore.

--prefix (required)

Prefix for the output files. The following files will be created:

<prefix>.sequences.fa

All the DNA sequences in the GenBank file.

<prefix>.genes.fa

All the coding and non-coding genes in the GenBank file.

<prefix>.proteins.fa

Protein sequences of the coding genes in the GenBank file. If the GenBank record provides a translation for a gene, that translation is used; otherwise the script translates the CDS region itself.
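The fallback described for <prefix>.proteins.fa (use the record's translation if present, otherwise translate the CDS) can be sketched like this. It is a minimal Python illustration, not the script's implementation: the function names and the qualifiers dictionary are hypothetical, and the sketch assumes the standard genetic code (NCBI translation table 1), a CDS that starts in frame, and translation that stops at the first stop codon. The real script may handle other translation tables, codon_start offsets, or partial genes differently.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame CDS; stop at the first stop codon (assumption)."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as 'X'.
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_for_feature(qualifiers, cds):
    """Prefer the annotated translation; otherwise translate the CDS.

    `qualifiers` is a hypothetical dict of CDS feature qualifiers.
    """
    translation = qualifiers.get("translation")
    if translation:
        return translation
    return translate_cds(cds)
```

For example, protein_for_feature({"translation": "MK"}, "ATGGCCTAA") returns the annotated "MK", while protein_for_feature({}, "ATGGCCTAA") falls back to translating the CDS and returns "MA".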

--help

Prints this manual.
