NAME

Smash::Analyses::Assembler::Celera - Implementation of Celera assembly software pipeline

SYNOPSIS

DESCRIPTION

This module performs assemblies using the Celera assembler.

Default options

Here are the default options used by Smash for a metagenomic assembly:

        utgErrorRate         = 0.12
        ovlErrorRate         = 0.14
        cnsErrorRate         = 0.14
        cgwErrorRate         = 0.14
        merSize              = 14
        overlapper           = ovl
        doFragmentCorrection = 0
        doExtendClearRanges  = 1
        utgBubblePopping     = 0
        doOverlapTrimming    = 1
        unitigger            = bog
        merOverlapperSeedBatchSize = 100000
        utgGenomeSize        = $this->genome_size

        # The following numbers are optimized for a 1GB memory limit running on a laptop.
        # This can only run very small assemblies.
        # Must override for larger assemblies run on clusters

        ovlMemory                  = 1GB
        ovlCorrBatchSize           = 100000
        frgCorrBatchSize           = 100000

And here are the defaults used for a single genome assembly:

        utgErrorRate         = 0.03
        ovlErrorRate         = 0.06
        cnsErrorRate         = 0.06
        cgwErrorRate         = 0.10
        merSize              = 22
        overlapper           = mer
        doFragmentCorrection = 1
        doExtendClearRanges  = 2
        utgBubblePopping     = 1
        doOverlapTrimming    = 1
        unitigger            = bog
        merOverlapperSeedBatchSize = 100000
        utgGenomeSize        = $this->genome_size

        # The following numbers are optimized for a 1GB memory limit running on a laptop.
        # This can only run very small assemblies.
        # Must override for larger assemblies run on clusters

        ovlMemory                  = 1GB
        ovlCorrBatchSize           = 100000
        frgCorrBatchSize           = 100000

As noted in the comments inside the option lists, ovlMemory, ovlCorrBatchSize and frgCorrBatchSize must be overridden when running large assemblies. We recommend ovlMemory=8GB if your server has enough memory for it.

Machine specific options

Overriding default options

You can override any of the options listed above, or pass other options understood by the Celera assembler, using the --extra_options parameter as follows:

        doAssembly.pl ... --assembler=Celera \
                --extra_options="ovlMemory=8GB merSize=19 overlapper=mer"
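The override behaviour described above can be pictured as a merge of two key=value lists in which the later value wins. The following is only a minimal sketch of that idea in POSIX shell, not the code Smash uses: the function name merge_options is hypothetical, and it assumes options are separated by whitespace and contain no spaces in their values.

```shell
#!/bin/sh
# merge_options DEFAULTS OVERRIDES   (hypothetical helper, for illustration only)
# Both arguments are whitespace-separated key=value lists.
# Prints one key=value per line; a key given in OVERRIDES replaces the default,
# and keys keep the order in which they were first seen.
merge_options() (
    set -f    # disable globbing so unquoted word splitting is safe
    printf '%s\n' $1 $2 | awk -F= '
        NF >= 2 {                      # ignore tokens without an "=" sign
            key = $1
            if (!(key in seen)) { order[++n] = key; seen[key] = 1 }
            value[key] = substr($0, length(key) + 2)   # text after the first "="
        }
        END { for (i = 1; i <= n; i++) printf "%s=%s\n", order[i], value[order[i]] }
    '
)

# A few of the metagenomic defaults, overridden as in the command above.
defaults="ovlMemory=1GB merSize=14 overlapper=ovl"
merge_options "$defaults" "ovlMemory=8GB merSize=19 overlapper=mer"
```

With these inputs all three defaults are replaced, so the effective settings become ovlMemory=8GB, merSize=19 and overlapper=mer.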

FUNCTIONS

assemble()

Complicated. To do!

post_assembly()

This is the global post-assembly step, run once after all the assemblies are done. It generates:

        1. Contig fasta file
        2. Contig-to-read mapping file in GFF format
        3. Scaffold-to-contig mapping file in GFF format
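As a quick sanity check on the GFF outputs listed above, one could count how many features are attached to each sequence. The sketch below relies only on the general GFF layout (nine tab-separated columns, with the sequence name in column 1). It assumes, without confirmation from Smash itself, that in the contig-to-read mapping file column 1 holds the contig and each line describes one read. The function name is hypothetical.

```shell
#!/bin/sh
# count_features_per_seq   (hypothetical helper, for illustration only)
# Reads GFF on standard input and prints "<sequence name><TAB><feature count>",
# sorted by sequence name.
count_features_per_seq() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }    # skip comment lines and lines without 9 columns
        { count[$1]++ }            # column 1 is the sequence the feature lies on
        END { for (id in count) printf "%s\t%d\n", id, count[id] }
    ' | LC_ALL=C sort
}

# Usage (the file name is a placeholder, not a name produced by Smash):
#   count_features_per_seq < contig2read.gff
```

Under the stated assumption, each output line gives a contig and the number of reads mapped to it.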
