NAME

Smash::Utils::Taxonomy - NCBI and RDP taxonomy related utility functions

SYNOPSIS

        use Smash::Utils::Taxonomy qw(:all);

        # Parse the NCBI tree from the NCBI taxonomy dump files

        Smash::Utils::Taxonomy::init("NCBI");
        print $NCBITree->root->newick_name;

        # Parse the RDP tree distributed with RDP classifier
        # Only works when rdp_classifier is installed

        Smash::Utils::Taxonomy::init("RDP");
        print $BergeyTree->root->newick_name;

DESCRIPTION

Smash::Utils::Taxonomy provides several useful functions that are related to NCBI and RDP taxonomic trees. When init() is called, it populates two variables of type Smash::Utils::Tree. These are $NCBITree and $BergeyTree for NCBI and RDP trees, respectively. These tree objects can be manipulated or queried using all the methods from Smash::Utils::Tree as well as special methods implemented here.

Functions accessing the local tree objects

init($type)

Initializes $NCBITree and/or $BergeyTree objects based on $type. When $type is "RDP", it parses the RDP/Bergey tree from rdp_classifier. When $type is "NCBI", it parses the NCBI tree from the NCBI taxonomy dump files.

update_files()

Updates the NCBI taxonomy dump files by retrieving the latest version from the NCBI website.

bergey2ncbi($id)

Returns the NCBI taxonomy id corresponding to the taxon with the given id in the RDP tree.

ncbi2bergey($id)

Returns the RDP taxonomy id corresponding to the taxon with the given id in the NCBI tree.

get_taxonomy_for_id

Same as www_get_taxonomy_for_id, but gets the information from the local NCBI taxonomy dump files instead of the NCBI website. Called as:

        $NCBITree->get_taxonomy_for_id(435590);
        $BergeyTree->get_taxonomy_for_id(443);

get_ordered_taxonomy_for_id

Same as get_taxonomy_for_id, but prepends the ordinal rank followed by an underscore, so that the hash can be sorted by ordinal rank.
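
As a rough illustration of why the ordinal prefix is useful, the sketch below sorts a hand-written hash by that prefix. The key format ("1_superkingdom", "2_phylum", ...), the numbering direction and the helper ordinal() are assumptions made for this example and are not taken from the module; check the actual output of get_ordered_taxonomy_for_id before relying on them.

```perl
use strict;
use warnings;

# Hypothetical data: the exact key format is an assumption for this sketch.
my %ordered = (
    "1_superkingdom" => "Bacteria",
    "2_phylum"       => "Bacteroidetes",
    "3_class"        => "Bacteroidia",
    "4_order"        => "Bacteroidales",
    "5_family"       => "Bacteroidaceae",
    "6_genus"        => "Bacteroides",
    "7_species"      => "Bacteroides vulgatus",
);

# Hypothetical helper: extract the numeric prefix of a key (0 if absent).
sub ordinal {
    my ($key) = @_;
    return $key =~ /^(\d+)_/ ? $1 : 0;
}

# Sort numerically on the prefix, then strip it to recover the rank name.
my @lineage;
for my $key (sort { ordinal($a) <=> ordinal($b) } keys %ordered) {
    (my $rank = $key) =~ s/^\d+_//;
    push @lineage, "$rank=$ordered{$key}";
}
print join("; ", @lineage), "\n";
```

Sorting numerically rather than as strings keeps the order correct even if an ordinal ever has more than one digit.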

get_ncbi_taxonomic_rank($tax_id, $rank)

Returns the NCBI taxonomy id of the ancestor of $tax_id at $rank. For example, get_ncbi_taxonomic_rank(435590, 'genus') returns 816 (435590 is the id for "Bacteroides vulgatus" and 816 is the id for "Bacteroides"). If you want the name, use $NCBITree->nodes->{816}->name to get "Bacteroides" back.
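
The lookup described here amounts to walking up the tree from $tax_id until a node with the requested rank is found. The following is a minimal sketch of that idea on a toy parent table, not the module's implementation: the %nodes layout and the function ancestor_at_rank() are hypothetical, and the intermediate entries (everything except 435590 and 816) are illustrative, with most ranks left out.

```perl
use strict;
use warnings;

# Toy tree: tax_id => { parent, rank, name }. Layout is an assumption.
my %nodes = (
    1      => { parent => undef, rank => "no rank",      name => "root" },
    2      => { parent => 1,     rank => "superkingdom", name => "Bacteria" },
    816    => { parent => 2,     rank => "genus",        name => "Bacteroides" },
    821    => { parent => 816,   rank => "species",      name => "Bacteroides vulgatus" },
    435590 => { parent => 821,   rank => "no rank",      name => "Bacteroides vulgatus ATCC 8482" },
);

# Hypothetical helper: climb from $tax_id towards the root and return the
# id of the first node whose rank matches, or undef if there is none.
sub ancestor_at_rank {
    my ($tax_id, $rank) = @_;
    my $id = $tax_id;
    while (defined $id && exists $nodes{$id}) {
        return $id if $nodes{$id}{rank} eq $rank;
        $id = $nodes{$id}{parent};
    }
    return;
}

my $genus_id = ancestor_at_rank(435590, "genus");
print "$genus_id\t$nodes{$genus_id}{name}\n";
```

Because the function returns undef when no ancestor has the requested rank, callers should check the result with defined before using it as a hash key.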

Functions that query NCBI Taxonomy website directly

Smash::Utils::Taxonomy provides several useful functions that are related to NCBI taxonomy ids. These functions return a hash whose keys are NCBI ranks and whose values are the taxon names at those ranks. E.g.,

        %taxonomy = (   
                        tax_id  => 435590,
                        name    => "Bacteroides vulgatus ATCC 8482", 
                        species => "Bacteroides vulgatus",
                        genus   => "Bacteroides",
                        family  => "Bacteroidaceae",
                        order   => "Bacteroidales",
                        class   => "Bacteroidia",
                        phylum  => "Bacteroidetes",
                   superkingdom => "Bacteria");

These functions are NOT object oriented, so you would call them as:

        my %taxonomy = www_get_taxonomy_for_id(435590);

www_get_taxonomy_for_complete_name

Performs a search using "complete name" mode on the NCBI taxonomy website and returns the taxonomy if found.

www_get_taxonomy_for_token_set

Performs a search using "token set" mode on the NCBI taxonomy website and returns the taxonomy if found.

www_get_taxonomy_for_id

Gets the taxonomy for the given taxonomy id from the NCBI taxonomy website.
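
Since a Perl hash has no inherent key order, a taxonomy hash in the format shown in this section needs an explicit rank list to be printed as a lineage. The sketch below hard-codes the example hash instead of querying the website, so it runs offline; the @ranks list and the output layout are choices made for this example, not something the module prescribes.

```perl
use strict;
use warnings;

# Same structure as the example hash in this section, written by hand
# here rather than fetched with www_get_taxonomy_for_id.
my %taxonomy = (
    tax_id       => 435590,
    name         => "Bacteroides vulgatus ATCC 8482",
    species      => "Bacteroides vulgatus",
    genus        => "Bacteroides",
    family       => "Bacteroidaceae",
    order        => "Bacteroidales",
    class        => "Bacteroidia",
    phylum       => "Bacteroidetes",
    superkingdom => "Bacteria",
);

# Ranks from broadest to most specific (an assumption for this sketch).
my @ranks = qw(superkingdom phylum class order family genus species);

# Skip ranks that are missing from the hash instead of printing undef.
my $lineage = join("; ", map { $taxonomy{$_} } grep { exists $taxonomy{$_} } @ranks);
print "$taxonomy{tax_id}\t$lineage\n";
```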
