


Smash::Utils::Taxonomy - NCBI and RDP taxonomy related utility functions


        use Smash::Utils::Taxonomy qw(:all);

        # Parse the NCBI tree from the NCBI taxonomy dump files
        init("NCBI");

        print $NCBITree->root->newick_name;

        # Parse the RDP tree distributed with RDP classifier
        # Only works when rdp_classifier is installed
        init("RDP");
        print $BergeyTree->root->newick_name;


Smash::Utils::Taxonomy provides several useful functions related to the NCBI and RDP taxonomic trees. Calling init() populates one or both of two variables of type Smash::Utils::Tree: $NCBITree for the NCBI tree and $BergeyTree for the RDP tree. These tree objects can be manipulated or queried using all the methods of Smash::Utils::Tree as well as the special methods implemented here.

Functions accessing the local tree objects

init($type)

Initializes $NCBITree and/or $BergeyTree based on $type. When $type is "RDP", it parses the RDP/Bergey tree distributed with rdp_classifier. When $type is "NCBI", it parses the NCBI tree from the NCBI taxonomy dump files.
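The NCBI taxonomy dump files (nodes.dmp and names.dmp from NCBI's taxdump archive) hold one record per line, with fields separated by "\t|\t" and a trailing "\t|". The following is a minimal sketch of reading them into parent, rank and name lookup tables. It is not the module's own parser; split_dmp_line is a hypothetical helper, and the sample records are abbreviated excerpts kept in strings so the example runs without the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: split one taxdump line into its fields.
# Fields are separated by "\t|\t" and every line ends with "\t|".
sub split_dmp_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\t\|$//;    # remove the trailing field terminator
    return split /\t\|\t/, $line, -1;
}

# Abbreviated excerpts in nodes.dmp / names.dmp layout.
# Real nodes.dmp lines carry more columns after the rank.
my $nodes_dmp = "821\t|\t816\t|\tspecies\t|\n"
              . "816\t|\t815\t|\tgenus\t|\n"
              . "815\t|\t171549\t|\tfamily\t|\n";
my $names_dmp = "821\t|\tBacteroides vulgatus\t|\t\t|\tscientific name\t|\n"
              . "816\t|\tBacteroides\t|\t\t|\tscientific name\t|\n"
              . "815\t|\tBacteroidaceae\t|\t\t|\tscientific name\t|\n";

my (%parent, %rank, %name);

# nodes.dmp: tax_id, parent tax_id, rank, ...
open my $nodes_fh, '<', \$nodes_dmp or die "cannot open nodes: $!";
while (my $line = <$nodes_fh>) {
    my ($tax_id, $parent_id, $node_rank) = split_dmp_line($line);
    $parent{$tax_id} = $parent_id;
    $rank{$tax_id}   = $node_rank;
}
close $nodes_fh;

# names.dmp: tax_id, name_txt, unique name, name class
open my $names_fh, '<', \$names_dmp or die "cannot open names: $!";
while (my $line = <$names_fh>) {
    my ($tax_id, $name_txt, undef, $name_class) = split_dmp_line($line);
    # names.dmp also lists synonyms etc.; keep only the scientific name
    $name{$tax_id} = $name_txt if $name_class eq 'scientific name';
}
close $names_fh;

print "$name{$_} ($rank{$_}), parent $parent{$_}\n"
    for sort { $a <=> $b } keys %parent;
```

Reading the real files only changes the open calls to point at the extracted nodes.dmp and names.dmp paths.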


Updates the NCBI taxonomy dump files by downloading the latest version from the NCBI website.


Returns the NCBI taxonomy id corresponding to a given taxon in the RDP tree.


Returns the RDP taxonomy id corresponding to a given taxon in the NCBI tree.


Same as www_get_taxonomy_for_id, but reads the information from the local NCBI taxonomy dump files instead of querying the website. Called as:

        my %taxonomy = get_taxonomy_for_id(435590);

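A local lookup of this kind amounts to walking from a node up to the root and recording the name found at each named rank. The sketch below assumes parent, rank and name tables like those that can be built from nodes.dmp and names.dmp. The tree fragment is hand-built for illustration and cut short above the family level, and lineage_for_id is a hypothetical helper, not part of Smash::Utils::Taxonomy.

```perl
use strict;
use warnings;

# Hand-built fragment of the NCBI tree mirroring the example taxonomy in
# this document. Assumption: the chain is attached directly to the root
# (id 1) above the family level, for brevity.
my %parent = (435590 => 821, 821 => 816, 816 => 815, 815 => 1, 1 => 1);
my %rank   = (435590 => 'no rank', 821 => 'species', 816 => 'genus',
              815 => 'family', 1 => 'no rank');
my %name   = (435590 => 'Bacteroides vulgatus ATCC 8482',
              821    => 'Bacteroides vulgatus',
              816    => 'Bacteroides',
              815    => 'Bacteroidaceae',
              1      => 'root');

# Hypothetical helper: build a rank => name hash for $tax_id by walking
# up to the root, skipping nodes whose rank is "no rank".
sub lineage_for_id {
    my ($tax_id) = @_;
    my %taxonomy = (tax_id => $tax_id, name => $name{$tax_id});
    my $node = $tax_id;
    while (1) {
        my $node_rank = $rank{$node};
        $taxonomy{$node_rank} = $name{$node}
            if defined $node_rank && $node_rank ne 'no rank';
        my $up = $parent{$node};
        last if !defined $up || $up == $node;   # root is its own parent
        $node = $up;
    }
    return %taxonomy;
}

my %taxonomy = lineage_for_id(435590);
print "$_ => $taxonomy{$_}\n" for sort keys %taxonomy;
```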

Same as get_taxonomy_for_id, but prefixes each rank key with its ordinal rank followed by an underscore, so that the keys can be sorted in rank order.
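The prefix makes a plain string sort of the keys follow the taxonomic hierarchy. Here is a minimal sketch, assuming a fixed rank order from superkingdom down to species and a prefix of the form "6_genus". The module's actual ordinals and prefix format may differ, and add_ordinal_prefix is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed rank order (most general first); the module's own ordering
# may differ.
my @rank_order = qw(superkingdom phylum class order family genus species);
my %ordinal;
@ordinal{@rank_order} = (1 .. @rank_order);

# Hypothetical helper: prefix each rank key with its ordinal and an
# underscore, e.g. "6_genus". Keys that are not ranks (tax_id, name)
# are dropped in this sketch.
sub add_ordinal_prefix {
    my (%taxonomy) = @_;
    my %prefixed;
    while (my ($key, $value) = each %taxonomy) {
        next unless exists $ordinal{$key};
        $prefixed{"$ordinal{$key}_$key"} = $value;
    }
    return %prefixed;
}

my %taxonomy = (
    tax_id       => 435590,
    name         => "Bacteroides vulgatus ATCC 8482",
    species      => "Bacteroides vulgatus",
    genus        => "Bacteroides",
    family       => "Bacteroidaceae",
    order        => "Bacteroidales",
    class        => "Bacteroidia",
    phylum       => "Bacteroidetes",
    superkingdom => "Bacteria",
);

my %ordered = add_ordinal_prefix(%taxonomy);

# A plain string sort now lists ranks from superkingdom down to species.
print "$_\t$ordered{$_}\n" for sort keys %ordered;
```

Single-digit ordinals sort correctly as strings. With ten or more ranks, the numbers would need zero-padding (e.g. "06_genus") or a numeric sort on the prefix.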

get_ncbi_taxonomic_rank($tax_id, $rank)

Returns the NCBI taxonomy id of the ancestor of $tax_id at $rank. For example, get_ncbi_taxonomic_rank(435590, 'genus') returns 816 (435590 is "Bacteroides vulgatus" and 816 is "Bacteroides"). To get the name instead, use $NCBITree->nodes->{816}->name, which returns "Bacteroides".
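Finding the ancestor at a given rank means following parent links upward until a node with that rank appears. The minimal sketch below uses a hand-built fragment of the tree: 435590 and 816 are from the example above, and 821 (Bacteroides vulgatus) and 815 (Bacteroidaceae) are filled in between them, with the chain attached directly to the root for brevity. In nodes.dmp the root is its own parent, which is what stops the walk. ancestor_at_rank is a hypothetical stand-in, not the module's implementation. In this sketch, a node that already has the requested rank is returned as-is.

```perl
use strict;
use warnings;

# Hand-built fragment (child => parent, id => rank). Assumption: above
# the family level the chain is attached directly to the root (id 1).
my %parent = (435590 => 821, 821 => 816, 816 => 815, 815 => 1, 1 => 1);
my %rank   = (435590 => 'no rank', 821 => 'species', 816 => 'genus',
              815 => 'family', 1 => 'no rank');

# Hypothetical stand-in for get_ncbi_taxonomic_rank: walk from $tax_id
# toward the root until a node of rank $wanted is found. Returns an
# empty return (undef in scalar context) if no node on the path has it.
sub ancestor_at_rank {
    my ($tax_id, $wanted) = @_;
    while (defined $tax_id) {
        return $tax_id
            if defined $rank{$tax_id} && $rank{$tax_id} eq $wanted;
        my $up = $parent{$tax_id};
        last if !defined $up || $up == $tax_id;   # reached the root
        $tax_id = $up;
    }
    return;
}

print ancestor_at_rank(435590, 'genus'), "\n";   # 816
```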

Functions that query NCBI Taxonomy website directly

Smash::Utils::Taxonomy also provides functions that look up taxa directly on the NCBI Taxonomy website. These functions return a hash whose keys are NCBI ranks (plus tax_id and name) and whose values are the corresponding taxon names, e.g.:

        %taxonomy = (
                tax_id       => 435590,
                name         => "Bacteroides vulgatus ATCC 8482",
                species      => "Bacteroides vulgatus",
                genus        => "Bacteroides",
                family       => "Bacteroidaceae",
                order        => "Bacteroidales",
                class        => "Bacteroidia",
                phylum       => "Bacteroidetes",
                superkingdom => "Bacteria");

These functions are NOT object oriented, so you would call them as:

        my %taxonomy = www_get_taxonomy_for_id(435590);

Performs a search using "complete name" mode on the NCBI taxonomy website and returns the taxonomy if found.


Performs a search using "token set" mode on the NCBI taxonomy website and returns the taxonomy if found.


Gets the taxonomy for a given NCBI taxonomy id from the NCBI taxonomy website.
