Phylogenetic annotation of samples

Metagenomic samples can be phylogenetically annotated in several different ways. Here we focus on using DNA alignments against a reference sequence database to assign phylogeny to metagenomic reads.

1. Phylogenetic annotation using reference genome mapping

Metagenomic reads can be phylogenetically classified using a BLASTN homology search against a database of reference sequences. We have estimated that 85% sequence identity is a good cut-off for accurately identifying the genus of a read. For example, if a read maps to Bacteroides fragilis with >85% identity, it most likely belongs to the genus Bacteroides, although it is hard to say whether the read actually comes from the species Bacteroides fragilis.

If you passed --enable-refgenome-db to the configure script when you installed SMASH, your installation already contains a reference genome sequence database of 1509 microbial genomes (as of 04.07.2010). We call this dataset reference_genomes.20100704. The complete taxonomic information for each sequence in this database is provided in an SQL database that is also installed on your system when you specify --enable-refgenome-db. With these two files (the sequence database and the SQL database), you are ready to perform phylogenetic assignment of metagenomic reads. The first step is to run BLAST with the reads against the reference genome database.
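The 85% identity cut-off described above can be illustrated with a small Python sketch. This is not how SMASH implements the assignment; it is a minimal illustration under stated assumptions: the `Hit` record, the best-hit rule, and the idea that the genus is the first word of the species name are all hypothetical choices made for this example.

```python
from typing import Dict, Iterable, NamedTuple

# Percent identity above which a hit is trusted at the genus level
# (the 85% cut-off quoted in the text).
GENUS_IDENTITY_CUTOFF = 85.0


class Hit(NamedTuple):
    """Hypothetical record for one BLASTN hit (not a SMASH data structure)."""
    read_id: str
    subject_species: str      # e.g. "Bacteroides fragilis"
    percent_identity: float   # e.g. 92.0


def genus_of(species_name: str) -> str:
    """Assume a binomial name whose first word is the genus."""
    return species_name.split()[0]


def assign_genus(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each read to the genus of its best hit above the cut-off.

    Reads whose hits are all at or below the cut-off are left unassigned.
    """
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.percent_identity <= GENUS_IDENTITY_CUTOFF:
            continue  # too divergent to call the genus
        current = best.get(hit.read_id)
        if current is None or hit.percent_identity > current.percent_identity:
            best[hit.read_id] = hit
    return {read_id: genus_of(h.subject_species) for read_id, h in best.items()}


example_hits = [
    Hit("read1", "Bacteroides fragilis", 92.0),
    Hit("read1", "Escherichia coli", 88.0),
    Hit("read2", "Escherichia coli", 80.0),
]
assignments = assign_genus(example_hits)
```

In this example only `read1` receives a genus (Bacteroides); `read2` stays unassigned because its only hit falls below the cut-off.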

For the rest of this section, let us use MC20.MG10 as the example metagenome. Please change the value accordingly when you analyze your own samples.

Note: SMASH supports the use of NCBI BLAST and WU-BLAST for the homology search steps and will process the outputs according to the flavor of BLAST used.

1.1. Easy option: using `runBlast.pl`

You can use this option if you have:

configured SMASH with --enable-refgenome-db, and

installed NCBI BLAST using --enable-ncbi-blast, or have WU-BLAST installed on your system.

If you satisfy these requirements, then you can run BLAST using the runBlast.pl wrapper script that comes with SMASH. You can choose the flavor of BLASTN you want to run (WU or NCBI). Here's how you can do this:

        runBlast.pl --flavor=NCBI --blast=blastn \
            --database=reference_genomes.20100704 --makedb --query=MC20.MG10 \
            --subjects=50 --evalue=0.1

        runBlast.pl --flavor=WU --blast=blastn \
            --database=reference_genomes.20100704 --makedb --query=MC20.MG10 \
            --subjects=50 --evalue=0.1

To speed up the search by using multiple threads, specify the number with --cpus:

        runBlast.pl --flavor=NCBI --blast=blastn \
            --database=reference_genomes.20100704 --makedb --query=MC20.MG10 \
            --subjects=50 --evalue=0.1 --cpus=4

Please see runBlast.pl for more information.
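When the same search has to be repeated for several metagenomes, the command line shown above can be assembled programmatically. The following Python sketch uses only the runBlast.pl options that appear in the examples above; the helper name `build_runblast_command` and the second metagenome name `MC20.MG11` are hypothetical and serve only as illustration.

```python
import shlex
from typing import List, Optional


def build_runblast_command(
    query: str,
    flavor: str = "NCBI",
    database: str = "reference_genomes.20100704",
    subjects: int = 50,
    evalue: float = 0.1,
    cpus: Optional[int] = None,
) -> List[str]:
    """Build the runBlast.pl argument list for one metagenome.

    Only options shown in the examples above are used.
    """
    if flavor not in ("NCBI", "WU"):
        raise ValueError("flavor must be 'NCBI' or 'WU'")
    command = [
        "runBlast.pl",
        f"--flavor={flavor}",
        "--blast=blastn",
        f"--database={database}",
        "--makedb",
        f"--query={query}",
        f"--subjects={subjects}",
        f"--evalue={evalue}",
    ]
    if cpus is not None:
        command.append(f"--cpus={cpus}")
    return command


if __name__ == "__main__":
    # MC20.MG11 is a made-up name standing in for a second sample.
    for metagenome in ["MC20.MG10", "MC20.MG11"]:
        print(shlex.join(build_runblast_command(metagenome, cpus=4)))
```

The sketch only prints the commands so that they can be inspected first; passing the returned list to `subprocess.run(command, check=True)` would execute them, assuming runBlast.pl is on your PATH.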

What next?

Once this is done, your metagenomic reads have been mapped to the reference genome database! If you have performed this step for multiple metagenomes that you would like to compare, you are now ready to proceed to "comparative phylogenetic analysis".
