#=================================================================#
#                                                                 #
#  PAL2NAL: robust conversion of protein sequence alignments      #
#           into the corresponding codon-based DNA alignments     #
#                                                                 #
#               version 1.0        June 15, 2005                  #
#               version 1.1        October 11, 2005               #
#               version 2.1        March 28, 2006                 #
#                                                                 #
#               Mikita Suyama (suyama@embl.de)                    #
#                                                                 #
#=================================================================#


#------------------#
# What is pal2nal?
#------------------#

PAL2NAL is a program that converts a multiple sequence alignment
of proteins and the corresponding DNA (or mRNA) sequences into
a codon-based DNA alignment. The program automatically assigns
the corresponding codon sequence even if the input DNA sequence
has mismatches with the input protein sequence, or contains UTRs,
polyA tails. It can also deal with frame shifts in the input
alignment, which is suitable for the analysis of pseudogenes.
The resulting codon-based DNA alignment can further be subjected
to the calculation of synonymous (Ks) and non-synonymous (Ka)
substitution rates.


#-----------#
# Reference
#-----------#

If you use PAL2NAL, please cite the following paper:

  - Mikita Suyama, David Torrents, and Peer Bork (2006)
    PAL2NAL: robust conversion of protein sequence alignment into
    the corresponding codon alignments.
    Nucleic Acids Res. (in press in Web Server Issue 2006).


#-------#
# Files
#-------#

The distribution version should contain the following files:

    README                 - This document
    pal2nal.pl             - The script (version 8) (written in Perl)
    test.aln               - test data (protein alignment)
    test.nuc               - test data (DNA sequences)

    for_paml (directory)
      test.cnt             - control file for codeml
      test.tree            - tree file used for codeml
      test.codeml.ori      - an example of codeml output


#-------#
# Usage
#-------#

Usage:  pal2nal.pl  pep.ali  nuc.fasta  [nuc.fasta...]  [options]

    pep.ali:     protein alignment either in CLUSTAL or FASTA format

                 - works not only a pairwise alignment but also
                   the alignment with more than 2 sequences.

                 - if there are frame shifts in your alignment,
                   you have to specify those positions by numbers:
                   for example '2' in the alignment means that
                   there are only two bases (i.e. one base deletion)
                   (see 'test.aln').


    nuc.fasta:   DNA sequences
                 (single multiple fasta format file,
                  or may be separated files)

    Options:

       -h            Show help

       -output (clustal|paml|fasta)
                     Output format, default = clustal

       -blockonly    Show only user specified blocks
                     '#' under CLUSTAL alignment (see example)

       -nogap        remove columns with gaps and inframe stop codons

       -nomismatch   remove mismatched codons (mismatch between
                     pep and cDNA) from the output

       -codontable (universal|vmitochondria)
                     Codon table
                       - universal (default)
                       - vmitochondria -> vertebrate mitochondrial

       -html         HTML output (only for the web server)

       -nostderr     No STDERR messages (only for the web server)



    - sequence order in pep.aln and nuc.fasta should be the same.

    - IDs in pep.aln are used in the output.


Example:  pal2nal.pl  test.aln  test.nuc  -output paml  -nogap


#---------------------------------#
# How to calculate Ks, Ka values?
#---------------------------------#

To calclate Ks, Ka values, you need the codeml program, which
is included in the PAML package. You can download PAML from

  http://abacus.gene.ucl.ac.uk/software/paml.html

As an example, a control file (test.cnt) and a tree file (test.tree)
are in the "for_paml" sub-directory.

These control file and tree file are designed for the 'test' data
used in PAL2NAL.

Example:

   pal2nal.pl  test.aln  test.nuc  -output paml  -nogap  >  for_paml/test.codon

   cd for_paml

   codeml  test.cnt

     You can find the output of codeml in "test.codeml".
     Ks, Ka values are very end of the output file.
     Just for comparison, there is a sample output file, "test.codeml.ori".


#------------#
# WWW server
#------------#

http://www.bork.embl.de/pal2nal


#---------#
# Contact
#---------#

If you have any questions or comments, please email me:

  Mikita Suyama
  Biocomputing, EMBL
  Meyerhofstrasse 1,
  69012 Heidelberg,
  Germany
  tel: +49 6221 387 8231
  fax: +49 6221 387 8517
  email: suyama@embl.de

