Multiple Sequence Alignment
A phylogenetic tree is an estimate of the relationships among taxa (or sequences) and their
hypothetical common ancestors. Today most phylogenetic trees are built from molecular data: DNA
or protein sequences. Originally, the purpose of most molecular phylogenetic trees was to estimate
the relationships among the species represented by those sequences, but today the purposes have
expanded to include understanding the relationships among the sequences themselves without
regard to the host species.
MEGA5 is an integrated program that carries out all four steps of building a tree (retrieving the
sequences, aligning them, estimating the tree, and presenting the tree) in a single environment with
a single user interface, eliminating the need to convert between file formats. At the same time,
MEGA5 is flexible enough to permit using other programs for particular steps if desired.
MEGA5 is thus particularly well suited to those who are less familiar with estimating
phylogenetic trees.
Step 1: Retrieve the Sequences
Ironically, the first step is the most intellectually demanding, but it often receives the least attention.
If it is not done well, the tree will be invalid, impossible to interpret, or both. If it is done wisely,
the remaining steps are easy, essentially mechanical operations that will result in a robust,
meaningful tree.
When you start MEGA5, it opens the main MEGA5 window. From the Align menu choose Do
Blast Search. MEGA5 opens its own browser window to show a nucleotide BLAST page from
the National Center for Biotechnology Information (NCBI). There is a set of five tabs near the top
of that page (blastn, blastp, blastx, tblastn, and tblastx). By default the blastn (Standard Nucleotide
BLAST) tab is selected. If your sequence is that of a protein, click the blastp tab to show the
Standard Protein BLAST page.
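The choice between the blastn and blastp tabs depends only on whether the query is a nucleotide or a protein sequence. As a rough illustration, the sketch below guesses the sequence type from the fraction of characters that belong to the nucleotide alphabet. The function name guess_blast_tab and the 0.9 threshold are assumptions made for this example; they are not part of MEGA5 or BLAST.

```python
def guess_blast_tab(sequence: str, threshold: float = 0.9) -> str:
    """Return 'blastn' or 'blastp' using a simple alphabet heuristic."""
    # Keep letters only, so whitespace, digits and gap characters are ignored.
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    # A, C, G, T, U and N (unknown base) cover almost all nucleotide input.
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    # Hypothetical cutoff: mostly nucleotide letters means a nucleotide query.
    return "blastn" if nucleotide_like / len(letters) >= threshold else "blastp"


print(guess_blast_tab("ATGGCCATTGTAATGGGCCGC"))   # nucleotide query
print(guess_blast_tab("MKTAYIAKQRQISFVKSHFSRQ"))  # protein query
```

A heuristic like this can misclassify very short or unusual sequences, so it is a convenience check rather than a substitute for knowing what you sequenced.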
Which BLAST Algorithm to Use?
The bottom section of the page allows you to choose the particular variant of BLAST that best suits
your purposes. For nucleotides, the choices are megablast for highly similar sequences,
discontiguous megablast for more dissimilar sequences, or blastn for somewhat similar sequences.
The default is blastn, but if you are only interested in identifying closely related homologs, select
megablast. This is the first choice that really demands some thought.
Step 2: Align the Sequences
If the Alignment Explorer window is not already open, in MEGA5’s main window choose Open a
File/Session from the File menu. Choose the MEGA5 alignment file (.mas) or the sequence file
(.fasta) that you saved in Step 1. In the resulting dialog choose Align.
If your sequence is DNA, you will see two tabs: DNA Sequences and Translated Protein
Sequences. The DNA Sequences tab is chosen by default. Click the Translated Protein Sequences
tab to see the corresponding protein sequence.
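To make concrete what the Translated Protein Sequences tab displays, here is a minimal sketch of translating DNA in reading frame 1 with the standard genetic code. It is an illustration only, not MEGA5's implementation; the function name translate and the choice to show unresolved codons as X and all-gap codons as - are assumptions of this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed with codon positions cycling in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    # Ignore a trailing partial codon of one or two bases.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon == "---":
            protein.append("-")  # keep alignment gaps as gaps
        else:
            # Codons with ambiguous or partially gapped bases become X.
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Keeping gaps in step with codons is what allows an alignment made on the protein tab to be mapped back onto the DNA.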
MEGA5 cannot use the .mas file directly to estimate a phylogenetic tree, so you must also choose
Export Alignment from the Data menu and export the alignment in MEGA5 format, which gives the
file a .meg extension. You will be asked to enter a title for the data.
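For readers who prepare alignments outside MEGA5, the sketch below writes an already aligned set of sequences as text in the basic MEGA layout: a #mega line, a !Title statement, a !Format statement, and then each sequence under a #name line. This is a simplified illustration under that assumption. The file MEGA5 itself exports contains additional format fields, and the function name to_mega_text is hypothetical.

```python
def to_mega_text(title: str, sequences: dict[str, str]) -> str:
    """Return aligned nucleotide sequences as simplified MEGA-format text."""
    # An alignment must be rectangular: every row has the same length.
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    lines = [
        "#mega",
        f"!Title {title};",
        # Assumed minimal format statement for gapped nucleotide data.
        "!Format DataType=Nucleotide indel=-;",
        "",
    ]
    for name, seq in sequences.items():
        # Replace spaces in names with underscores to keep each name one token.
        lines.append("#" + name.replace(" ", "_"))
        lines.append(seq)
    return "\n".join(lines) + "\n"


alignment = {"seq A": "ATG-CC", "seq B": "ATGACC"}
print(to_mega_text("Example alignment", alignment))
```

The length check mirrors the reason the export step exists at all: tree estimation needs every sequence to occupy the same set of aligned columns.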
Step 3: Estimate the Tree
In MEGA5’s main window, choose Open a File/Session from the File menu and open the .meg file
that you saved in Step 2.
Maximum Likelihood (ML) uses a variety of substitution models to correct for multiple changes at
the same site during the evolutionary history of the sequences. The number of models and their
variants can be bewildering, but MEGA5 provides a feature that chooses the best model for you.
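To see what correcting for multiple changes at the same site means in practice, consider the simplest substitution model, Jukes-Cantor, which converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below is only an illustration of that one model; it is not the model MEGA5 would necessarily select, and the function names are chosen for this example.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ, skipping gaps and ambiguity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences are saturated and d is undefined.
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the observed proportion, because some sites have changed more than once and the later changes hide the earlier ones. Richer models differ mainly in how many separate rates and base frequencies they allow.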
From the Phylogeny menu choose Construct/Test Maximum Likelihood Tree.
Step 4: Present the Tree
A drawing of a phylogenetic tree conveys a lot of information, both explicit and implicit. A
phylogenetic tree consists of external nodes (the tips) that represent the actual sequences that exist
today, internal nodes that represent hypothetical ancestors, and branches that connect nodes to each
other. The lengths of the branches represent the amount of change that is estimated to have occurred
between a pair of nodes.
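The parts of a tree described above map naturally onto a small data structure. The following sketch represents nodes and branch lengths and sums the branch lengths along the path between two tips. The class and function names, the tip labels, and the branch lengths are all invented for illustration; this is not how MEGA5 stores trees.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    branch_length: float = 0.0  # length of the branch leading to the parent
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    @property
    def is_external(self) -> bool:
        # External nodes (tips) have no descendants.
        return not self.children


def path_distance(a: Node, b: Node) -> float:
    """Sum of branch lengths on the path joining nodes a and b."""
    ancestors_a = []
    node = a
    while node is not None:
        ancestors_a.append(node)
        node = node.parent
    seen = {id(n) for n in ancestors_a}
    total, node = 0.0, b
    # Climb from b until reaching a node that is also an ancestor of a.
    while node is not None and id(node) not in seen:
        total += node.branch_length
        node = node.parent
    if node is None:
        raise ValueError("nodes are not in the same tree")
    # Add the branches from a up to that shared ancestor.
    for n in ancestors_a:
        if n is node:
            break
        total += n.branch_length
    return total


root = Node("root")
ancestor = root.add_child(Node("ancestor_AB", 0.1))  # hypothetical ancestor
tip_a = ancestor.add_child(Node("A", 0.2))
tip_b = ancestor.add_child(Node("B", 0.3))
tip_c = root.add_child(Node("C", 0.5))
print(path_distance(tip_a, tip_b), path_distance(tip_a, tip_c))
```

In this toy tree, A and B are separated by less change than A and C because their path passes only through their immediate shared ancestor.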