Exercises: phylogeny

PHYLIP

Distances

If we have worked through the Model Comparison section, we will already have evaluated the fit of substitution models to a multiple sequence alignment. Now that we have an idea of which models best explain the evolution of our gene/protein across a given sample, we would like to calculate the distances among sequences. This is one of the first steps in many types of phylogenetic analysis.
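To make the idea of a pairwise distance concrete, here is a minimal Python sketch (not DnaDist's own code). It computes the uncorrected p-distance and the Jukes-Cantor (JC69) corrected distance for every pair of sequences in a small alignment. It assumes the sequences are already aligned to equal length and skips any column with a gap or ambiguity code in either sequence of the pair (pairwise deletion). DnaDist may treat gaps and ambiguous bases differently. The function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations

BASES = set("ACGT")

def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites, ignoring columns with gaps/ambiguity codes."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(s1: str, s2: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """JC69 distances for every unordered pair of sequences."""
    return {(a, b): jc69_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}

if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTAA", "seqC": "ACGAACGTTA"}
    for (a, b), d in distance_matrix(toy).items():
        print(f"{a} {b} {d:.4f}")
```

The JC69 correction is always at least as large as the raw p-distance, because it accounts for multiple substitutions at the same site that the raw count cannot see.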

DnaDist

If we are working with DNA data, we have to use the DnaDist tool (for protein data, we should use ProtDist instead).

Go to the Phylogeny section in Phylemon and click on the DnaDist link.
Fill in the form with the first available example (DNAdist Demo1).
Have a look at the different options available:

Parameters:
1. Model: choose one of the DNA substitution models offered by DnaDist. Note that DnaDist offers far fewer models than other tools such as PhyML.
2. Frequencies: here DnaDist lets you specify the frequency of each nucleotide instead of using the empirical frequencies observed in the alignment.
3. Transition/Transversion: set a specific transition/transversion ratio.
4. Gamma distributed rates across sites: models the variation in evolutionary rate among the sites of your alignment with a gamma distribution.
5. Analyze multiple data: check this option and fill in the form if you are analyzing multiple datasets (e.g. replicates generated with Seqboot).
6. Distance Matrix: the layout of the output matrix (“Square”, “Human readable” or “Triangular”). If you intend to pass the distance matrix to another PHYLIP tool (Neighbor or Fitch), leave it as “Square”.
7. Weights for sites: if you want to assign weights to the sites of your alignment, upload a file with the corresponding weights here.

Exercise:
- Run DnaDist on this sample_file with the default options
- Go to the JModelTest page and run the same sample file to find the substitution model that best fits this alignment
- Go back to DnaDist and run the program again, this time setting the parameters according to the JModelTest output
  - Do the distances differ between the distance matrices produced by these two DnaDist runs?
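Comparing two matrices by eye is tedious. The Python sketch below reads two DnaDist outputs saved in PHYLIP's square matrix format and reports the largest absolute difference and the taxon pair where it occurs. It assumes strict PHYLIP layout, with taxon names in the first 10 columns and rows that may wrap onto continuation lines. It also assumes both files list the taxa in the same order. The script and file names in the usage comment are hypothetical.

```python
import sys

def parse_square_matrix(text: str) -> tuple[list[str], list[list[float]]]:
    """Parse a PHYLIP square distance matrix (10-character name field)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    names, rows, i = [], [], 1
    for _ in range(n):
        line = lines[i]
        i += 1
        names.append(line[:10].strip())
        values = [float(x) for x in line[10:].split()]
        while len(values) < n:  # row wrapped onto continuation lines
            values.extend(float(x) for x in lines[i].split())
            i += 1
        rows.append(values)
    return names, rows

def largest_difference(text_a: str, text_b: str) -> tuple[float, str, str]:
    """Largest |d_a - d_b| over all cells, with the taxon pair where it occurs."""
    names_a, rows_a = parse_square_matrix(text_a)
    names_b, rows_b = parse_square_matrix(text_b)
    if names_a != names_b:
        raise ValueError("the matrices do not list the same taxa in the same order")
    best = (0.0, "", "")
    for i, name_i in enumerate(names_a):
        for j, name_j in enumerate(names_a):
            diff = abs(rows_a[i][j] - rows_b[i][j])
            if diff > best[0]:
                best = (diff, name_i, name_j)
    return best

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python compare_dist.py default_outfile jmodeltest_outfile
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        diff, a, b = largest_difference(fa.read(), fb.read())
    print(f"largest difference {diff:.6f} between {a} and {b}")
```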

Answers

PhyML

Here we want to quickly build a phylogeny from an alignment. For this exercise we will work with the PhyML example “NJ optimized”.

First, load the example.

The first thing to notice is that, unlike some other tools, PhyML accepts either DNA or protein sequences. This is convenient, but users often forget to select the correct data type.

As with many tools in the PHYLIP package, PhyML can work with multiple alignments (e.g. Seqboot replicates) in order to build a consensus tree, but it can also perform bootstrapping on its own.
1. Check the “Use non parametric bootstrap analysis” option and look at the alternatives PhyML offers. Which of these options would you choose? Consult the PhyML user guide and the references it cites to help you decide.
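Non-parametric bootstrapping, whether done by Seqboot or internally by PhyML, builds pseudo-alignments by resampling alignment columns with replacement. Each replicate has the same length as the original. The minimal Python sketch below shows only that resampling step, assuming the alignment is a dict of equal-length aligned sequences. The function names are hypothetical and do not correspond to Seqboot or PhyML options. Each replicate would then be analyzed separately, and the support for a branch is the fraction of replicate trees that contain it.

```python
import random

def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-alignment: columns drawn with replacement, same length as the input."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every sequence, so sites stay aligned.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def bootstrap_replicates(alignment: dict[str, str], n_replicates: int,
                         seed: int = 1) -> list[dict[str, str]]:
    """A reproducible list of bootstrap pseudo-alignments."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]
```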
PhyML lets users choose the evolutionary model and decide whether the rates, the rate distribution, or the equilibrium frequencies should be estimated or fixed.
1. Why couldn't we simply estimate all of these parameters?
2. Which combination of these options would be the best choice for your alignment?
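Model-selection criteria such as those reported by jModelTest (AIC, AICc, BIC) make the trade-off between fit and number of free parameters explicit. The Python sketch below uses lnL for a model's maximized log-likelihood, k for its number of free parameters and n for the sample size, often taken as the number of alignment sites. The model values passed to the hypothetical `rank_models` helper would come from your own jModelTest run; none are given here.

```python
import math

def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * lnl

def aicc(lnl: float, k: int, n: int) -> float:
    """AIC with small-sample correction; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("sample size too small for AICc")
    return aic(lnl, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n) - 2 * lnl

def rank_models(models: dict[str, tuple[float, int]], n: int) -> list[str]:
    """Model names sorted from best (lowest AICc) to worst; values are (lnL, k)."""
    return sorted(models, key=lambda name: aicc(models[name][0], models[name][1], n))
```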
Finally, the user is asked to set the level of optimization of the output tree, the starting tree from which PhyML begins the analysis, and the tree search method.
1. What is a starting tree?
2. In which cases should we give PhyML a “User-defined tree”?
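It helps to build a starting tree by hand. PhyML's default starting tree is computed with BioNJ, a variant of neighbor-joining. The Python sketch below instead implements plain neighbor-joining (the method behind PHYLIP's Neighbor), not BioNJ, on a small distance matrix and returns a Newick string. It assumes a symmetric matrix with a zero diagonal, at least three taxa and unique labels. The function name is hypothetical.

```python
def neighbor_joining(names: list[str], d: list[list[float]]) -> str:
    """Plain neighbor-joining; returns an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Distances keyed by node label; internal nodes are labelled by their Newick subtree.
    dist = {a: {b: float(d[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined nodes to their new parent.
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        new = f"({a}:{la:g},{b}:{lb:g})"
        dist[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[new][c] = dist[c][new] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three nodes at a single central node.
    x, y, z = nodes
    lx = 0.5 * (dist[x][y] + dist[x][z] - dist[y][z])
    ly = 0.5 * (dist[x][y] + dist[y][z] - dist[x][z])
    lz = 0.5 * (dist[x][z] + dist[y][z] - dist[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

For example, `neighbor_joining(["A", "B", "C", "D"], [[0, 3, 3, 4], [3, 0, 4, 5], [3, 4, 0, 3], [4, 5, 3, 0]])` returns `(C:1,D:2,(A:1,B:2):1);`. A tree like this is only a starting point: PhyML then rearranges its topology and branch lengths to improve the likelihood.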