
Exercises: phylogeny

Phylip

Distances

If we have worked through the Model Comparison section, we will have evaluated the fit of substitution models to a multiple sequence alignment. Now that we have an idea of which models best explain the evolution of our gene/protein across the sampled sequences, we would like to calculate the pairwise distances between those sequences. This is one of the first steps in many types of phylogenetic analyses.
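To make "distances among sequences" concrete, here is a minimal sketch of the simplest measure, the uncorrected p-distance: the proportion of aligned sites at which two sequences differ. It assumes the alignment is a Python dict of equal-length strings and skips gapped columns pairwise. The function names and toy sequences are hypothetical, and this is not how DnaDist or ProtDist are implemented. Substitution models like the ones compared earlier exist precisely to correct this raw proportion for multiple hits at the same site.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # pairwise deletion of gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of p-distances, keyed by (name, name) pairs."""
    names = list(alignment)
    matrix = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[(a, b)] = matrix[(b, a)] = d
    return matrix


if __name__ == "__main__":
    toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTTC", "seqC": "ACGAACTTTC"}
    for (a, b), d in sorted(distance_matrix(toy).items()):
        if a < b:
            print(a, b, round(d, 2))
```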

DnaDist

If we are working with DNA data, we use the DnaDist tool; for protein data, we should use ProtDist instead.
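DnaDist offers several substitution models, one of the simplest being Jukes-Cantor (JC69), which corrects the p-distance as d = -3/4 ln(1 - 4/3 p). The sketch below applies that formula and prints a square matrix in roughly the layout PHYLIP programs use (taxon count on the first line, then one row per taxon with the name padded to 10 characters). It is an illustration under those assumptions, not DnaDist's own code; the function names and toy data are hypothetical, and DnaDist's default model is not JC69.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance: d = -3/4 * ln(1 - 4/3 * p)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined here
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def phylip_square_matrix(alignment: dict[str, str]) -> str:
    """Square matrix: taxon count on the first line, then one row per taxon."""
    names = list(alignment)
    lines = [f"{len(names):5d}"]
    for a in names:
        row = "  ".join(f"{jc69_distance(alignment[a], alignment[b]):.6f}" for b in names)
        lines.append(f"{a[:10]:<10}{row}")  # names padded/truncated to 10 characters
    return "\n".join(lines)


if __name__ == "__main__":
    toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTTC", "seqC": "ACGAACTTTC"}
    print(phylip_square_matrix(toy))
```

Note that the corrected distance is always at least as large as the raw p-distance, and the gap widens as sequences diverge.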

Answers

PhyML

Here we want to quickly build a phylogeny from an alignment. For the purpose of this exercise, we are going to work with the PhyML example: “NJ optimized”.

First, load the example.

The first point to notice is that, unlike some other tools, PhyML accepts either DNA or protein sequences. This is convenient, but users often forget to select the correct data type.
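Because the data type has to be set by hand, a quick sanity check on the alignment before launching the run can help catch this mistake. The sketch below is a hypothetical heuristic, not part of PhyML: it counts how many non-gap characters are nucleotide codes (A, C, G, T, U, N) and calls the alignment DNA if that fraction reaches an arbitrary 0.9 threshold. A protein alignment unusually rich in A, C, G and T residues could still fool it.

```python
NUCLEOTIDE_CODES = set("ACGTUN")
IGNORED = set("-?.")  # gap and missing-data symbols (assumed set)


def guess_sequence_type(sequences: list[str], threshold: float = 0.9) -> str:
    """Return 'dna' if most residues are nucleotide codes, else 'protein' (heuristic)."""
    residues = [c for seq in sequences for c in seq.upper() if c not in IGNORED]
    if not residues:
        raise ValueError("alignment contains no residues")
    fraction = sum(c in NUCLEOTIDE_CODES for c in residues) / len(residues)
    return "dna" if fraction >= threshold else "protein"


if __name__ == "__main__":
    print(guess_sequence_type(["ACGT-ACGT", "ACGTTACGN"]))
    print(guess_sequence_type(["MKVLAAGIVG", "MKVLSAGLVG"]))
```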

  1. As with many tools in the PHYLIP package, PhyML lets the user analyse multiple alignments in order to build a consensus tree, but it can also perform bootstrapping on its own.
    1. Check the “Use non parametric bootstrap analysis” option and look at the alternatives PhyML proposes. Which of those options would you choose? To help your choice, consult the PhyML user guide and follow the references it cites.
  2. PhyML allows the user to choose the evolutionary model and to specify whether the rates, the rate distribution, or the frequencies should be estimated or fixed.
    1. Why couldn't we just estimate all of those parameters?
    2. Which combination of those options would be best for your alignment?
  3. Finally, the user is asked to define how thoroughly the output tree is optimized, the starting tree from which PhyML begins the analysis, and the tree-search strategy.
    1. What is a starting tree?
    2. In which cases should we give PhyML a “User-defined tree”?
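For reference, the non-parametric bootstrap mentioned in the first question builds pseudo-replicate alignments by drawing alignment columns at random, with replacement, until each replicate has as many sites as the original (PHYLIP's SEQBOOT program generates such replicates). The sketch below shows only that resampling step, assuming the alignment is a dict of equal-length strings; the function names are hypothetical and this is not PhyML's internal code.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("sequences must be aligned (same non-empty length)")
    n_sites = lengths.pop()
    # Draw column indices once; the same indices are used for every taxon,
    # so each sampled column is copied intact across all sequences.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(
    alignment: dict[str, str], n_replicates: int = 100, seed: int = 1
) -> list[dict[str, str]]:
    """Generate n_replicates pseudo-replicates from a seeded random generator."""
    rng = random.Random(seed)  # fixed seed so the replicates are reproducible
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTTC", "seqC": "ACGAACTTTC"}
    for rep in bootstrap_replicates(toy, n_replicates=3, seed=42):
        print(rep)
```

Each replicate is then analysed exactly like the original alignment, and the support for a branch is the percentage of replicate trees that contain it.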

Answers