Exercises: evolutionary tests

The tools presented in this section should only be used on alignments we are confident in. The statistical analyses computed by most of these tools are highly sensitive to misaligned regions.

Here we work through exercises based on the examples that Phylemon provides for a selection of tools in each subsection.

Model selection

In this section, Phylemon groups programs that help select the model that best fits our data.

jModelTest

Running jModelTest

On the front page of the jModelTest application on the Phylemon server, the user is asked for a nucleotide sequence alignment input file and the tree relating the sequences.
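Before uploading, it can help to check that the two input files agree with each other. The sketch below assumes the alignment is in FASTA format and the tree in Newick format (check the jModelTest help for the formats Phylemon actually accepts). `read_fasta`, `newick_tips` and `check_inputs` are hypothetical helpers, and the Newick parsing is deliberately simplified to unquoted tip labels.

```python
import re

def read_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}

def newick_tips(tree):
    """Return tip labels, i.e. names directly after '(' or ',' (simplified: unquoted labels only)."""
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", tree))

def check_inputs(fasta_text, tree):
    """Return a list of problems; an empty list means these basic checks passed."""
    seqs = read_fasta(fasta_text)
    problems = []
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:  # an alignment must have equal-length rows
        problems.append(f"sequences differ in length: {sorted(lengths)}")
    tips = newick_tips(tree)
    if set(seqs) != tips:  # every sequence needs exactly one tip in the tree
        problems.append(f"names only in alignment: {sorted(set(seqs) - tips)}; "
                        f"only in tree: {sorted(tips - set(seqs))}")
    return problems

aln = ">A\nACGT-A\n>B\nACGTTA\n>C\nAC-TTA\n"
print(check_inputs(aln, "((A:0.1,B:0.2):0.05,C:0.3);"))  # []
```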

Go to the “Evolutionary tests” tab on the Phylemon page and click the ModelTest link in the “Model selection” section of “Evolutionary tests”.

Click the Help link to download the jModelTest documentation.

Expand the example section and select example1; the input fields will be filled in for the example. Then try some more options:
- check the box for computing the “Akaike Information Criterion” (AIC)
- check the box for computing the “Hierarchical Likelihood” ratio tests (hLRT), with the confidence level set to 0.05
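In the hierarchical scheme, each step is a likelihood-ratio test (LRT) between two nested models: the statistic LR = 2(lnL1 - lnL0) is compared with a chi-square distribution whose degrees of freedom equal the difference in free parameters, and the simpler model is rejected when p < 0.05 (the level set above). A minimal sketch of one such step using only the Python standard library; the log-likelihood values are hypothetical, not taken from the Phylemon example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from the closed forms for df = 2 or df = 1, then step up by 2 using
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def lrt(lnl_null, lnl_alt, df, alpha=0.05):
    """LRT of a nested null model against a richer alternative model."""
    stat = 2.0 * (lnl_alt - lnl_null)  # >= 0 when the models are properly nested
    p = chi2_sf(stat, df)
    return stat, p, p < alpha          # True means: reject the simpler model

# Hypothetical log-likelihoods, e.g. a model without vs. with a gamma shape parameter (df = 1).
print(lrt(-2345.6, -2331.2, df=1))
```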

Click 'run' to start the analysis and wait until your job name turns green.

Click the link to the results file and save a copy under a suitable name, somewhere you can easily retrieve it later.

Analyzing the results

Which model was selected using the hierarchical LRT (hLRT) scheme?
Which model was selected using the Akaike Information Criterion (AIC)?
Did they both select the same model? If not, why might this have happened?
1. How do the AIC scores of the two models compare? Are they very different?
2. What happened to the model selected by AIC in the hLRT? (See the output file for a record of each step taken in the hLRT.)
Which model would you use for further analysis?
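To compare AIC scores, recall that AIC = -2 lnL + 2K, where K is the number of free parameters. The model with the lowest AIC is preferred, and the difference from the best score (delta AIC) together with the Akaike weights indicates how clear-cut that preference is. The sketch below uses hypothetical lnL and K values; take the real ones from your results file, since the parameter count a program reports may also include branch lengths.

```python
import math

def aic(lnl, k):
    """Akaike Information Criterion: AIC = -2 lnL + 2K (K = number of free parameters)."""
    return -2.0 * lnl + 2.0 * k

def compare_aic(models):
    """models: {name: (lnL, K)} -> list of (name, AIC, delta AIC, Akaike weight), best first."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    deltas = {name: s - best for name, s in scores.items()}
    total = sum(math.exp(-d / 2) for d in deltas.values())
    rows = [(name, scores[name], deltas[name], math.exp(-deltas[name] / 2) / total)
            for name in scores]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical values for illustration only.
for row in compare_aic({"HKY+G": (-2331.2, 5), "GTR+G": (-2328.9, 9)}):
    print("%-6s AIC=%.2f dAIC=%.2f w=%.3f" % row)
```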

Answers

Adaptation tests

CodeML

Models

Have a look at the examples available for this tool. They are classified by the kind of model the user wants to compute (some models are not covered here; more information is available in the PAML user guide):

Pairwise model: this model (runmode = -2) is usually selected to compute maximum-likelihood (ML) estimates of the synonymous and non-synonymous substitution rates in pairwise comparisons.
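On Phylemon these options are set through the web form, but codeml itself reads them from a plain-text control file (codeml.ctl) made of `key = value` lines. A hedged sketch that renders the pairwise settings: the `codeml_ctl` helper is hypothetical and the file names are placeholders, while `seqfile`, `outfile`, `seqtype`, `runmode` and `CodonFreq` are codeml options.

```python
def codeml_ctl(**options):
    """Render codeml control-file options as 'key = value' lines (hypothetical helper)."""
    return "\n".join(f"{key:>10} = {value}" for key, value in options.items()) + "\n"

# Pairwise ML estimation of dN and dS: runmode = -2 with codon sequences (seqtype = 1).
# File names are placeholders; CodonFreq = 2 (F3x4) is one common choice, not a requirement.
pairwise = codeml_ctl(seqfile="aln.phy", outfile="pairwise.out",
                      seqtype=1, runmode=-2, CodonFreq=2)
print(pairwise)
```

Saving the returned text as `codeml.ctl` and running `codeml` in the same directory should reproduce a pairwise run outside Phylemon.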
Branch models:
- Free branch model: an independent ω value for each branch.
- One or more free branches, 2 or more ω values. Branches marked with # in the tree get their own ω, independent of the rest (the background).
- One free branch and one fixed branch, with an equal ω for the background.
- The same two branches, with the other one fixed.
- Two free branches sharing the same ω.
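In codeml, the branch models correspond to the `model` option (0: one ω for all branches, 1: free ratios, 2: several ω values assigned through branch labels), and a foreground branch is marked by appending a label such as `#1` to it in the Newick tree. A minimal sketch with a hypothetical `mark_branch` helper; it assumes a topology-only tree (no branch lengths) with unquoted tip names.

```python
import re

# codeml's `model` option for the branch models listed above.
BRANCH_MODEL = {"one ratio": 0, "free ratios": 1, "two or more ratios": 2}

def mark_branch(tree, taxon, label="#1"):
    """Append a codeml branch label after one tip of a topology-only Newick tree."""
    # The tip must follow '(' or ',' and be followed by whitespace, ',' or ')',
    # so "chimp" does not match inside "chimpanzee".
    pattern = r"(?<=[(,])(\s*" + re.escape(taxon) + r")(?=[\s,)])"
    marked, n = re.subn(pattern, r"\1 " + label, tree)
    if n != 1:
        raise ValueError(f"expected exactly one tip named {taxon!r}, found {n}")
    return marked

print(mark_branch("((human,chimp),gorilla);", "chimp"))  # ((human,chimp #1),gorilla);
```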

Site Models:
- M0: one ω across all sites and lineages
- M1: neutral model, ω constant across lineages, 2 classes of sites (0 ≤ ω < 1, ω = 1)
- M2: selection model, 3 classes of sites (ω = 0, 0 ≤ ω ≤ 1, ω > 1)
- M7: neutral model, 10 classes of sites on all lineages, all with ω ≤ 1
- M8: selection model, 11 classes of sites on all lineages, 10 with ω ≤ 1 and one with ω > 1
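In codeml, the site models are selected with the `NSsites` option while keeping `model = 0`, and several site models can be requested in a single run. The sketch records that mapping together with the nested pairs usually compared by an LRT; `SITE_MODELS`, `SITE_LRT_PAIRS` and `site_model_options` are hypothetical names.

```python
# codeml NSsites value for each site model above (used together with model = 0).
SITE_MODELS = {"M0": 0, "M1": 1, "M2": 2, "M7": 7, "M8": 8}

# Nested pairs commonly compared with an LRT: (null, alternative, degrees of freedom).
SITE_LRT_PAIRS = [("M1", "M2", 2), ("M7", "M8", 2)]

def site_model_options(models):
    """Return codeml options for one batch run over the requested site models."""
    return {"seqtype": 1, "model": 0,
            "NSsites": " ".join(str(SITE_MODELS[m]) for m in models)}

print(site_model_options(["M0", "M1", "M2", "M7", "M8"]))
```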

Branch site models:
- Model A: like a site model, but the marked branch is treated as foreground, allowing 3 classes of sites (ω = 0, 0 < ω < 1, ω ≥ 1), and the other branches as background, with only 2 classes of sites (ω = 0, ω = 1).
- Model A1: the same, but the last site class on the foreground is fixed at ω = 1 (ω = 0, 0 < ω < 1, ω = 1).
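Model A and Model A1 are both run in codeml with `model = 2` and `NSsites = 2`; they differ only in whether the foreground ω of the last class is estimated (`fix_omega = 0`) or fixed at 1 (`fix_omega = 1`, `omega = 1`). A sketch of the two option sets, assuming the foreground branch is already labelled `#1` in the tree file; the start value 1.5 for ω and the names `MODEL_A`, `MODEL_A1` and `render` are arbitrary, hypothetical choices.

```python
# Settings shared by both branch-site runs.
COMMON = {"seqtype": 1, "model": 2, "NSsites": 2}

# Model A: foreground omega of the last class is estimated (1.5 is only a start value).
MODEL_A = {**COMMON, "fix_omega": 0, "omega": 1.5}
# Model A1: the same model with that foreground omega fixed at 1.
MODEL_A1 = {**COMMON, "fix_omega": 1, "omega": 1}

def render(options):
    """Render options as codeml control-file lines."""
    return "\n".join(f"{k} = {v}" for k, v in options.items())

print(render(MODEL_A1))
```

The two runs are nested and differ by one free parameter, so their log-likelihoods can be compared directly.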

Exercise

Go to the CodeML page and have a look at the examples provided; they are the ones described above.

Which of these examples would you use to get an idea of the different evolutionary rates along your phylogeny?
Which model would be better for seeing different rates across the sites of your alignment? Can this model alone be used to detect positive selection?
What kind of comparison do we need to make to detect positive selection on a branch that seems to have high ω value(s)?