Phylogeny
Phylogeny programs in Phylemon cover distance, Maximum Parsimony and statistical methods. The last group includes Maximum Likelihood and Bayesian approaches. Phylemon emphasizes the distance and statistical methods. For Maximum Parsimony, Phylemon uses the basic PHYLIP programs for protein (ProtPars) and DNA sequence data (DnaPars).
Distance Methods
Distances in PHYLIP
Distance matrices for DNA and protein sequence data are computed by the DnaDist and ProtDist programs, respectively. The output consists of a single outfile with all the pairwise distances between sequences. These methods are limited by the small number of evolutionary models they offer to correct for multiple hits.
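To make the idea of correcting for multiple hits concrete, here is a minimal sketch in Python. It uses the Jukes-Cantor (JC69) correction only because it is the simplest one; it is not the DnaDist code, and the function names are hypothetical.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping gaps and ambiguous characters."""
    valid = "ACGT"
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs / len(pairs)

def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as the observed proportion of differences, because some substitutions are hidden by later changes at the same site.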
Distance-based tree reconstruction in PHYLIP uses the Neighbor and Fitch programs. Neighbor builds trees by means of cluster analysis methods, namely the UPGMA and Neighbor-Joining (NJ) algorithms. Fitch computes trees with the Minimum Evolution and Least Squares methods. In all cases the output consists of two files, the outfile and the outtree. The outtree contains the tree in Newick format, which can be read by tree viewer programs such as ETE. The outfile shows a rough text drawing of the tree that is enough to read and understand the solution, but it is not useful as input to other programs.
You can get more information here: PHYLIP documentation
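As an illustration of how a cluster analysis method turns a distance matrix into a Newick tree, here is a minimal UPGMA sketch in Python. It is not the Neighbor program; the function name, the input layout (a dict keyed by pairs of taxon names) and the three-decimal branch lengths are assumptions made for the example.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA and return the tree as a Newick string.

    names: list of taxon labels
    dist:  dict mapping (label_a, label_b) -> distance, one entry per pair
    """
    # Each cluster is identified by its Newick text: id -> (size, height).
    clusters = {name: (1, 0.0) for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = sorted(min(d, key=d.get))
        size_a, h_a = clusters.pop(a)
        size_b, h_b = clusters.pop(b)
        height = d.pop(frozenset((a, b))) / 2.0
        merged = "(%s:%.3f,%s:%.3f)" % (a, height - h_a, b, height - h_b)
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((merged, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b))
        clusters[merged] = (size_a + size_b, height)
    (root,) = clusters
    return root + ";"
```

For example, with distances A-B = 2, A-C = B-C = 6 and every distance to D = 10, the sketch joins A and B first, then C, then D, which is the nesting you would see in the outtree file.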
NJ trees using ML distances
Maximum Likelihood (ML) Methods
Phylemon runs Maximum Likelihood (ML) methods of phylogenetic reconstruction by means of the DnaML and ProML programs in PHYLIP, TREE-PUZZLE and PhyML. We encourage the use of PhyML to obtain fast ML tree solutions for DNA and protein sequence data. Alternatively, TREE-PUZZLE can be used to test alternative topologies (see topology testing below). Since Phylemon is also a good tool for learning purposes, we added the ML programs of PHYLIP, the first tools for ML reconstruction from sequence data, developed by Joseph Felsenstein.
ML trees with PhyML
PhyML 3.0 finds the ML tree for DNA or amino-acid sequence data. The input sequence format can be interleaved (default) or sequential (see ReadAl). PhyML offers a large number of substitution models to correct for unobserved changes. For DNA sequences, the default choice is HKY85 and there are 6 alternative models: K80, JC69, F81, F84, TN93 and GTR. For amino-acid sequences, the default choice is JTT, and 9 other models are available: Dayhoff (PAM), mtREV, WAG, DCMut, RtREV, CpREV, VT, Blosum62 and MtMam.
Parameters such as the transition/transversion ratio (for DNA sequences), the proportion of invariable sites (P), and the gamma distribution parameter can be jointly optimized to fit the observed data (the sequences) with the highest probability. The number of substitution rate categories is 4 by default. The shape of the gamma distribution is defined by the alpha parameter. One or more unrooted starting trees (with branch lengths) in Newick format can be supplied to seed the search. By default, PhyML uses a BIONJ distance-based tree as the starting point of the tree topology search.
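If you want to reproduce these settings outside the web form, the options above map roughly onto a PhyML command line. The sketch below only builds the argument list and does not run anything. The flag names follow the PhyML 3.0 command-line help as we understand it, and the executable name phyml and the function name are assumptions, so verify them against your installation.

```python
def phyml_command(alignment, datatype="nt", model="HKY85",
                  n_rate_classes=4, start_tree=None, sequential=False):
    """Build (but do not run) an argument list for a PhyML 3.0 analysis.

    Assumption: the binary is called 'phyml' and accepts the short flags
    used below; check 'phyml --help' before relying on them.
    """
    cmd = ["phyml",
           "-i", alignment,            # input alignment in PHYLIP format
           "-d", datatype,             # 'nt' for DNA, 'aa' for proteins
           "-m", model,                # substitution model, e.g. HKY85 or JTT
           "-c", str(n_rate_classes),  # number of rate categories
           "-v", "e",                  # estimate proportion of invariable sites
           "-a", "e"]                  # estimate the gamma shape (alpha)
    if datatype == "nt":
        cmd += ["-t", "e"]             # estimate the ts/tv ratio (DNA only)
    if start_tree:
        cmd += ["-u", start_tree]      # user starting tree instead of BIONJ
    if sequential:
        cmd.append("-q")               # sequential instead of interleaved input
    return cmd
```

Returning a list rather than a single string keeps the sketch safe to pass to a process launcher without shell quoting problems.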
Finally, users can optimize the topology and all the parameters, or fix the topology and optimize only the branch lengths and rate parameters. If you choose no optimization, PhyML just returns the likelihood of the starting tree(s). PhyML computes ML bootstrap solutions quickly, and it is common to run 1,000 pseudoreplicates on a medium-size phylogenetic problem (approx. 1,000 characters by 15 species), although such a run may still take about a day. An interesting alternative is to run the aLRT (approximate likelihood-ratio test), which provides a different kind of branch support (aLRT values higher than 30 correlate with bootstrap values higher than 95%).
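For readers unfamiliar with bootstrap pseudoreplicates, the following Python sketch shows the resampling step only: each pseudoreplicate draws alignment columns with replacement and has the same length as the original alignment. It does not reproduce PhyML's implementation, and the function names and the dict-based alignment layout are assumptions.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths)
    rng:       a random.Random instance
    """
    n_sites = len(next(iter(alignment.values())))
    # The same column indices are used for every taxon, so whole
    # alignment columns (site patterns) are kept intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of pseudoreplicate alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

A tree is then inferred from every pseudoreplicate, and the support of a branch is the percentage of those trees that contain it, which is why 1,000 pseudoreplicates cost roughly 1,000 tree searches.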
You can get more information here: PhyML 3.0 web page
Example of how bootstrap and aLRT values correlate:
TREE-PUZZLE and ML tests of topologies
TREE-PUZZLE searches for the best ML tree solution using the quartet-puzzling algorithm. TREE-PUZZLE also computes pairwise maximum likelihood distances, which can be passed to a Neighbor-Joining run in Phylemon to obtain an NJ distance reconstruction based on ML estimates of genetic differences. In addition, TREE-PUZZLE computes the likelihood mapping, a method to investigate the phylogenetic content of the data without computing an overall tree. We recommend using PhyML to search for the best tree and TREE-PUZZLE to test the best topology against alternatives. Example 2 runs this kind of analysis. You need to define at least two alternative topologies and evaluate them under a pre-defined model (ideally the one that best fits your data). TREE-PUZZLE runs the one-sided and two-sided Kishino-Hasegawa tests, the Shimodaira-Hasegawa test, and Expected Likelihood Weights. The outfile points out the best tree and the statistical differences (if any) from the alternative trees.
You have more information here: TREE-PUZZLE web page
ML trees in PHYLIP
ProML and DnaML are the ML methods of tree reconstruction from protein and DNA sequences in the PHYLIP package. Both programs infer the tree using the parameter values defined by the user. This means that the programs cannot search for the best combination of parameters (for instance alpha, the proportion of invariable sites and the rates). Since they were the first programs to build ML trees, these options were not included. We added these programs for teaching purposes, to show how the first likelihood programs work.
You can get more information here: PHYLIP web page
PhyML Best AIC tree
PhyML Best AIC tree is a Python script that reconstructs ML trees using the DNA or protein model with the best AIC among all the models available in PhyML. The script also calculates the AIC weight of each model.
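The AIC comparison can be sketched as follows. This is not the source code of the script; it only shows the standard formulas, AIC = 2k - 2 lnL and the Akaike weight of each model, with hypothetical function names. Any log-likelihoods and parameter counts you feed it should come from your own PhyML runs.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 lnL."""
    return 2.0 * n_params - 2.0 * log_likelihood

def aic_weights(models):
    """Compute AIC and Akaike weight for a set of fitted models.

    models: dict mapping model name -> (lnL, number of free parameters)
    Returns a dict mapping model name -> (AIC, weight); weights sum to 1.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    # Relative likelihood of each model given its AIC difference to the best.
    rel = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(rel.values())
    return {name: (scores[name], rel[name] / total) for name in scores}

def best_model(models):
    """Name of the model with the lowest AIC."""
    table = aic_weights(models)
    return min(table, key=lambda name: table[name][0])
```

The model with the lowest AIC is selected, and its weight can be read as the relative support for that model among the candidates tested.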
Source code of this program is available here.
Bayesian Methods
Many people disagree with ML analysis because the method provides a statistical solution that maximizes the probability of the data (the aligned sequences) given the model (topology, branch lengths, and all the parameters of the evolutionary model). If the model you chose to solve the tree is wrong, the resulting tree may be wrong as well. An alternative is to evaluate the probability of the tree and all the parameters given the data. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem.
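A toy calculation may help to see what Bayes's theorem does here. The Python sketch below assumes that you already have one log-likelihood per candidate topology, which is a strong simplification: in a real analysis the likelihood of a topology is integrated over branch lengths and model parameters, and the number of topologies is far too large to enumerate. The function name is hypothetical.

```python
import math

def posterior_probabilities(log_likelihoods, priors=None):
    """Posterior probability of each tree: P(tree|data) ~ P(data|tree) * P(tree).

    log_likelihoods: dict mapping tree name -> ln P(data | tree)
    priors:          dict mapping tree name -> prior probability
                     (a flat prior is assumed when omitted)
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    log_joint = {name: log_likelihoods[name] + math.log(priors[name])
                 for name in names}
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint.values())
    unnormalized = {name: math.exp(v - top) for name, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {name: u / total for name, u in unnormalized.items()}
```

With a flat prior, three trees with log-likelihoods of -100, -101 and -103 get posterior probabilities of about 0.71, 0.26 and 0.04, so small differences in log-likelihood translate into large differences in posterior support.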
MrBayes
The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.
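The principle behind MCMC can be shown with a toy Metropolis sampler in Python. This is only a sketch under strong assumptions: it samples among a handful of named topologies with known unnormalized log posteriors and proposes a random tree at each generation, whereas MrBayes proposes changes to the tree and model parameters and runs several coupled chains. The function name is hypothetical.

```python
import math
import random

def metropolis_tree_sampler(log_posterior, n_generations=100000, seed=1):
    """Approximate posterior probabilities of trees by Metropolis sampling.

    log_posterior: dict mapping tree name -> unnormalized log posterior
    Returns a dict mapping tree name -> fraction of generations spent there.
    """
    rng = random.Random(seed)
    trees = list(log_posterior)
    current = trees[0]
    visits = {tree: 0 for tree in trees}
    for _ in range(n_generations):
        # Symmetric proposal: pick any tree uniformly at random.
        proposal = rng.choice(trees)
        log_ratio = log_posterior[proposal] - log_posterior[current]
        # Accept uphill moves always, downhill moves with probability = ratio.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        visits[current] += 1
    return {tree: count / n_generations for tree, count in visits.items()}
```

The fraction of generations the chain spends on a tree approximates its posterior probability, which is why the number of generations matters: a chain that is too short gives unreliable frequencies.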
Interactive mode
MrBayes can be run interactively, which lets you execute the usual commands directly in a shell. If you select the non-interactive option instead, MrBayes will stop at the end of your block of commands.
The interactive option is useful for experienced users, and allows you to adjust the number of generations to run instead of assuming that 100,000 will be enough.
Note that MrBayes has extensive help available from inside this shell; you can type Help to get started.
Make your MrBayes commands block
If you have not built a MrBayes command block (that is, you are only providing MrBayes with an alignment), you may want to check this option. Once it is checked, the form expands to show a new set of options.
By choosing values for this set of parameters, you can avoid building the MrBayes command block by hand, along with its usual problems (typos, missing arguments…). Note that this option is independent of the Interactive option: you can build the block using the form and then go deeper into the analysis in a second step using the interactive option.
Warning: a commands block built this way is not merged with any other information in the uploaded file. If another commands block is found, Phylemon will replace it.
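To give an idea of what such a block looks like, here is a minimal Python sketch that writes one. It is not the code behind the Phylemon form, and the choice of parameters (nst, rates, ngen, samplefreq, burnin) is an assumption about what a form like this might expose; the MrBayes commands themselves (set, lset, mcmc, sump, sumt) are standard.

```python
def mrbayes_block(nst=6, rates="invgamma", ngen=100000,
                  samplefreq=100, burnin=250):
    """Return a MrBayes commands block as text.

    The parameter set is hypothetical; it only illustrates how a form
    can validate values before writing the block.
    """
    if nst not in (1, 2, 6):
        raise ValueError("nst must be 1, 2 or 6")
    if rates not in ("equal", "gamma", "propinv", "invgamma", "adgamma"):
        raise ValueError("unknown rates option: %s" % rates)
    if burnin >= ngen // samplefreq:
        raise ValueError("burnin discards every sampled tree")
    lines = [
        "begin mrbayes;",
        "  set autoclose=yes nowarn=yes;",
        "  lset nst=%d rates=%s;" % (nst, rates),
        "  mcmc ngen=%d samplefreq=%d;" % (ngen, samplefreq),
        "  sump burnin=%d;" % burnin,   # summarize parameter samples
        "  sumt burnin=%d;" % burnin,   # summarize sampled trees
        "end;",
    ]
    return "\n".join(lines) + "\n"
```

Generating the block from validated values is what protects against the typos and missing arguments mentioned above; the result is appended after the data block of the NEXUS file.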
Citation
MrBayes
PHYLIP
TREE-Puzzle
NJ trees using ML distances (HYPHY)
PhyML