Utilities

File Format Conversion

ReadAl

ReadAl: reads and writes biological sequences (nucleic acid or protein) in various formats. Data files may contain multiple sequences. ReadAl is particularly useful because it automatically detects many sequence formats and converts between them.
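
To give an idea of what automatic format detection involves, here is a minimal sketch in Python. It is not ReadAl's code: the function name `guess_format` and the set of formats tested are assumptions made for illustration, and the heuristics only look at the first non-empty line of the file.

```python
def guess_format(text):
    """Guess a sequence file format from its first non-empty line (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    if first.startswith(">"):
        return "fasta"            # FASTA records start with a ">" header
    if first.upper().startswith("#NEXUS"):
        return "nexus"            # NEXUS files start with the "#NEXUS" token
    if first.upper().startswith("CLUSTAL"):
        return "clustal"          # CLUSTAL alignments start with a "CLUSTAL ..." banner
    fields = first.split()
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return "phylip"           # PHYLIP header: number of sequences, alignment length
    return "unknown"


print(guess_format(">hsap\nATTA\n"))
print(guess_format(" 3 4\nhsap       ATTA\nmmus       ATTT\nptro       ATTC\n"))
```

A real detector would inspect more than one line and handle many more formats; this sketch only shows the general idea.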

Alignment Utilities

ConcatenAl

ConcatenAl: This utility takes a single file containing several alignments and concatenates them into one alignment. The order of the alignments in the file matters, as it is the order in which they are joined in the output alignment. For example:

 3 4
hsap       ATTA
mmus       ATTT
ptro       ATTC

 2 5
hsap       CCCCC
ptro       AAAAA

 2 2
ptro       GG
mmus       TT

will use the sequences of:

  • hsap from the first and second alignments (it is absent from the third): ATTA + CCCCC
  • ptro from all three alignments: ATTC + AAAAA + GG
  • mmus from the first and third alignments (it is absent from the second): ATTT + TT

and return the concatenation of those sequences, filling each missing block with gaps. With FASTA output selected, the result would be:

>mmus/1-6
ATTT-----TT
>ptro/1-11
ATTCAAAAAGG
>hsap/1-9
ATTACCCCC--
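
The concatenation above can be reproduced with a short Python sketch. This is not ConcatenAl's source code: the function names are hypothetical, the alignments are typed in as dictionaries rather than parsed from a file, and the "/1-N" suffix is assumed (from the example output) to be the number of non-gap residues.

```python
def concatenate(blocks):
    """Join alignment blocks in order, padding absent sequences with gaps."""
    names = []
    for block in blocks:
        for name in block:
            if name not in names:
                names.append(name)      # keep order of first appearance
    result = {name: "" for name in names}
    for block in blocks:
        width = len(next(iter(block.values())))   # all rows of a block share one length
        for name in names:
            result[name] += block.get(name, "-" * width)
    return result


def to_fasta(alignment):
    """Format as FASTA; the /1-N suffix counts non-gap residues (an assumption)."""
    records = []
    for name, seq in alignment.items():
        residues = len(seq.replace("-", ""))
        records.append(f">{name}/1-{residues}\n{seq}")
    return "\n".join(records)


# The three alignments of the example, in file order.
blocks = [
    {"hsap": "ATTA", "mmus": "ATTT", "ptro": "ATTC"},
    {"hsap": "CCCCC", "ptro": "AAAAA"},
    {"ptro": "GG", "mmus": "TT"},
]

print(to_fasta(concatenate(blocks)))
```

The three sequences printed match the FASTA output shown above, although the order of the records may differ.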

CDS-ProtAl

CDS-ProtAl: This utility takes a file of non-aligned nucleotide sequences as input (FASTA format), translates them into protein (universal code), aligns the proteins with MUSCLE (maximum of 5 hours and 9999 iterations, all other options at their defaults), and uses this protein alignment as a template to generate the nucleotide alignment.

  • It can optionally remove any column (of 3 bases, since codons are the unit of alignment) where a gap is found in any of the aligned sequences.
  • The user can select a genetic code from those offered.
  • The Translate option translates the sequences without aligning them. (With this option unchecked, the user still gets the translated sequences, but aligned.)
  • Output files are:
    • Nucleotide alignment (outfile.out)
    • Protein alignment (outfile.out.pep) used as template for alignment
    • Map of how gapless sequences compare to the original sequence (outfile.out.map).
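
The step that uses the protein alignment as a template can be sketched in Python as follows. This is an illustration under stated assumptions, not the CDS-ProtAl source: the function names are hypothetical, the translation and MUSCLE steps are skipped (the aligned proteins are typed in by hand), and each coding sequence is assumed to contain exactly one codon per aligned residue.

```python
def thread_codons(cds, aligned_protein):
    """Place the codons of an unaligned CDS under its aligned protein sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(codons) != residues:
        raise ValueError("CDS length does not match the number of residues")
    remaining = iter(codons)
    # A gap in the protein becomes a three-base gap in the nucleotide alignment.
    return "".join("---" if ch == "-" else next(remaining) for ch in aligned_protein)


def drop_gapped_codons(alignment):
    """Remove every codon column in which at least one sequence has a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(0, length, 3)
            if all("-" not in alignment[name][i:i + 3] for name in names)]
    return {name: "".join(alignment[name][i:i + 3] for i in keep) for name in names}


# Hypothetical input: two coding sequences and their aligned translations.
cds = {"seq_a": "ATGAAATTT", "seq_b": "ATGTTT"}
protein_alignment = {"seq_a": "MKF", "seq_b": "M-F"}

codon_alignment = {name: thread_codons(cds[name], protein_alignment[name])
                   for name in cds}
print(codon_alignment)                      # seq_b gets "---" under the gap
print(drop_gapped_codons(codon_alignment))  # the gapped codon column is removed
```

Working codon by codon keeps the reading frame intact, which is why gaps are inserted and removed in blocks of three bases.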

Source code of this program is available here.

TrimAl

  • TrimAl: a tool for the automated removal of spurious sequences or poorly aligned regions from a multiple sequence alignment. Visit the trimAl wiki for complete documentation.

Distances between Trees

TreeDist

  • TreeDist: a program from the PHYLIP package that computes distances between trees. Two distances are computed: the Branch Score Distance of Kuhner and Felsenstein (1994), and the more widely known Symmetric Difference of Robinson and Foulds (1981). The Branch Score Distance uses branch lengths, and can only be calculated when the trees have lengths on all branches. The Symmetric Difference does not use branch length information, only the tree topologies. Bear in mind that neither distance has any immediate statistical interpretation: we cannot say whether a larger distance is significantly larger than a smaller one.
    • Main options available in Phylemon:
      • The Branch Score Distance imagines us as having made a list of all possible partitions of the species: the ones shown in the PHYLIP documentation example (each branch of a tree defines one partition) and also all 7 other possible partitions, which correspond to branches that are not found in this tree. These are assigned branch lengths of 0. For two trees, we imagine constructing these lists, and then summing the squared differences between the branch lengths. Thus if both trees have the branch {A, D | B, C, E}, the sum contains the square of the difference between the two branch lengths. If one tree has the branch and the other does not, it contains the square of the difference between the branch length and zero (in other words, the square of that branch length). If neither tree has a particular branch, nothing is added to the sum because the difference is then between 0 and 0. The Branch Score Distance takes this sum of squared differences and computes its square root. It has some desirable properties: when the trees differ in topology only by short branches, the distance is small, and when a branch is present in both trees but differs in length, the distance reflects that difference.
      • The Symmetric Difference is simply a count of how many partitions there are, among the two trees, that are on one tree and not on the other. In the PHYLIP documentation example there are two such partitions, {A, C | B, D, E} and {A, D | B, C, E}, each of which is present on only one of the two trees. The Symmetric Difference between the two trees is therefore 2. When the two trees are fully resolved bifurcating trees, their Symmetric Difference must be an even number; it can range from 0 to twice the number of internal branches, so that for n species it can be as large as 2n-6 (for 3 species or more).
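
Both distances can be illustrated with a small Python sketch. It is not the PHYLIP implementation: the helper names are hypothetical, each tree is given directly as a mapping from partition to branch length, the two five-species trees and their lengths are invented for the example, and only internal branches are listed for brevity.

```python
import math


def split(side, taxa):
    """Represent a partition as an unordered pair of species sets."""
    side = frozenset(side)
    return frozenset({side, frozenset(taxa) - side})


def symmetric_difference(tree_a, tree_b):
    """Count the partitions found in exactly one of the two trees."""
    return len(set(tree_a) ^ set(tree_b))


def branch_score(tree_a, tree_b):
    """Square root of the summed squared branch-length differences.

    A partition missing from a tree counts as a branch of length 0.
    """
    partitions = set(tree_a) | set(tree_b)
    total = sum((tree_a.get(p, 0.0) - tree_b.get(p, 0.0)) ** 2 for p in partitions)
    return math.sqrt(total)


taxa = "ABCDE"
# Hypothetical trees: they share {B, E | A, C, D} and differ in one partition each.
tree_1 = {split("AC", taxa): 0.3, split("BE", taxa): 0.5}
tree_2 = {split("AD", taxa): 0.4, split("BE", taxa): 0.5}

print(symmetric_difference(tree_1, tree_2))   # {A, C | B, D, E} and {A, D | B, C, E}
print(branch_score(tree_1, tree_2))           # sqrt(0.3**2 + 0.4**2 + 0**2)
```

Here the Symmetric Difference is 2 and the Branch Score Distance is approximately 0.5. The shared partition contributes nothing because its length is the same in both trees.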

Have a look at the online help

Viewers

ETE

ETE is a Python programming toolkit that assists in the automated manipulation, analysis and visualization of phylogenetic trees. It provides a wide range of tree handling options, methods to access the PhylomeDB database (containing thousands of precalculated gene phylogenies), tree annotation features, and specific modules for automatic orthology and paralogy detection.

Since version 2.1, ETE has provided tools for interactive tree visualization through web applications. That is, trees are rendered as images that can be explored and manipulated from a standard web browser. Phylemon2 implements a custom web tree application based on ETE that allows users to visualize any phylogenetic tree returned by the different programs included in the suite.

ETE takes an input file in Newick format, which can carry extra annotation using the New Hampshire eXtended (NHX) format, and also returns a tree in Newick format (available from your data, or through the link “outtree.nw”). Note that, for compatibility with other tools in Phylemon, the outtree file is stripped of the extra NHX information. The outtree file corresponds to the latest version of your tree (the last image seen).
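
As an illustration of what removing NHX information means, the sketch below strips the `[&&NHX:...]` annotation blocks from a Newick string using a regular expression. It does not show how Phylemon or ETE actually perform this step; the function name and the example tree are made up.

```python
import re

# NHX annotations are bracketed comments of the form [&&NHX:key=value:key=value].
NHX_COMMENT = re.compile(r"\[&&NHX:[^\]]*\]")


def strip_nhx(newick):
    """Return the Newick string without its NHX annotation blocks."""
    return NHX_COMMENT.sub("", newick)


annotated = "((human:0.1[&&NHX:S=Hsap],chimp:0.1[&&NHX:S=Ptro]):0.3[&&NHX:D=N],mouse:0.5);"
print(strip_nhx(annotated))
```

The result is a plain Newick tree that keeps names and branch lengths, which is what tools unaware of NHX expect.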

The main options for interacting with trees are reachable through the buttons at the top of the frame, and by clicking either on the background of your tree image or on nodes:

  • Main controls
    • Features: here you can define how much information is displayed in the tree (usually distances and support can be added to the tree).
    • Search: from this field you can search for nodes matching a given name or having a specific branch length. The search is not case sensitive and accepts Perl regular expressions. Some examples:
      1. Search for a node named “Human”: select the “name” field and type human.
      2. Search for branches of length 1 or more: select the “distance” field and type this Perl regular expression: ^[1-9]+[0-9]? Here [1-9] stands for any digit between 1 and 9, [0-9] for any digit between 0 and 9, “+” for one or more occurrences, and “?” for zero or one occurrence. So we are searching for a value that starts with a non-zero digit (“0” would not match [1-9]+).
      3. Search for nodes with support lower than 0.5: select the “support” field and type: 0\.[0-4][0-9]+ The “\.” is needed because in Perl regular expressions “.” matches any character except “\n”, so it must be escaped with “\” to match a literal dot. Note that [0-9]+ requires at least one further digit, so “0.4” alone would not match.
    • Cancel search: removes the marks generated by the search tool. This button also matters between searches, as marks are cumulative.
    • Newick format: here you can view the tree in Newick format.
    • Save image: this takes you to the image of the tree so that you can download it.
  • Interacting with leaves and internal nodes:
    • Highlight background: highlights the background of the corresponding node and all its descendants. A random shade of green is generated so that overlapping uses of this tool can be told apart.
    • Set as root: sets a given node as root (if the tree is unrooted a warning will appear, as the action is irreversible).
    • Mark branch: adds a mark of the form “ #1” to the corresponding branch; these marks are read by Codeml.
    • Copy subtree: keeps a copy of the selected subtree. This option enables the Paste tool.
    • Cut subtree: cuts the selected subtree. This option enables the Paste tool.
    • Paste subtree: enabled when a subtree has been cut or copied.
    • Delete subtree
    • Delete node
    • Swap branches
    • Pay me a compliment: useful to keep an eye on a given leaf, which will turn red.
  • Interacting with background:
    • Verbose layout: the most informative layout; the information selected in the “Features” box is displayed on all nodes.
    • Concise layout: features are displayed only on leaves.
    • Clean layout: sets the size of each node to 0 and displays no information about extra features.
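
The three search examples listed under “Search” can be tried outside the viewer with Python's `re` module, which treats these particular patterns the same way Perl does. This is only a sketch: the `matches` helper is hypothetical, and it assumes the viewer looks for the pattern anywhere in the field value and ignores case, as described above.

```python
import re


def matches(pattern, value):
    """True if the pattern is found anywhere in the value, ignoring case."""
    return re.search(pattern, value, flags=re.IGNORECASE) is not None


# 1. Name search: "human" finds a node called "Human".
print(matches("human", "Human"))              # True

# 2. Distance search: values starting with a non-zero digit.
print(matches(r"^[1-9]+[0-9]?", "1.25"))      # True
print(matches(r"^[1-9]+[0-9]?", "10.5"))      # True
print(matches(r"^[1-9]+[0-9]?", "0.75"))      # False

# 3. Support search: 0.0x to 0.4x, with at least two decimals.
print(matches(r"0\.[0-4][0-9]+", "0.45"))     # True
print(matches(r"0\.[0-4][0-9]+", "0.85"))     # False
print(matches(r"0\.[0-4][0-9]+", "0.4"))      # False: one decimal only
```

The distance pattern also matches a value of exactly 1 (for example “1.0”), and the support pattern misses values written with a single decimal, so both are approximations of the numeric comparisons they stand for.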

Archeopteryx

The Archeopteryx tree viewer is the successor to ATV, which was part of the Forester package. In Phylemon it is available as an applet. Users can load trees in Newick, NHX, Nexus or phyloXML format.

Archeopteryx allows users to interact with trees in many ways; for more information about its options and capabilities, follow this link

Citation

TreeDist

PHYLIP (Phylogeny Inference Package) version 3.6.
Felsenstein, J.
Distributed by the author. Department of Genome Sciences, University of Washington, Seattle (USA) (2004)

TrimAl

trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T
Bioinformatics 25:1972-3 (2009 Aug 1)

ETE

ETE: a python Environment for Tree Exploration.
Huerta-Cepas J, Dopazo J, Gabaldón T
BMC Bioinformatics 11:24 (2010 Jan 13)

tools/utils.txt · Last modified: 2011/07/15 22:09 by garamonfok
CC Attribution-Noncommercial-Share Alike 4.0 International