Exercises: utilities

File Format Conversion

We are going to work with the example file provided with the ReadAl utility, p53_cdnaln.phylip. Let's have a look at it:

 15 933
Homo_sapie   GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTGGA
Green_Monk   GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTCGA
Tupaia_bel   GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTGGA
Mus_muscul   GAGGAGCAGG AGACATTTTC AGGCTTATGG AAACTAGACG ATTTGCTGGA
Rat          GAGGATCAGG AGACATTTTC ATGCTTATGG AAACTTGAAG ATTTCCTGGA
O_cuniculu   GAGGAGCAGG AGACGTTTTC AGACCTGTGG AAACTGGATG ATCTGCTGGA
Canis_fami   GAGGAGCAGG AGACATTTTC AGAATTGTGG AACCTGGATG AGCTGCTCAG
Felis_catu   CAGGAGCAGG AGACATTTTC GGAATTGTGG AACCTAAATG AGCCGCTCGA
O_aries      GAAGAACAGG AGACATTTTC CGACTTGTGG AACCTAGATG ACCTCCCGGA
Sus_scrofa   GAGGAGCAGG AGACATTTTC AGACTTGTGG AAACTGAACG ATCTGCTGCC
Chicken      GCGGAGACTG AGGTCTTCAT GGACCTCTGG AGCATGCAAC AGCCCCTCGA
X_laevis     ATGGAACAGG AGACATTCGA GGATCTGTGG AGTCTGCAGA CTACGTGTAA
S.irideus    GCTGATCAGG AGTCTTTCGA GGACCTGTGG AAAATGAACC TGGCAGTTGA
Danio_reri   GCGCAAAGCC AAGAGTTCGC GGAGCTCTGG GAGAAGAATT TGATTCAGGG
Ictalurus_   GAGGGAAGCC AGGAGTTTGC AGAGCTCTGG CTACGGAACC TCGTTCGTGA
             [......
              ......]
             CGCCATAAAA AACTCATGTT CAAGGGGCCT GAC
             CGCCATAAAA AATTCATGTT CAAGGGGCCT GAC
             CGCCATAGAA AACTAATGTT CAAAGGACCT GAC
             CGCCATAAAA AAACAATGGT CAAGGGGCCT GAC
             CGCCATAAAA AACCAATGAT CAAGGGGCCT GAC
             CGCCATAAAA AACCAATGTT CAAGGGGCCT GAC
             CGCCATAAAA AACTGATGTT CAAGGGGCTT GAC
             CGCCATAAAA AGCCAATGTT GAAGGGGCTC GAC
             TGCCATAAAA AACCAATGCT CAAGGGGCCT GAC
             CGCCATAAAA AACCGATGTT CAAGGGGCCT GAC
             TGCGGGAAGA AACTGCTGCA AAAAGGCTCG GAC
             AAAGGAAAGA AGCTGCTGGT TAAACAGCCC GAC
             AAAAGGAAGA AACTACTGGT GAAGAAGAGC GAC
             CAGGGAAAGA AGCTGATGGT GAAGAGAAGC GAC
             CGAGGGAAGA AACGACTGGT GAAGAAGTGC GAC

This file appears to be in standard interleaved PHYLIP format: the first line gives the number of sequences (15) and the number of sites (933), the first block carries the sequence names, and the following blocks continue the sequences in the same order. Such files are sometimes formatted slightly differently: instead of 5 columns per line they may have 6, or instead of spaces at the beginning of each line they may have tabs. ReadAl is designed to be flexible with all these differences. Try to convert to FASTA these files, which are different versions of the original example:

You can try them with Phylemon's ReadAl or with tools on other web servers:

Pay attention to the length of the output returned by those programs (if they return anything): each converted sequence should still contain the 933 sites announced in the header.
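To make the length check concrete, here is a minimal Python sketch of an interleaved PHYLIP to FASTA conversion. It is not ReadAl's implementation; the function names parse_interleaved_phylip and to_fasta are illustrative only. It assumes that sequence names contain no whitespace and are separated from the sequence by spaces or tabs, so it would not handle strict PHYLIP files in which a 10-character name runs directly into the sequence.

```python
def parse_interleaved_phylip(text):
    """Parse an interleaved PHYLIP alignment; return (names, sequences)."""
    # Blank lines between blocks carry no information, so drop them.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seqs, n_sites = (int(x) for x in lines[0].split()[:2])

    names, seqs = [], []
    # First block: first token is the name, the remaining tokens are sequence.
    # str.split() treats spaces and tabs alike and accepts any column count.
    for ln in lines[1:1 + n_seqs]:
        fields = ln.split()
        names.append(fields[0])
        seqs.append("".join(fields[1:]))

    # Later blocks: sequence only, in the same order as the first block.
    for i, ln in enumerate(lines[1 + n_seqs:]):
        seqs[i % n_seqs] += "".join(ln.split())

    # The check suggested in the text: every sequence must match the header.
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(
                f"{name}: {len(seq)} sites read, {n_sites} expected")
    return names, seqs


def to_fasta(names, seqs, width=60):
    """Format names and sequences as FASTA, wrapping lines at `width`."""
    out = []
    for name, seq in zip(names, seqs):
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

A truncated or misparsed file then fails loudly with a ValueError instead of silently producing shorter sequences.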

Exercises

  1. Convert the example file to FASTA.
  2. Try to translate those sequences into protein sequences.
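For the second exercise, the translation step can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: it assumes the standard genetic code, reads the sequence in frame from its first nucleotide, renders a fully gapped codon as "-" and any other unrecognised codon as "X", and ignores a trailing incomplete codon. The names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame nucleotide sequence into a protein string."""
    seq = seq.upper().replace("U", "T")
    protein = []
    # Stop before any incomplete codon at the end of the sequence.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")          # keep alignment gaps aligned
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# First 21 nucleotides of Homo_sapie in the example alignment.
print(translate("GAGGAGCAGGAAACATTTTCA"))  # EEQETFS
```

With 933 sites per sequence, a full in-frame translation of the example yields 311 positions per sequence.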

Answers

Alignment Utilities

The first two utilities can be considered a package of programs that make working with alignments easier.

ConcatenAl

This tool is designed to concatenate two or more alignment files. If you are used to working with interleaved alignments and have already tried to do this by hand, you will certainly understand its usefulness.

Exercises

This is an easy exercise, but there is one important tip to remember: verify the number of species and alignments that ConcatenAl detects. To do so, expand the log file and compare the species detected with the ones you expected.
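The reason for this check becomes clearer with a small Python sketch of concatenation by species name. It is not ConcatenAl's implementation: the function name concatenate is illustrative, each alignment is represented as a dictionary mapping species name to aligned sequence, and padding a missing species with gaps is an assumption made for this example.

```python
def concatenate(alignments, gap="-"):
    """Concatenate alignments (dicts of name -> sequence) by species name."""
    # Collect species in order of first appearance across all alignments.
    species = []
    for aln in alignments:
        for name in aln:
            if name not in species:
                species.append(name)

    merged = {name: "" for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        # A species absent from this alignment is padded with gaps (assumption).
        for name in species:
            merged[name] += aln.get(name, gap * width)
    return merged


gene1 = {"Homo_sapie": "ATG", "Rat": "ATA"}
gene2 = {"Homo_sapiens": "CC", "Rat": "CT"}   # note the different spelling
merged = concatenate([gene1, gene2])
print(len(merged), "species detected:", list(merged))
```

Here two input species become three detected species because one name is spelled differently in the second file, which is exactly the kind of mismatch that comparing detected and expected species reveals.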

Answers

CDS-ProtAl

CDS-ProtAl aligns codon sequences based on a protein template, which makes it a really useful utility. Let's try to answer these few questions:

  1. Translate the sequence.
  2. Display the sequence in a codon-based view (insert a space after each group of 3 nucleotides).
  3. Imagine the different steps you would need to complete to achieve the same result.
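One possible reading of these tasks, sketched in Python: display a sequence codon by codon, then thread the codons of an unaligned coding sequence onto its already aligned protein, inserting three gap characters wherever the protein alignment has a gap. This is a hedged illustration and not CDS-ProtAl's code. The names codon_view and thread_codons are hypothetical, and the sketch assumes that the coding sequence is in frame and contains exactly one codon per residue of the protein (so no extra terminal stop codon).

```python
def codon_view(seq):
    """Insert a space after each group of 3 nucleotides."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))


def thread_codons(aligned_protein, cds, gap="-"):
    """Build a codon alignment row from an aligned protein and its CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError(
            f"{n_residues} residues in protein but {len(codons)} codons in CDS")
    remaining = iter(codons)
    # Each residue consumes one codon; each protein gap becomes three gaps.
    return "".join(gap * 3 if aa == gap else next(remaining)
                   for aa in aligned_protein)


print(codon_view("GAGGAGCAG"))                         # GAG GAG CAG
print(codon_view(thread_codons("E-EQ", "GAGGAGCAG")))  # GAG --- GAG CAG
```

Done by hand, the same result would require translating each coding sequence, aligning the proteins, and then repeating this threading step for every sequence in the alignment.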