Exercises: utilities
File Format Conversion
We are going to work with the example file provided with the ReadAl utility: p53_cdnaln.phylip. Let's have a look at it:
15 933
Homo_sapie GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTGGA
Green_Monk GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTCGA
Tupaia_bel GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTGGA
Mus_muscul GAGGAGCAGG AGACATTTTC AGGCTTATGG AAACTAGACG ATTTGCTGGA
Rat GAGGATCAGG AGACATTTTC ATGCTTATGG AAACTTGAAG ATTTCCTGGA
O_cuniculu GAGGAGCAGG AGACGTTTTC AGACCTGTGG AAACTGGATG ATCTGCTGGA
Canis_fami GAGGAGCAGG AGACATTTTC AGAATTGTGG AACCTGGATG AGCTGCTCAG
Felis_catu CAGGAGCAGG AGACATTTTC GGAATTGTGG AACCTAAATG AGCCGCTCGA
O_aries GAAGAACAGG AGACATTTTC CGACTTGTGG AACCTAGATG ACCTCCCGGA
Sus_scrofa GAGGAGCAGG AGACATTTTC AGACTTGTGG AAACTGAACG ATCTGCTGCC
Chicken GCGGAGACTG AGGTCTTCAT GGACCTCTGG AGCATGCAAC AGCCCCTCGA
X_laevis ATGGAACAGG AGACATTCGA GGATCTGTGG AGTCTGCAGA CTACGTGTAA
S.irideus GCTGATCAGG AGTCTTTCGA GGACCTGTGG AAAATGAACC TGGCAGTTGA
Danio_reri GCGCAAAGCC AAGAGTTCGC GGAGCTCTGG GAGAAGAATT TGATTCAGGG
Ictalurus_ GAGGGAAGCC AGGAGTTTGC AGAGCTCTGG CTACGGAACC TCGTTCGTGA
[......
......]
CGCCATAAAA AACTCATGTT CAAGGGGCCT GAC
CGCCATAAAA AATTCATGTT CAAGGGGCCT GAC
CGCCATAGAA AACTAATGTT CAAAGGACCT GAC
CGCCATAAAA AAACAATGGT CAAGGGGCCT GAC
CGCCATAAAA AACCAATGAT CAAGGGGCCT GAC
CGCCATAAAA AACCAATGTT CAAGGGGCCT GAC
CGCCATAAAA AACTGATGTT CAAGGGGCTT GAC
CGCCATAAAA AGCCAATGTT GAAGGGGCTC GAC
TGCCATAAAA AACCAATGCT CAAGGGGCCT GAC
CGCCATAAAA AACCGATGTT CAAGGGGCCT GAC
TGCGGGAAGA AACTGCTGCA AAAAGGCTCG GAC
AAAGGAAAGA AGCTGCTGGT TAAACAGCCC GAC
AAAAGGAAGA AACTACTGGT GAAGAAGAGC GAC
CAGGGAAAGA AGCTGATGGT GAAGAGAAGC GAC
CGAGGGAAGA AACGACTGGT GAAGAAGTGC GAC
This file appears to be in standard interleaved PHYLIP format. Such files are sometimes formatted slightly differently: they may have 6 sequence columns instead of 5, or tabs instead of spaces at the beginning of each line. ReadAl is designed to be flexible with all these differences. Try to convert to FASTA the following files, which are variants of the original example:
- p53_cdnaln_6col.phylip, with 6 columns instead of 5.
- p53_cdnaln_t.phylip, with a tab on the first line, between the number of species and the length of the alignment.
- p53_cdnaln_t_t.phylip, with a tab at the beginning of each line.
You can try them with Phylemon's ReadAl or on other web servers' tools:
- …
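Pay attention to the length of the output returned by those programs (if they return anything): a correct conversion should contain 15 sequences of 933 nucleotides each, as announced in the first line of the file.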
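To see why spaces, tabs and the number of columns need not matter, here is a minimal Python sketch of a whitespace-tolerant converter. It is not ReadAl's implementation. The function name `phylip_interleaved_to_fasta` is hypothetical, and the sketch assumes "relaxed" PHYLIP, where species names contain no whitespace and any run of spaces or tabs separates fields.

```python
def phylip_interleaved_to_fasta(text):
    """Convert interleaved PHYLIP text to FASTA text.

    Assumption: names contain no whitespace, and any run of spaces
    or tabs separates fields (so 5 or 6 columns are handled alike).
    """
    # Drop blank lines; they only separate the interleaved blocks.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seq, n_sites = (int(x) for x in lines[0].split()[:2])

    names, chunks = [], []
    for i, line in enumerate(lines[1:]):
        fields = line.split()  # splits on spaces and tabs alike
        if i < n_seq:
            # First block: the name comes first, then sequence columns.
            names.append(fields[0])
            chunks.append(fields[1:])
        else:
            # Later blocks: sequence columns only, in the same order.
            chunks[i % n_seq].extend(fields)

    records = []
    for name, parts in zip(names, chunks):
        seq = "".join(parts)
        if len(seq) != n_sites:
            # The check suggested above: compare with the header length.
            raise ValueError(f"{name}: got {len(seq)} sites, expected {n_sites}")
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"


# Toy input with tabs in the header and at the start of each line.
example = "2\t13\nseqA\tACGTA CGTAC\nseqB\tACGTT CGTAC\n\n\tGGG\n\tGGA\n"
print(phylip_interleaved_to_fasta(example), end="")
```

The length check at the end is the important part: a converter that silently drops a block or a column will produce sequences shorter than the header announces.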
Exercises
- Convert the example file to FASTA.
- Then try to translate those sequences into protein sequences.
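For the translation step, the following Python sketch may help you check your results by hand. It assumes the standard genetic code and that the sequence is read in frame starting at its first nucleotide. The names `CODON_TABLE` and `translate` are illustrative and do not come from Phylemon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a (possibly gapped) coding sequence, frame 1 assumed."""
    dna = dna.upper().replace("U", "T")
    protein = []
    # Ignore any trailing incomplete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon == "---":
            protein.append("-")  # keep alignment gaps as gaps
        else:
            # Ambiguous or partially gapped codons become X.
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# First 30 nucleotides of Homo_sapie in the example file.
print(translate("GAGGAGCAGGAAACATTTTCAGACCTATGG"))  # EEQETFSDLW
```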
Alignment Utilities
The first two utilities can be considered a package of programs that make working with alignments easier.
ConcatenAl
This tool is designed to concatenate two or more alignment files. If you are used to working with interleaved alignments and have already tried to do this by hand, you will certainly understand its usefulness.
Exercises
- Go to the utility's page and fill in the form with the available example.
- Have a look at the input file. How many files do we have? How many different species do we have? Run the example.
- Have a look at the first output file. This is our concatenated alignment. Are you able to identify the different input alignments?
This is an easy exercise, but there is an important tip to remember: verify the number of species and alignments that ConcatenAl detects. To do so, just expand the log file and compare the species detected with the expected ones.
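The idea behind concatenation, and the reason the species count deserves a check, can be sketched in a few lines of Python. This is not ConcatenAl's actual algorithm. The function `concatenate_alignments` is hypothetical, and padding a species that is missing from one alignment with gaps is an assumption made for illustration.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join alignments (dicts of name -> aligned sequence) by species name."""
    # Collect every species name, in order of first appearance.
    species = []
    for aln in alignments:
        for name in aln:
            if name not in species:
                species.append(name)

    merged = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for name in species:
            # Assumption: a species absent from this alignment gets gaps.
            merged[name].append(aln.get(name, gap * width))
    return {name: "".join(parts) for name, parts in merged.items()}


gene1 = {"human": "ACGT", "mouse": "ACGA"}
gene2 = {"human": "GG", "chicken": "GA"}
result = concatenate_alignments([gene1, gene2])
print(len(result), "species detected:", ", ".join(result))
for name, seq in result.items():
    print(name, seq)
```

Note that a single misspelled name (for example "Human" versus "human") would be counted as an extra species, which is exactly what comparing the detected species with the expected ones is meant to catch.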
CDS-ProtAl
This utility aligns codon sequences based on a protein template, which is really useful. Let's work through these few tasks:
- Translate the sequence.
- Display the sequence in a codon-based view (insert a space after each group of 3 nucleotides).
- Imagine the different steps you would need to complete to achieve the same result.
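The codon-based view and the last step of a protein-guided codon alignment can be sketched in Python as follows. This illustrates the general technique and is not the code behind CDS-ProtAl. The function names `codon_view` and `thread_codons` are hypothetical, and the sketch assumes that the unaligned coding sequence contains exactly one codon per residue of the aligned protein (no trailing stop codon).

```python
def codon_view(dna):
    """Insert a space after each group of 3 nucleotides."""
    return " ".join(dna[i:i + 3] for i in range(0, len(dna), 3))


def thread_codons(aligned_protein, cds, gap="-"):
    """Place the codons of `cds` onto the gap pattern of `aligned_protein`."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence do not match in length")
    next_codon = iter(codons)
    # Each protein gap becomes three nucleotide gaps; each residue
    # is replaced by the codon that encodes it.
    return "".join(
        gap * 3 if aa == gap else next(next_codon) for aa in aligned_protein
    )


aligned = thread_codons("M-K", "ATGAAA")
print(codon_view(aligned))  # ATG --- AAA
```

Done by hand, the full procedure would be: translate each coding sequence, align the proteins, then thread the original codons back onto the protein alignment as above.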
