Exercises: utilities
File Format Conversion
We are going to work with the example file provided with the ReadAl utility, p53_cdnaln.phylip. Let's have a look at it:
15 933
Homo_sapie GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTGGA
Green_Monk GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTCGA
Tupaia_bel GAGGAGCAGG AAACATTTTC AGACCTATGG AAACTAGATG ATATGCTGGA
Mus_muscul GAGGAGCAGG AGACATTTTC AGGCTTATGG AAACTAGACG ATTTGCTGGA
Rat        GAGGATCAGG AGACATTTTC ATGCTTATGG AAACTTGAAG ATTTCCTGGA
O_cuniculu GAGGAGCAGG AGACGTTTTC AGACCTGTGG AAACTGGATG ATCTGCTGGA
Canis_fami GAGGAGCAGG AGACATTTTC AGAATTGTGG AACCTGGATG AGCTGCTCAG
Felis_catu CAGGAGCAGG AGACATTTTC GGAATTGTGG AACCTAAATG AGCCGCTCGA
O_aries    GAAGAACAGG AGACATTTTC CGACTTGTGG AACCTAGATG ACCTCCCGGA
Sus_scrofa GAGGAGCAGG AGACATTTTC AGACTTGTGG AAACTGAACG ATCTGCTGCC
Chicken    GCGGAGACTG AGGTCTTCAT GGACCTCTGG AGCATGCAAC AGCCCCTCGA
X_laevis   ATGGAACAGG AGACATTCGA GGATCTGTGG AGTCTGCAGA CTACGTGTAA
S.irideus  GCTGATCAGG AGTCTTTCGA GGACCTGTGG AAAATGAACC TGGCAGTTGA
Danio_reri GCGCAAAGCC AAGAGTTCGC GGAGCTCTGG GAGAAGAATT TGATTCAGGG
Ictalurus_ GAGGGAAGCC AGGAGTTTGC AGAGCTCTGG CTACGGAACC TCGTTCGTGA
[...... ......]
           CGCCATAAAA AACTCATGTT CAAGGGGCCT GAC
           CGCCATAAAA AATTCATGTT CAAGGGGCCT GAC
           CGCCATAGAA AACTAATGTT CAAAGGACCT GAC
           CGCCATAAAA AAACAATGGT CAAGGGGCCT GAC
           CGCCATAAAA AACCAATGAT CAAGGGGCCT GAC
           CGCCATAAAA AACCAATGTT CAAGGGGCCT GAC
           CGCCATAAAA AACTGATGTT CAAGGGGCTT GAC
           CGCCATAAAA AGCCAATGTT GAAGGGGCTC GAC
           TGCCATAAAA AACCAATGCT CAAGGGGCCT GAC
           CGCCATAAAA AACCGATGTT CAAGGGGCCT GAC
           TGCGGGAAGA AACTGCTGCA AAAAGGCTCG GAC
           AAAGGAAAGA AGCTGCTGGT TAAACAGCCC GAC
           AAAAGGAAGA AACTACTGGT GAAGAAGAGC GAC
           CAGGGAAAGA AGCTGATGGT GAAGAGAAGC GAC
           CGAGGGAAGA AACGACTGGT GAAGAAGTGC GAC
This file appears to be in standard PHYLIP interleaved format. However, such files are sometimes formatted slightly differently: instead of 5 columns per line they may have 6, or instead of spaces at the beginning of each line they may have tabs. ReadAl is designed to cope with all these differences. Try converting to FASTA the following files, which are variants of the original example:
- p53_cdnaln_6col.phylip: 6 columns instead of 5.
- p53_cdnaln_t.phylip: a tab on the first line, between the number of species and the length of the alignment.
- p53_cdnaln_t_t.phylip: a tab at the beginning of every line.
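To see why these variants need not be a problem, here is a minimal Python sketch of a whitespace-tolerant interleaved PHYLIP reader. It is not ReadAl's implementation; it assumes "relaxed" PHYLIP, where taxon names contain no whitespace (strict PHYLIP names that run straight into the sequence would not be handled). Because every line is split on any run of spaces or tabs and the chunks are simply joined, it does not matter whether a line has 5 or 6 columns, or whether lines start with spaces or tabs. The function names `parse_phylip_interleaved` and `to_fasta` are hypothetical.

```python
def parse_phylip_interleaved(text):
    """Parse interleaved (relaxed) PHYLIP text.

    Tolerates tabs or spaces anywhere and any number of sequence chunks
    per line. Assumes taxon names contain no whitespace.
    Returns (n_taxa, declared_length, {name: sequence}).
    """
    # Strip leading/trailing whitespace (spaces or tabs) and drop blank lines.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]

    # Header: number of taxa and alignment length, separated by any whitespace.
    n_taxa, length = (int(x) for x in lines[0].split())

    names, seqs = [], []
    # First block: each line is a name followed by sequence chunks.
    for ln in lines[1:1 + n_taxa]:
        fields = ln.split()
        names.append(fields[0])
        seqs.append("".join(fields[1:]))

    # Later blocks: sequence chunks only, in the same taxon order.
    for i, ln in enumerate(lines[1 + n_taxa:]):
        seqs[i % n_taxa] += "".join(ln.split())

    return n_taxa, length, dict(zip(names, seqs))


def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA, wrapping sequences at `width`."""
    out = []
    for name, seq in records.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, reading p53_cdnaln_t_t.phylip into a string and passing it to `parse_phylip_interleaved` should give 15 records; `to_fasta` then produces the FASTA text.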
You can try them with Phylemon's ReadAl or with tools on other web servers:
- …
Pay attention to the length of the output returned by these programs (if they return anything at all).
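The header of the example declares 15 sequences of 933 positions, so a correct conversion should yield 15 FASTA records of 933 characters each. A minimal Python sketch of that check follows; `read_fasta` and `check_lengths` are hypothetical helper names, and taking the whole header line as the record name is a simplifying assumption.

```python
def read_fasta(text):
    """Return {header: sequence} from FASTA text (multi-line sequences joined)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def check_lengths(records, expected_taxa, expected_length):
    """List every mismatch against the declared PHYLIP header values."""
    problems = []
    if len(records) != expected_taxa:
        problems.append(f"expected {expected_taxa} sequences, found {len(records)}")
    for name, seq in records.items():
        if len(seq) != expected_length:
            problems.append(f"{name}: length {len(seq)}, expected {expected_length}")
    return problems
```

For the example, `check_lengths(read_fasta(output_text), 15, 933)` should return an empty list; anything else suggests a truncated or mis-parsed conversion.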
Exercises
- Convert the example file to FASTA.
- Then try to translate these sequences into protein sequences.
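Translating a coding sequence means reading it codon by codon with a genetic code. The sketch below assumes the standard genetic code and that each sequence starts in frame; it maps an aligned gap codon `---` to `-`, any other unrecognised codon to `X`, and ignores a trailing incomplete codon. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA/RNA sequence into a protein string."""
    seq = seq.upper().replace("U", "T")
    protein = []
    # Stop before any incomplete codon at the end.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")  # keep alignment gaps as protein gaps
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)
```

Since 933 is a multiple of 3, each sequence of the example yields 311 translated positions.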
Alignment Utilities
The first two utilities can be considered a package of programs that make working with alignments easier.
ConcatenAl
This tool is designed to concatenate two or more alignment files. If you are used to working with interleaved alignments and have already tried to do this by hand, you will certainly appreciate its usefulness.
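Concatenating alignments amounts to joining, for each species, its sequence from every input alignment, and padding with gaps where a species is absent from one input. The following Python sketch illustrates that idea under the assumption that species are matched by identical names; it is not ConcatenAl's implementation, and `concatenate` is a hypothetical name.

```python
def concatenate(alignments, gap="-"):
    """Concatenate alignments given as a list of {species: aligned_sequence}.

    Species are matched by exact name. A species missing from an input
    alignment is filled with gaps over that alignment's width.
    """
    # Collect species in first-seen order across all inputs.
    species = []
    for aln in alignments:
        for name in aln:
            if name not in species:
                species.append(name)

    parts = {name: [] for name in species}
    for aln in alignments:
        widths = {len(s) for s in aln.values()}
        if len(widths) != 1:
            raise ValueError("each input alignment must be non-empty "
                             "with sequences of equal length")
        width = widths.pop()
        for name in species:
            parts[name].append(aln.get(name, gap * width))

    return {name: "".join(chunks) for name, chunks in parts.items()}
```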
Exercises
- Go to the utility's page and fill in the form with the available example.
- Have a look at the input file. How many files are there? How many different species? Run the example.
- Have a look at the first output file. This is our concatenated alignment. Can you identify the different input alignments?
This is an easy exercise, but there is an important tip to remember: verify the number of species and alignments that ConcatenAl detects. To do so, expand the log file and compare the species detected with the expected ones.
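This check matters because name-based concatenation treats any spelling difference as a different species: if, for instance, a name were capitalised differently in two input files, the output would gain an extra row instead of merging them. A minimal Python sketch of the comparison, with the hypothetical name `species_report` and inputs as {species: sequence} dictionaries:

```python
def species_report(alignments, expected):
    """Compare species found across alignments with an expected species list."""
    expected = set(expected)
    detected = set()
    for aln in alignments:
        detected.update(aln)  # adds the species names (dict keys)
    return {
        "n_alignments": len(alignments),
        "n_species": len(detected),
        "missing": sorted(expected - detected),
        "unexpected": sorted(detected - expected),
    }
```

An "unexpected" entry that looks like a variant of an expected name is usually a naming mismatch in one of the input files.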
CDS-ProtAl
CDS-ProtAl (Align Codon Sequences based on Protein Template) is a really useful utility. Let's try to answer the following questions:
- Translate the sequence.
- Display the sequence in a codon-based view (insert a space after each group of 3 nucleotides).
- Imagine the different steps you would need to complete to achieve the same result.
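One plausible set of steps is: translate each coding sequence, align the resulting proteins with any protein aligner (not shown here), then thread the original codons back onto the aligned proteins, writing `---` for every protein gap. The Python sketch below covers the codon view and the threading step only. It is an illustration under the assumption that each CDS encodes its protein residue for residue (any leftover nucleotides, such as a final stop codon, are ignored), not a description of how CDS-ProtAl works; `codon_view` and `protein_to_codon_alignment` are hypothetical names.

```python
def codon_view(cds):
    """Insert a space after each group of 3 nucleotides."""
    return " ".join(cds[i:i + 3] for i in range(0, len(cds), 3))


def protein_to_codon_alignment(aligned_protein, cds):
    """Thread a coding sequence through its aligned protein.

    Each residue consumes the next codon of the CDS; each '-' in the
    protein becomes '---'. Leftover nucleotides are ignored.
    """
    cds = cds.replace("-", "")  # work on the ungapped CDS
    codons, pos = [], 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")
        else:
            codon = cds[pos:pos + 3]
            if len(codon) != 3:
                raise ValueError("CDS is shorter than the protein requires")
            codons.append(codon)
            pos += 3
    return "".join(codons)
```

Doing the alignment at the protein level and then mapping back keeps gaps in multiples of three, so the codon reading frame is preserved in the nucleotide alignment.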