
Tutorial

I can't read documentation. I just want examples of how stuff works, just enough to get me started and doing something productive. So, here's a walk-through of some small projects. It should be sufficient to get you started on work of your own. After you have things working in their default mode, you'll want to read the other chapters and play with the fancier options.

The subdirectory Demos/ in the source distribution contains all the files I mention here. You can use these files to run all the example command lines and see what they do. You can also look at the example files for working examples of file formats, if you have trouble with your own files. I use .slx and .msf suffixes for SELEX- and MSF-formatted alignment files; .fa, .gb, .embl, .sp suffixes for FASTA/Pearson, GenBank, EMBL, and Swissprot flat text sequence database files; and .hmm for models. The programs don't actually care what suffix you use -- they detect file formats based on the file's content, not its name. If you don't like my suffixes, use your own.
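To make the idea of content-based format detection concrete, here is a minimal Python sketch. It is not the detection code the programs actually use; the function name guess_format and the specific rules are assumptions chosen for illustration. It looks only at well-known leading markers: a ">" header line for FASTA, a LOCUS line for GenBank, an ID line for EMBL and Swissprot (which this sketch does not try to tell apart), and a header line containing "MSF:" and ending in ".." for MSF. Anything else is reported as possibly SELEX.

```python
def guess_format(text):
    """Guess a sequence file format from its content.

    Illustrative heuristic only; a hypothetical helper, not the
    logic used by the programs described in this guide.
    """
    # Ignore blank lines so leading whitespace does not confuse the check.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"

    first = lines[0]
    if first.startswith(">"):
        return "fasta"
    if first.startswith("LOCUS"):
        return "genbank"
    if first.startswith("ID   "):
        # EMBL and Swissprot entries both open with an ID line;
        # this sketch does not distinguish them.
        return "embl-or-swissprot"

    # An MSF header line carries "MSF:" and ends with "..".
    if any("MSF:" in ln and ln.rstrip().endswith("..") for ln in lines):
        return "msf"

    # SELEX has no distinctive marker, so it is the fallback guess.
    return "selex-or-unknown"


if __name__ == "__main__":
    import sys

    # Usage: python guess_format.py Demos/globins50.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, guess_format(handle.read()))
```

Note that the file name never enters into the decision, so renaming globins50.fa to anything else would give the same answer.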

All of my examples analyze globin protein sequences.

Demos/ contains five files of globin sequences:

bashford.slx
Structural alignment of 7 globin sequences, from Bashford, Chothia, and Lesk [4]. This is a SELEX format file that also carries extra annotation of the globin consensus secondary structure.
globins50.msf
ClustalV automated alignment of 50 randomly selected globin sequences. We will use these as training sequences in some demos.
globins50.fa
The same 50 globins, unaligned (FASTA format).
globins630.fa
Database of 630 globin sequences.
Artemia.fa
Polymeric Artemia (brine shrimp) globin.





Sean Eddy
Mon Apr 17 09:54:19 CDT 1995