NAME

SYNOPSIS
hmmt [options] hmmfile seqfile
DESCRIPTION
hmmt attempts to learn the pattern shared by the multiple sequences in seqfile, and saves a description of the pattern in hmmfile.
seqfile may contain RNA, DNA, or amino acid sequences (don't mix them, though). It may be in any one of several different common sequence file formats, including EMBL, Genbank, and FASTA. The easiest to type in yourself is FASTA format, which consists of a line starting with > containing the name (one word) and an optional description of the sequence, followed by one or more lines of sequence.
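As an illustration of the FASTA layout described above, here is a minimal Python sketch of a reader for it. It is not part of hmmt and does not reproduce hmmt's own parser; the function name parse_fasta and the example sequences are made up for this illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, description, sequence) tuples."""
    records = []
    name, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if name is not None:
                records.append((name, desc, "".join(chunks)))
            # The name is the first word; the rest is the optional description.
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence may span one or more lines; join them later.
            chunks.append(line)
    if name is not None:
        records.append((name, desc, "".join(chunks)))
    return records


# Made-up example input: two records, the first with a description
# and a sequence split over two lines.
example = """>seq1 first example sequence
ACDEFGHIKL
MNPQRS
>seq2
ACDEFGHIKLMNP
"""

records = parse_fasta(example)
for rec_name, rec_desc, rec_seq in records:
    print(rec_name, len(rec_seq), rec_desc)
```

Each record keeps the one-word name separate from the free-text description, matching the header layout described above.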
The output file is not meant to be directly examined by the user; it is used as input to other hidden Markov modeling programs that do multiple sequence alignment [hmma] and database searching [hmmfs, hmmls, hmms, hmmsw].
hmmt works iteratively: it calculates a new multiple sequence alignment using the current model, then estimates a new model given that alignment, and repeats. A simulated annealing protocol is used to avoid bad local minima in this iterative (expectation maximization) training procedure.
"Simulated annealing" is a well-known method for avoiding obvious local minima in an optimization problem. hmmt uses a theoretically rigorous method to sample suboptimal alignments according to a "temperature" factor measured in units of the Boltzmann factor k; the higher the temperature, the more random the alignment. A temperature factor of 1.0 is equivalent to sampling alignments exactly according to their probability. The default parameters of simulated annealing work well. If you are unhappy with them, though, you can set a starting temperature with the -k option; 5 to 10 is a good choice (default is 5). You may also set a ramp factor -r ; by default, this is set to 0.95, which means the temperature will be decreased to 95% of its current value at each iteration.
Besides simulated annealing, two other training algorithms are available. -v toggles standard Viterbi approximation to Baum-Welch expectation maximization. As a training algorithm, it is fast, but prone to serious local minimum problems (it makes bad models unless you've provided a good starting hint at the alignment). -B toggles full Baum-Welch expectation maximization. Full Baum-Welch is slow and usually not quite as good as simulated annealing.
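The simulated annealing temperature schedule set by -k and -r can be sketched numerically. The following Python fragment is an illustration only, not hmmt's code. The schedule follows the description above (each iteration multiplies the temperature by the ramp factor). The second function rests on an assumption not stated in this page: that a temperature T reweights each probability p as p^(1/T), a common convention that is at least consistent with the statement that a temperature of 1.0 samples exactly according to probability. The function names are hypothetical.

```python
def temperature_schedule(start=5.0, ramp=0.95, iterations=10):
    """Return the temperature at each iteration: start, start*ramp, ..."""
    temps = []
    t = start
    for _ in range(iterations):
        temps.append(t)
        t *= ramp  # decrease to ramp (e.g. 95%) of the current value
    return temps


def tempered_distribution(probs, temperature):
    """Reweight probabilities as p**(1/T) and renormalize.

    ASSUMPTION: this is a common way to apply a temperature, used here
    only for illustration; hmmt's actual sampling method may differ.
    """
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


schedule = temperature_schedule(5.0, 0.95, 5)
print(schedule)

probs = [0.7, 0.2, 0.1]                    # made-up alignment probabilities
same = tempered_distribution(probs, 1.0)   # T = 1.0: unchanged
flat = tempered_distribution(probs, 5.0)   # T = 5.0: closer to uniform
print(same)
print(flat)
```

Under this assumption, a higher temperature flattens the distribution, so unlikely alignments are sampled more often, which corresponds to the "more random" behaviour described above.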
By default, the starting model is a model with uniform state transition and symbol emission probabilities, with length equal to the average length of the sequences in seqfile. A different starting model may be provided as a hint, using the -i option. A common procedure would be to build a hint HMM from an alignment of a small number of sequences, then give that model with -i to hmmt for training on a larger number of sequences. Important: simulated annealing works by initially "melting" the starting alignment, so providing a hint to simulated annealing has no effect. -i is only useful for -v or -B training, or perhaps if the initial temperature of simulated annealing is reduced (see the -k option).
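The default starting model described above can be pictured with a small Python sketch. This is an illustration, not hmmt's implementation. Several details are assumptions: rounding the average length to the nearest integer, representing the model as a dictionary, and giving each state three outgoing transitions (match, insert, delete) with probability 1/3 each. The function name uniform_start_model is hypothetical.

```python
def uniform_start_model(sequences, alphabet):
    """Build a toy model with uniform emissions and transitions.

    ASSUMPTIONS (not taken from hmmt itself): the average length is
    rounded to the nearest integer, and every state has three possible
    transitions (to match, insert, delete) with equal probability.
    """
    avg_len = sum(len(s) for s in sequences) / len(sequences)
    length = max(1, round(avg_len))

    # Uniform symbol emission probabilities over the alphabet.
    emission = {sym: 1.0 / len(alphabet) for sym in alphabet}
    # Uniform state transition probabilities (assumed three-way choice).
    transition = {"match": 1.0 / 3, "insert": 1.0 / 3, "delete": 1.0 / 3}

    return {
        "length": length,
        "emissions": [dict(emission) for _ in range(length)],
        "transitions": [dict(transition) for _ in range(length)],
    }


# Made-up DNA training sequences of lengths 4, 6, and 3.
training = ["ACGT", "ACGTAC", "ACG"]
model = uniform_start_model(training, "ACGT")
print(model["length"])
print(model["emissions"][0])
```

A model like this carries no information about the sequences beyond their average length, which is why a hint model supplied with -i can matter for -v or -B training.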
Another training option is "constrained simulated annealing". If you know the structure of some of your training sequences, you can construct a structural alignment of them and keep the rest of the homologues in a separate file. Using the -a option, hmmt can combine both a known multiple alignment and a set of unaligned homologues into a single training set. The alignment will remain fixed throughout the training process, while the homologues are aligned to it. The -o option should be used to save the final alignment if you desire.
OPTIONS

SEE ALSO
Overview: hmmer(l)
Individual man pages: hmma(l), hmmb(l), hmme(l), hmmfs(l), hmmls(l), hmms(l), hmmsw(l), hmm-convert(l)
User guide and tutorial: Userguide.ps
BUGS
No major bugs known.
Not very tolerant of errors on the command line.
NOTES
This software and documentation is Copyright (C) 1992-1995, Sean R. Eddy. It is freely distributable under terms of the GNU General Public License. See COPYING, in the source code distribution, for more details, or contact me.
Sean Eddy
Dept. of Genetics, Washington Univ. School of Medicine
660 S. Euclid Box 8232
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX: 1-314-362-2985
Email: eddy@genetics.wustl.edu