NAME

SYNOPSIS
hmmb [options] hmmfile seqfile
DESCRIPTION
hmmb creates a hidden Markov model from the alignment in seqfile, and saves the model to hmmfile.
seqfile may contain RNA, DNA, or amino acid sequences (don't mix them, though). They may be in either GCG's MSF multiple alignment format or my SELEX alignment format. See bashford7.slx in the source distribution for an example of SELEX format.
Any of ' ', '.', or '-' is accepted as a gap symbol. The output of ClustalV or ClustalW is acceptable as SELEX format if the header lines, consensus line, and coordinate lines (if any) are removed. SELEX format also has optional features for describing consensus structure, a canonical reference coordinate system, and individual structures for each sequence, but these are not described further here. See the HMMER User's Guide for more details.
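The Clustal cleanup described above can be sketched in Python. This is only an illustration and is not part of the HMMER distribution: the function name clustal_to_selex is hypothetical, and the sketch assumes that the header line begins with "CLUSTAL", that consensus and coordinate lines are indented (they carry no sequence name), and that any residue count appears as a trailing integer on a sequence line.

```python
def clustal_to_selex(text):
    """Strip Clustal-specific lines, keeping only 'name sequence' rows.

    Assumptions (not taken from the hmmb documentation):
      - the header line starts with "CLUSTAL"
      - consensus/coordinate lines start with whitespace
      - an optional residue count is a trailing integer field
    """
    out = []
    for line in text.splitlines():
        if line.startswith("CLUSTAL"):
            continue                      # header line
        if not line.strip():
            out.append("")                # keep blank lines as block separators
            continue
        if line[0].isspace():
            continue                      # indented line: consensus or coordinates
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            fields = fields[:2]           # drop trailing residue count
        if len(fields) != 2:
            continue                      # not a recognizable alignment row
        out.append(f"{fields[0]:<15} {fields[1]}")
    return "\n".join(out).strip("\n") + "\n"


if __name__ == "__main__":
    example = (
        "CLUSTAL W (1.83) multiple sequence alignment\n"
        "\n"
        "seq1   ACG-T 4\n"
        "seq2   ACGGT 5\n"
        "       *** *\n"
    )
    print(clustal_to_selex(example), end="")
```

The result is a plain block of name and sequence pairs, which is the core of the SELEX layout. Check the cleaned file against bashford7.slx before passing it to hmmb.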
The algorithm used to build the model is an unpublished maximum likelihood model construction algorithm. It is different from the procedure described by Krogh et al. It is more theoretically satisfying than the ad hoc procedure they describe, but be warned that it assigns alignment columns to match states, rather than insert states, somewhat more often than human intuition would.
The statistics of the model -- symbol emission probabilities and state transition probabilities -- are set according to a maximum likelihood criterion. Maximum likelihood is sensitive to biased representation in the training data set; a large subfamily of highly related sequences will skew the model's statistics. We have done some research on alternative parameter estimation methods, and there are three hmmb options for alleviating this problem. The -d option implements "maximum discrimination", our recommended solution, which uses a rigorous method to find a model that optimally discriminates all the training sequences from random background. The -v and -w options implement two different ad hoc sequence weighting rules, providing the usual solution to the biased representation problem.
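The effect of biased representation on a maximum likelihood estimate, and the way sequence weights counteract it, can be shown on a single alignment column. The Python sketch below is illustrative only: the function name emission_probs and the weights are hypothetical, and the weights are chosen by hand rather than computed by the rules behind -v or -w, which are not described here. Maximum discrimination (-d) is not shown.

```python
from collections import Counter

GAP_SYMBOLS = " .-"   # gap characters accepted in SELEX alignments


def emission_probs(column, weights=None):
    """Maximum likelihood emission estimate for one alignment column.

    column  -- one character per sequence
    weights -- optional per-sequence weights (hypothetical values,
               not the output of hmmb's -v or -w options)
    """
    if weights is None:
        weights = [1.0] * len(column)     # unweighted: every sequence counts once
    counts = Counter()
    for symbol, weight in zip(column, weights):
        if symbol in GAP_SYMBOLS:
            continue                      # gaps emit no symbol
        counts[symbol] += weight
    total = sum(counts.values())
    # Relative (weighted) frequency is the ML estimate of each probability.
    return {symbol: count / total for symbol, count in counts.items()}


if __name__ == "__main__":
    # Four closely related sequences carry 'A'; one distant sequence carries 'G'.
    column = "AAAAG"
    print(emission_probs(column))
    # Down-weight the redundant subfamily so it counts as one sequence in total.
    print(emission_probs(column, weights=[0.25, 0.25, 0.25, 0.25, 1.0]))
```

Without weights, the four related sequences dominate the column and 'A' receives four fifths of the probability. With the hand-picked weights, the subfamily counts as a single sequence and the two symbols are estimated as equally likely.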
OPTIONS

SEE ALSO
Overview page: hmmer(l)
Individual man pages: hmma(l), hmme(l), hmmfs(l), hmmls(l), hmms(l), hmmsw(l), hmmt(l), hmm-convert(l)
User guide and tutorial: Userguide.ps
BUGS
No major bugs known.
Not very tolerant of errors on the command line.
NOTES
This software and documentation are Copyright (C) 1992-1995, Sean R. Eddy. They are freely distributable under the terms of the GNU General Public License. See COPYING, in the source code distribution, for more details, or contact me.
Sean Eddy
Dept. of Genetics, Washington Univ. School of Medicine
660 S. Euclid Box 8232
St Louis, MO 63110 USA
Phone: 1-314-362-7666
FAX: 1-314-362-2985
Email: eddy@genetics.wustl.edu