Next: Global (Needleman/Wunsch) style Up: Searching sequence databases Previous: Smith/Waterman searches: hmmsw

Searching for multiple Smith/Waterman hits: hmmfs

hmmfs ("HMM fragment search") reports an optimal set of multiple non-overlapping Smith/Waterman style matches. hmmfs was designed to detect dispersed repeat families, such as the human Alu family, in large-scale genomic DNA sequence data. It is also useful for parsing the motif structure of proteins with repeated domains, such as members of the immunoglobulin superfamily.

As an example, consider the sequence Artemia.fa in the demos. This sequence is a polymeric globin containing nine globin domains. hmmsw reports only the best-scoring one (domain 7). To find all nine, you would run:

> hmmfs globin2.hmm Artemia.fa

The output format is the same as for hmmsw, except that multiple hits per target sequence may be reported: nine, in this case.

Like hmmsw, hmmfs corrects for the length of the model and the length of the target sequence. Both programs use a full probabilistic model of their task; the difference is that hmmfs constructs a cyclic model that permits multiple matches. Because the probabilistic models differ, the two programs will generally report different scores for the same hit. In particular, notice that hmmfs reports a higher score for Artemia globin domain 7 than hmmsw does. Roughly speaking, this is because hmmfs takes the other nearby domains into account, which reinforces its confidence that the domain 7 match is correct.

The -F option for fancy alignment output and the -t option for setting a score threshold also work in hmmfs. Since hmmfs is often used for detecting members of DNA dispersed repeat families, the -c option (search the complementary DNA strand too) is useful. Coordinates for hits on the complementary strand are given in the same numbering scheme as the top strand; that is, for hits on the complementary strand, the "from" coordinate is greater than the "to" coordinate.
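The coordinate convention for -c can be illustrated with a short Python sketch. It is not part of the package and does not parse real hmmfs output: the function name normalize_hit and the coordinate values are hypothetical, chosen only to show how a downstream script might infer the strand from a from/to pair.

```python
def normalize_hit(start, end):
    """Return (low, high, strand) for a hit given its from/to coordinates.

    Assumes the convention described above: coordinates always use
    top-strand numbering, and a complementary-strand hit has from > to.
    A single-residue hit (from == to) cannot be assigned a strand this
    way; this sketch arbitrarily treats it as top strand.
    """
    if start <= end:
        return start, end, "+"
    return end, start, "-"


# Hypothetical from/to pairs, not taken from a real search.
hits = [(120, 560), (980, 640)]
for start, end in hits:
    low, high, strand = normalize_hit(start, end)
    print(f"{low}..{high} on strand {strand}")
```

Normalizing to a (low, high, strand) triple makes it easier to sort hits by position or to test for overlap without special-casing the complementary strand.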






Sean Eddy
Mon Apr 17 09:54:19 CDT 1995