MAXSEGS

Hugh Nicholas and Alex Ropelewski


Scientific Significance:

MaxSegs is a program written by the PSC for genetic sequence analysis. It takes an experimental DNA/RNA or protein query sequence and compares it with a library of all categorized DNA/RNA or protein sequences. Searching the categorized sequences with an experimental sequence allows the researcher to locate sequences that share an evolutionary, functional, or biochemical relationship with the query sequence. (There are currently about 40,000 categorized protein sequences, ranging in length from 2 to 6,000 residues; a typical protein sequence is approximately 300 residues long. There are approximately 170,000 DNA/RNA sequences, ranging in length from 100 to 200,000 bases; a typical DNA sequence is about 1,000 bases long.)
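As a rough sense of scale, the figures quoted above imply library sizes on the order of tens to hundreds of millions of characters. The short sketch below works through that arithmetic. It assumes the "typical" lengths can be treated as averages, which the text does not state for the DNA/RNA library; the function and constant names are illustrative only.

```python
# Figures taken from the paragraph above.
PROTEIN_COUNT = 40_000
PROTEIN_TYPICAL_LENGTH = 300    # residues
DNA_COUNT = 170_000
DNA_TYPICAL_LENGTH = 1_000      # bases; assumed here to act as an average


def library_size(sequence_count, typical_length):
    """Approximate total characters in a library (count x typical length)."""
    return sequence_count * typical_length


protein_residues = library_size(PROTEIN_COUNT, PROTEIN_TYPICAL_LENGTH)
dna_bases = library_size(DNA_COUNT, DNA_TYPICAL_LENGTH)

print(f"protein library: about {protein_residues:,} residues")  # 12,000,000
print(f"DNA/RNA library: about {dna_bases:,} bases")            # 170,000,000
```

Every one of those characters has to be examined for each query, which is why the choice of search algorithm dominates the running time.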


Numerical Approach and Performance:

MaxSegs uses a mathematically rigorous maximization algorithm that incorporates the state-of-the-art understanding of the evolutionary processes affecting gene and protein structures. Searching with such a rigorous algorithm is orders of magnitude more computationally intensive than searching with less rigorous heuristic algorithms, and a search can require several hours of single-processor C-90 time even though the program executes 240 million vector instructions per second. The payoff from these additional computing resources is that the search identifies related sequences that the heuristic algorithms would have missed. This can save years of trial-and-error laboratory experiments.
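The text does not spell out the algorithm itself. To illustrate what a rigorous maximization over alignments looks like, here is a minimal sketch of a Smith-Waterman style dynamic-programming score for the best local alignment. It is an assumed stand-in, not the MaxSegs implementation: the scoring values (match, mismatch, gap) and the function names are hypothetical, and a real search would use scoring matrices that reflect evolutionary substitution rates.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    Every cell of the (len(query) x len(subject)) table is evaluated, so the
    result is a true maximum rather than a heuristic estimate.
    """
    prev = [0] * (len(subject) + 1)   # scores for the previous query position
    best = 0
    for q in query:
        curr = [0]                    # column 0: nothing of subject aligned
        for j, s in enumerate(subject, start=1):
            diagonal = prev[j - 1] + (match if q == s else mismatch)
            score = max(
                0,                    # start a new local alignment here
                diagonal,             # align q with s
                prev[j] + gap,        # gap in subject
                curr[j - 1] + gap,    # gap in query
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def search_library(query, library):
    """Score the query against every library entry, best matches first."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in library.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    toy_library = {"seq_a": "ACGT", "seq_b": "TTTT"}   # made-up sequences
    for score, name in search_library("ACGT", toy_library):
        print(name, score)
```

The work grows with the product of the query length and the total library size, since every table cell is filled in. Heuristic methods skip most of those cells, which makes them faster but also lets them miss weak yet genuine similarities.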

