SEQSORT User Manual

SEQSORT - PSC Sequence Analysis Suite Output Sorter

Author: Alexander J. Ropelewski

Copyright (C) 1992-1997 Pittsburgh Supercomputing Center.

The SEQSORT program was developed under National Institutes of Health (NIH) NCRR grant 1 P41 RR06009 and enhanced under NIH NCRR grant 2 P41 RR06009.

SEQSORT is a utility program designed to make the output of the PSC's sequence analysis program suite more manageable and readable. SEQSORT can process output produced by:

  • SN - Search Nucleotide Database Program
  • SP - Search Protein Database Program
  • ST - Search Translated Nucleotide Database Program
  • Maxsegs - Optimal sequence (N-best) alignment program (PSC)
  • NWgap - Optimal sequence global alignment program (PSC)
  • Profile-ss - Optimal profile (N-best) alignment program (PSC)
  • Msearch - Parallel optimal sequence/profile alignment program (PSC)

By design, these programs report results in the order in which they were found in the database. SEQSORT takes that output, sorts it by score, and optionally creates a listing file.
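The core idea (reorder database-order results by score and keep only the best N) can be sketched in a few lines of Python. This is a minimal illustration, not SEQSORT's implementation: the Hit record and its fields are hypothetical stand-ins for whatever a real parser would extract from the search output, and it assumes that a higher score means a better hit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One database hit. Hypothetical fields, not SEQSORT's file format."""
    name: str
    score: float


def sort_hits(hits: List[Hit], top: Optional[int] = None) -> List[Hit]:
    """Return hits ordered by descending score, truncated to `top` if given."""
    # Assumption: higher scores are better.
    # sorted() is stable, so hits with equal scores keep their database order.
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    return ranked if top is None else ranked[:top]


if __name__ == "__main__":
    # Hits in the order they were found in the database.
    found = [Hit("seqA", 41.0), Hit("seqB", 87.5), Hit("seqC", 63.2)]
    for hit in sort_hits(found, top=2):
        print(hit.name, hit.score)
```

Because the sort is stable, ties are broken by the original database order, which keeps the result reproducible from run to run.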

USAGE

seqsort -program <name> -top <int> -in <infile> -out <outfile> -list <listfile>
-program <name>
Identifies the program that produced the output. Valid names are: maxsegs, msearch, nwgap, sn, st, sp, profiless
-top <int>
Identifies the maximum number of sequence pairs to retain in the sorted output. For example, with -top 10, ONLY the ten highest-scoring sequence pairs are kept in the output. For programs that report one alignment per sequence pair, -top effectively sets the maximum number of alignments; for others, such as maxsegs, the output may include more than one alignment per sequence pair.
-in <infile>
Identifies the input file name
-out <outfile>
Identifies the (sorted) output file name
-list <listfile>
Identifies the (optional) listing file. The listing file is simply a file that contains
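When SEQSORT is driven from a script, it can help to assemble and check the command line before running it. The Python helper below is a hypothetical sketch, not part of SEQSORT: it uses only the options documented above, takes the valid -program names from this manual, and assumes that -list may be left out (since the listing file is described as optional) and that -top must be a positive integer. The file names in the example are illustrative.

```python
import shlex
from typing import List, Optional

# Program names accepted by -program, as listed in this manual.
VALID_PROGRAMS = {"maxsegs", "msearch", "nwgap", "sn", "st", "sp", "profiless"}


def build_seqsort_command(program: str, top: int, infile: str, outfile: str,
                          listfile: Optional[str] = None) -> List[str]:
    """Build an argument list for seqsort (hypothetical helper)."""
    if program not in VALID_PROGRAMS:
        raise ValueError(f"unknown program name: {program!r}")
    # Assumption: -top must be a positive integer.
    if top < 1:
        raise ValueError("-top must be a positive integer")
    cmd = ["seqsort", "-program", program, "-top", str(top),
           "-in", infile, "-out", outfile]
    # Assumption: -list may be omitted, since the listing file is optional.
    if listfile is not None:
        cmd += ["-list", listfile]
    return cmd


if __name__ == "__main__":
    cmd = build_seqsort_command("sp", 10, "search.out", "search.sorted", "search.lis")
    print(shlex.join(cmd))
    # Where seqsort is installed, the list can be passed to subprocess.run(cmd, check=True).
```

Returning an argument list rather than a single string avoids shell quoting problems with unusual file names; shlex.join is used only to print a readable version.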

NOTES

  • For best results, run the original database-searching program with only one query sequence. Otherwise, the results sorted by SEQSORT may not make sense.
  • With some programs, such as maxsegs, the output kept by the -top option may include more than one alignment per sequence pair.
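Why -top can yield more alignments than its value is easier to see in a small sketch. The Python code below is an illustration under stated assumptions, not SEQSORT's algorithm: alignment records are hypothetical (query, subject, score) tuples, higher scores are better, and each sequence pair is ranked by its single best alignment. All alignments belonging to the top N pairs are kept.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical alignment record: (query, subject, score).
Alignment = Tuple[str, str, float]


def top_pairs(alignments: List[Alignment], top: int) -> List[Alignment]:
    """Keep every alignment belonging to the `top` best-scoring sequence pairs."""
    by_pair: Dict[Tuple[str, str], List[Alignment]] = defaultdict(list)
    for aln in alignments:
        by_pair[(aln[0], aln[1])].append(aln)

    # Assumption: a pair is ranked by its single best alignment score.
    ranked = sorted(by_pair.values(),
                    key=lambda group: max(a[2] for a in group),
                    reverse=True)

    kept: List[Alignment] = []
    for group in ranked[:top]:
        # Within a pair, list alignments from best to worst score.
        kept.extend(sorted(group, key=lambda a: a[2], reverse=True))
    return kept


if __name__ == "__main__":
    alns = [("q", "s1", 50.0), ("q", "s2", 90.0), ("q", "s1", 70.0), ("q", "s3", 20.0)]
    for aln in top_pairs(alns, top=2):
        print(aln)
```

With top=2 this keeps three alignments: the single alignment for pair (q, s2) and both alignments for pair (q, s1). The example also uses a single query sequence, in line with the first note above.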

Pittsburgh Supercomputing Center
National Resource for Biomedical Supercomputing
An NIH Supported Resource Center
300 S. Craig Street, Pittsburgh, PA 15213
Phone: 412-268-4960, Email: biomed@psc.edu

Please send suggestions for improving this code and error reports to biomed@psc.edu.