SAPS usage
Usage: saps[/out=out.dat] [-dtv] [-c] [-s species] [-H] [-a XY...]
[-b libfname] [-l lstfname] seqfname(s)
/out=out.dat - send output to the file "out.dat" instead of the
terminal. Note that there is no space between saps
and /out (SAPS/out=out.dat).
-d - documented output
-t - terse output
-v - verbose output.
-c - append computer-readable summary output to file
`saps.table'.
-s species - use species.q for quantile comparisons.
-H - count H as positive charge.
-a XY... - analyze spacings of amino acids X, Y, ....
-b libfname - read protein sequence data from library file
libfname.
-l lstfname - read protein sequence data from files specified
in the list file LST_lstfname.
seqfname(s) - read protein sequence data from individual file(s).
Input consists of individual protein sequences, each not exceeding
10,000 residues in length. It is supplied by the
arguments seqfname(s), -l lstfname, and -b libfname.
OPTIONS AND PARAMETERS
seqfname(s)
Individual sequences are supplied via the files seqfname(s) in IND format. The first line of the file is a descriptor line, which will be printed; the following lines (if any) are annotation. The first line of the sequence is immediately preceded by a line containing the delimiter `@@'. Subsequent symbols are either A-Z (one-letter-code symbols), which are read as part of the sequence, or irrelevant characters (such as numbers and blanks); non-standard symbols for ambiguous or missing residues are ignored. Lines should not exceed 512 characters. Example (reformatted SWISS-PROT entry for Drosophila cut protein):
>SW;HMCU_DROME: HOMEOBOX PROTEIN CUT (GENE NAME: CT).
(any number of comment lines that do not contain `@@')
@@
   1 MQPTLPQAAG TADMDLTAVQ SINDWFFKKE QIYLLAQFWQ QRATLAEKEV
(sequence continued)
2161 AVTTAAATAA AGWN
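The IND layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not part of SAPS itself; the function name parse_ind is hypothetical. It assumes that every uppercase letter A-Z after the `@@' line belongs to the sequence, so it does not reproduce whatever filtering SAPS applies to non-standard residue symbols, and it drops lowercase letters along with digits and blanks.

```python
import re


def parse_ind(text):
    """Split IND-format text into (descriptor, annotation_lines, sequence).

    Assumption: all uppercase A-Z characters after the `@@' line are kept;
    SAPS's own handling of ambiguous residue symbols is not modelled here.
    """
    lines = text.splitlines()
    descriptor = lines[0] if lines else ""
    annotation = []
    seq_chunks = []
    in_sequence = False
    for line in lines[1:]:
        if not in_sequence:
            if "@@" in line:
                # The delimiter line itself carries no sequence data.
                in_sequence = True
            else:
                annotation.append(line)
            continue
        # Drop position numbers, blanks, and any other non-letter characters.
        seq_chunks.append(re.sub(r"[^A-Z]", "", line))
    return descriptor, annotation, "".join(seq_chunks)
```

Calling parse_ind on the contents of a file such as HMCU_DROME would return the descriptor line, the list of annotation lines, and the sequence as one uninterrupted string.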
-l lstfname
There are two other possible inputs to SAPS that can be used alternatively or in conjunction with sequence file input as described above. If the -l lstfname command line is specified, input is taken from files in IND format, the names of which are specified in the file LST_lstfname. A list file must be named with a prefix LST_ and arbitrary suffix lstfname. It must have two lines of comments indicated by a # symbol in the first position followed by lines giving the names of input files in IND format, one per line. Example:
#'HELIX.*LOOP.*HELIX' proteins:
#
ARLC_MAIZE
ARRS_MAIZE
ASH1_RAT
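A list file of this shape is easy to generate and to read back. The sketch below is in Python and is not part of SAPS; the helper names list_file_name, build_list_file, and read_list_file are hypothetical. It follows only the rules stated above: the LST_ prefix, two comment lines starting with #, then one file name per line.

```python
def list_file_name(suffix):
    """Return the on-disk name for the list referred to as `-l suffix'."""
    return "LST_" + suffix


def build_list_file(comments, names):
    """Return list-file text: two '#' comment lines, then one name per line."""
    if len(comments) != 2:
        raise ValueError("a list file needs exactly two comment lines")
    header = ["#" + comment for comment in comments]
    return "\n".join(header + list(names)) + "\n"


def read_list_file(text):
    """Return the file names in list-file text, skipping '#' comment lines."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
```

For example, writing the result of build_list_file(["'HELIX.*LOOP.*HELIX' proteins:", ""], ["ARLC_MAIZE", "ARRS_MAIZE", "ASH1_RAT"]) to the file named by list_file_name("hlh") gives a list file that would be selected with `-l hlh'.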
-b libfname
Library files (invoked by the command line flag -b libfname) contain one or more sequence files assembled in LIB format: one-line descriptors beginning with > in the first position followed by the sequence in free format (non-one-letter-code symbols again being ignored; up to 10,000 characters per line). Example:
>SW;ARLC_MAIZE: ANTHOCYANIN REGULATORY LC PROTEIN (GENE NAME: LC).
MAVSASRVQQAEELLQRPAERQLMRSQLAAAARSINWSYALFWSISDTQP(sequence continued)
>SW;ARRS_MAIZE: ANTHOCYANIN REGULATORY R-S PROTEIN (GENE NAME: R-S).
MAVSASRVQQAEELLQRPAERQLMRSQLAAAARSINWSYALFWSISDTQP(sequence continued)
>SW;ASH1_RAT: ACHAETE-SCUTE HOMOLOGUE 1 (GENE NAME: MASH-1).
MESSGKMESGAGQQPQPPQPFLPPAACFFATAAAAAAAAAAAAAQSAQQQ(sequence continued)
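The LIB layout can be split into individual records along the descriptor lines. The following Python sketch is not part of SAPS, and the function name parse_lib is hypothetical. As an assumption, it keeps every uppercase letter A-Z on the sequence lines and discards everything else; it does not model SAPS's own treatment of non-standard residue symbols.

```python
import re


def parse_lib(text):
    """Return a list of (descriptor, sequence) pairs from LIB-format text.

    Assumption: a record starts at each line with '>' in the first position
    and runs until the next such line; only A-Z characters are kept.
    """
    records = []
    descriptor = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            if descriptor is not None:
                records.append((descriptor, "".join(chunks)))
            descriptor, chunks = line, []
        elif descriptor is not None:
            # Free-format sequence line: strip digits, blanks, punctuation.
            chunks.append(re.sub(r"[^A-Z]", "", line))
    if descriptor is not None:
        records.append((descriptor, "".join(chunks)))
    return records
```

Lines that appear before the first descriptor are skipped, since the format description gives them no meaning.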
Running SAPS on each of the above three sequences could thus be done in any of the following ways (assuming that the list file shown above is named LST_hlh and that the library file shown above is named LIBhlh):
a) saps ARLC_MAIZE ARRS_MAIZE ASH1_RAT > OUTPUT
b) saps -b LIBhlh > OUTPUT
c) saps -l hlh > OUTPUT
Output is directed to standard output. To run SAPS on the sequence file HMCU_DROME, for example (see above), one might type the command `saps HMCU_DROME | more' or `saps HMCU_DROME > OUTPUT'. The output format can be modified by the flags -d, -t, or -v, and -c:
-d - The output will come with documentation that annotates each part of the program; this flag should be set when SAPS is used for the first time; it provides helpful explanations with respect to the statistics being used and the layout of the output.
-t - This flag specifies terse output that is limited to the analysis of the charge distribution and of high scoring segments.
-v - This flag specifies verbose output with more detail than normally required.
-c - This flag is used in conjunction with the analysis of sets of proteins (specified typically with the -b libfname or -l lstfname options); if specified, the file `saps.table' is appended with computer-readable lines describing the input files and their significant features.
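When SAPS is driven from a script, the command line can be assembled from the flags documented here. The sketch below is in Python and only builds the argument list; it does not run SAPS. The function saps_argv and its parameter names are hypothetical, and it assumes that at most one of -d, -t, and -v is given at a time.

```python
def saps_argv(seq_files=(), lib=None, lst=None, mode=None,
              table=False, species=None, count_h=False, spacing=None):
    """Build an argument list for saps from the documented flags.

    Assumptions: `mode' is one of None, "d", "t", "v" (treated here as
    mutually exclusive), and at least one input source must be given.
    """
    if mode not in (None, "d", "t", "v"):
        raise ValueError("mode must be None, 'd', 't', or 'v'")
    if not (seq_files or lib or lst):
        raise ValueError("no input: give seq_files, lib, or lst")
    argv = ["saps"]
    if mode:
        argv.append("-" + mode)
    if table:
        argv.append("-c")           # append summary lines to saps.table
    if species:
        argv += ["-s", species]     # reference set for quantile comparison
    if count_h:
        argv.append("-H")           # count H as positive charge
    if spacing:
        argv += ["-a", spacing]     # amino acids whose spacings are analyzed
    if lib:
        argv += ["-b", lib]
    if lst:
        argv += ["-l", lst]
    argv += list(seq_files)
    return argv
```

The returned list could be handed to a process launcher such as subprocess.run, with standard output redirected to a file, which corresponds to the `> OUTPUT' form shown above.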
The residue composition of the input protein may be evaluated relative to standard sets of proteins grouped by species, size class, subcellular location, function, or other criteria. Specifically, the composition of the input protein is compared with the quantile table of residue usage for the user-specified standard set. Extremal usages which fall in the tails of the reference distribution are indicated for individual amino acids and for charged and hydrophobic residues. The reference set is selected with the command line flag `-s species'. The following options for `species' are currently supported: bsubt (Bacillus subtilis), drosophilia (Drosophila melanogaster), ecoli (Escherichia coli), hmr (combined human, mouse, and rat proteins), human (human), mammalian (mammalian), swp17s (default; random sample of proteins from SWISS-PROT, release 17.0), and yeast (Saccharomyces cerevisiae). For each set, only proteins of lengths of at least 200 residues were included; redundant entries were culled (for lists of SWISS-PROT file names composing each set and the quantile tables, see directory BIO$ROOT1:[SAPS.INCLUDE]).
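The idea of flagging usages that fall in the tails of a reference distribution can be illustrated as follows. This is a minimal Python sketch and not the SAPS implementation: the layout of the species.q quantile tables is not described here, so the sketch takes the cutoffs as a plain dictionary supplied by the caller. The names composition_percent and flag_extremes, and the (low, high) cutoff pairs, are hypothetical.

```python
from collections import Counter


def composition_percent(sequence):
    """Return residue usage as a percentage of the sequence length."""
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * n / total for residue, n in counts.items()}


def flag_extremes(sequence, cutoffs):
    """Flag residues whose usage lies outside caller-supplied cutoffs.

    `cutoffs' maps a residue to a (low, high) pair of percentages standing
    in for the tail quantiles of a reference set. This structure is an
    assumption for illustration, not the format of the SAPS quantile files.
    """
    usage = composition_percent(sequence)
    flags = {}
    for residue, (low, high) in cutoffs.items():
        value = usage.get(residue, 0.0)
        if value < low:
            flags[residue] = "low"
        elif value > high:
            flags[residue] = "high"
    return flags
```

Residues that are absent from the sequence are treated as having 0% usage, so they are flagged as low whenever the lower cutoff is above zero.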
By default, SAPS treats only lysine (K) and arginine (R) as positively charged residues. If the command line flag `-H' is set, then histidine (H) is also treated as positively charged in all parts of the program involving the charge alphabet.
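The effect of `-H' on the charge alphabet can be shown by translating a sequence into charge symbols. The following Python sketch is illustrative only; the function name charge_symbols and the +, -, 0 output symbols are hypothetical. The treatment of K, R, and H follows the paragraph above, while the choice of aspartate (D) and glutamate (E) as the negatively charged residues is an assumption based on standard biochemistry rather than on this document.

```python
def charge_symbols(sequence, count_h=False):
    """Translate residues into '+', '-', or '0' charge symbols.

    K and R are positive; H is positive only when count_h is True,
    mirroring the -H flag. D and E as negative is an assumption.
    """
    positive = set("KR")
    if count_h:
        positive.add("H")
    negative = set("DE")  # assumption: not stated in this document
    symbols = []
    for residue in sequence:
        if residue in positive:
            symbols.append("+")
        elif residue in negative:
            symbols.append("-")
        else:
            symbols.append("0")
    return "".join(symbols)
```

For the sequence KHDEA, the default call leaves the histidine neutral, while passing count_h=True marks it as positive.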
Clusters of particular amino acid types may be evaluated by means of the same tests that are used to detect clustering of charged residues (binomial model and scoring statistics). These tests are invoked by setting the `-a' flag; for example, to test (separately) for clusters of alanine (A) and serine (S), set `-a AS'. The binomial test is also programmed for certain combinations of amino acids: AG (flag `-a a'), PEST (flag `-a p'), QP (flag `-a q'), ST (flag `-a s').
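A binomial model for clustering can be sketched as a tail probability: given the overall frequency of the chosen residues in the sequence, how likely is it to see at least the observed count in a window of a given length? The Python code below is a simplified illustration under that assumption and is not the test as implemented in SAPS; in particular, it evaluates a single window and makes no correction for scanning many windows. The names binomial_tail and window_cluster_pvalue are hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))


def window_cluster_pvalue(sequence, residues, start, length):
    """Return (count, tail probability) for one window of the sequence.

    `residues' is a string of one-letter codes counted together, e.g. "ST".
    The success probability is the overall frequency of those residues in
    the whole sequence (a simplifying assumption of this sketch).
    """
    targets = set(residues)
    p = sum(r in targets for r in sequence) / len(sequence)
    window = sequence[start:start + length]
    k = sum(r in targets for r in window)
    return k, binomial_tail(len(window), k, p)
```

A small tail probability suggests that the window is unusually rich in the chosen residues relative to the rest of the sequence.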