MAXSEGS Users Manual
Optimal Sequence Alignment Program
Authors: Alexander J. Ropelewski and Dr. Hugh B. Nicholas
Copyright (C) 1989-1998 Pittsburgh Supercomputing Center.
The MAXSEGS program was developed under the National Science Foundation cooperative agreement ASC-8500650 and enhanced under the National Institutes of Health NCRR grant 1 P41 RR06009 and 2 P41 RR06009
This document refers to PSC-MAXSEGS V3.5
- Introduction
- General Program Information
- Program Commands
- Dynamic Programming Algorithm
- Implementation on Architectures
- Sample Program Input
- References
- Appendices
Introduction
MAXSEGS is an optimal local sequence alignment program that makes use of the dynamic programming algorithm and the extensions to that algorithm developed by Michael Waterman and Mark Eggert (for detailed information on the algorithm and its extensions, refer to the Journal of Molecular Biology (1987) 197, pp. 723-728). The dynamic programming algorithm is a technique from mathematics for finding the minimum or maximum of a discrete function. It has found use not only in biological sequence comparisons, but also in analyzing the relatedness of bird songs, in matching geological features distorted across faults, in gas chromatography, in speech recognition, and in general text analysis. In general text analysis, the dynamic programming method can be used to compute the minimum number of characters that have to be changed in order to convert one word to another (or one sentence to another). The implementation of the dynamic programming algorithm available in the MAXSEGS program is flexible enough to be easily adapted to these and similar uses.
MAXSEGS can be used with all types of sequence data, but is particularly useful in the analysis of biological sequences such as DNA, RNA and protein sequences. In addition, the program allows the user to select many different analysis options, such as defining a unique alphabet and using a different scoring matrix. The user can also limit the amount of output received from the analysis.
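The word-to-word conversion count mentioned above can be illustrated with a short Python sketch. It is not part of MAXSEGS; it assumes that a "change" means substituting, inserting, or deleting one character (the usual edit distance), and the function name is hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of single-character edits turning source into target."""
    # previous[j] is the distance between the part of source processed so far
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitute = previous[j - 1] + (s_char != t_char)
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only one row of the table is kept at a time, which is enough when the distance alone is wanted and the edits themselves do not need to be recovered.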
About the algorithm as related to biological sequence analysis:
A genetic sequence is a representation of the genetic information in DNA by a character string, an unbroken series of letters. A sequence has a finite length and contains only those characters that belong to a specified alphabet. Biological sequences can usually be categorized into one of three classes: DNA, RNA, or protein. DNA sequences use a four-character alphabet, {A, C, G, T}. Likewise, RNA sequences use a four-character alphabet, {A, C, G, U}. Proteins, on the other hand, use a 20-character alphabet.
One of the most useful analyses that can be done on sequences is to compare a few newly discovered sequences (called query sequences) with a large library of previously known and well characterized sequences. The results of these comparisons are called alignments. An alignment is a specific juxtaposition of the characters in the two sequences being compared. Alignments produced by the dynamic programming algorithm are useful because they show the maximum amount of similarity in the juxtaposed characters of the two sequences being compared.
Two distinctly different kinds of alignments can be generated using the dynamic programming algorithm. The researcher must determine which of the two will be more appropriate. A GLOBAL alignment forces every character of both strings into juxtaposition with either a character from the other string or an inserted blank. (Blanks are inserted because a global alignment requires the aligned strings to be the same length; the dynamic programming algorithm inserts the required blanks in the most favorable positions, which are not necessarily at the ends of a sequence. Blanks inserted into a biological sequence are interpreted as the result of a physical process that caused part of the gene to be lost.) A LOCAL alignment finds the subsequences of the two character strings that are the most similar to each other. Only rarely does a local alignment include all of both character strings.
Dynamic programming alignments often give the researcher valuable information, either by determining whether the query sequence is a fragment of a larger sequence or by showing which known sequences have similar functions or similar genetic ancestors to the query sequence. Notable successes include the identification of several oncogenes (genes that are involved in the development of cancers) as genes closely related to naturally occurring growth factor genes. These identifications save many months of expensive trial-and-error experimentation.
The Dynamic Programming Algorithm
The dynamic programming algorithm computes a solution to a problem by examining all possible solutions to that particular problem. For aligning character strings, this means that all possible juxtapositions of characters are examined. In the case of local sequence alignments, the computation can be visualized by writing one sequence along the top of a two dimensional table and writing the other sequence down the side of the same two dimensional table. Each cell in the table corresponds to a specific juxtaposition of a character from each sequence. We then fill the table by computing all possible alignments between the two sequences. The best solution is found at the location of the maximum score in the table. Given this table of all possible alignments, S is computed by the following recursive formula:
REG(i) = MAX( REG(i)+C, S(i-1,j)+N+C, 0)
LEG(j) = MAX( LEG(j)+C, S(i,j-1)+N+C, 0)
S(i,j) = MAX( REG(i), LEG(j), S(i-1,j-1)+VALUE(A(i),B(j)), 0)
Where
S(i,0) and S(0,j) are defined as 0 for all i >= 0 and j >= 0.
C = the gap constant
N = the newgap (open gap) constant
A and B = the two sequences being compared
REG and LEG are used to determine whether it is more beneficial to keep an existing gap open or to introduce a new gap.
VALUE( A(i), B(j) ) is a matrix of scores that measure the degree of similarity between pairs of characters (letters) in the alphabet from which the sequences are constructed.
The relative magnitudes of C, N, and the elements in the VALUE matrix determine how often and where blanks will be inserted into the original sequences in constructing the alignment.
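As an illustration of the recurrence above, the following Python sketch fills the table and returns the maximum score. It is not the MAXSEGS source: the function names are hypothetical, the two gap states are stored as full tables (`up` and `left`) rather than the one-dimensional REG and LEG arrays, and only the single best local score is reported, without the traceback or the Waterman-Eggert search for additional alignments.

```python
def vector_value(match, mismatch):
    """Return a VALUE function that scores every match and mismatch alike."""
    return lambda x, y: match if x == y else mismatch


def local_score(a, b, value, gap, newgap):
    """Best local alignment score of sequences a and b (score only)."""
    rows, cols = len(a) + 1, len(b) + 1
    # Row 0 and column 0 stay at 0, matching S(i,0) = S(0,j) = 0.
    S = [[0.0] * cols for _ in range(rows)]
    up = [[0.0] * cols for _ in range(rows)]    # gap state fed by S(i-1, j)
    left = [[0.0] * cols for _ in range(rows)]  # gap state fed by S(i, j-1)
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Extend an open gap (cost C) or open a new one (cost N + C).
            up[i][j] = max(up[i - 1][j] + gap, S[i - 1][j] + newgap + gap, 0.0)
            left[i][j] = max(left[i][j - 1] + gap, S[i][j - 1] + newgap + gap, 0.0)
            diagonal = S[i - 1][j - 1] + value(a[i - 1], b[j - 1])
            S[i][j] = max(up[i][j], left[i][j], diagonal, 0.0)
            best = max(best, S[i][j])
    return best


score = local_score("AAAATTTT", "AAAACTTTT", vector_value(2.0, -3.0),
                    gap=-1.0, newgap=-1.0)
print(score)  # 14.0: eight matches (16) minus one opened gap (N + C = -2)
```

The scoring values in the example call are made up for the illustration; they are chosen so that bridging the extra "C" with a gap scores higher than stopping the alignment or accepting a mismatch.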
Implementation on Architectures
There are two different implementations of the code available to users. The vector implementation is designed and optimized to run on vector supercomputers. The standard implementation is the vector code with the Cray-specific extensions removed. The standard implementation compiles and runs correctly on most ASCII machines with ANSI FORTRAN-90 compilers, including machines running UNIX and VMS. (The code was originally written in FORTRAN-77, but it now makes use of FORTRAN-90 extensions for dynamic arrays. Compiling under FORTRAN-77 is possible, but static array sizes must be used.) A C compiler is also required. Please note that the standard version is not optimized for any scalar machine. The current code can be obtained at no cost for academic use. For more information on acquiring source code for your home site, contact the Pittsburgh Supercomputing Center's National Resource for Biomedical Supercomputing at biomed@psc.edu or http://www.nrbsc.org/.
General Program Information
The MAXSEGS program is keyword-driven rather than menu-driven, mainly because it was designed to be used in batch mode. It is much easier to read and write an input file containing descriptive keywords than to make certain that the right input is on the correct line.
Program input consists of a keyword followed by zero or more keyword parameters. No individual line of input can exceed 132 characters in width. Input lines take the form:
[COMMAND][SEPARATOR][PARAM_1][SEPARATOR]...[SEPARATOR][PARAM_N]
Where [COMMAND] is the descriptive keyword
[PARAM_N] is a command parameter or sub-parameter
[SEPARATOR] is either the <SPACE>, <EQUALS> or
<COMMA> characters (All separators are
treated the same; it does not matter which
separator you use when one is required.)
In this manual, the space separator is used between the command and its parameters, the equals separator is used to indicate that the next parameter is a subparameter of the previous parameter, and the comma separator is used between independent parameters.
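The input line format described above can be sketched in Python as follows. This is only an illustration of the stated rules (three interchangeable separators, 132-character limit); the function name is hypothetical, and the real program evidently treats some commands specially (TITLE, for example, keeps the spaces in its label), which this sketch does not model.

```python
import re

MAX_WIDTH = 132  # maximum input line width stated in the manual


def split_input_line(line):
    """Split an input line into the command and its parameters."""
    if len(line) > MAX_WIDTH:
        raise ValueError("input line exceeds 132 characters")
    # Space, equals and comma are all treated as the same separator.
    return [token for token in re.split(r"[ =,]+", line.strip()) if token]


print(split_input_line("SCORE PAM120, GAP= -8, NEWGAP= 0"))
# ['SCORE', 'PAM120', 'GAP', '-8', 'NEWGAP', '0']
```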
The program signals that it is waiting for an input line by displaying the program name as the prompt.
Program Commands
The following are brief descriptions of the commands available in the program:
- Alphabet allows the user to select one of several pre-defined alphabets or to define a custom alphabet. An alphabet must be defined for the program to work correctly.
- Echo toggles command line echoing (useful in batch mode). Echoing is initially turned off.
- Help displays helpful information for someone who entered the program accidentally. This keyword does not access an internal help facility.
- Limit allows the user to limit the amount of output produced. (It is used to define how many alignments are to be presented for inspection.)
- Match executes the alignment procedure once all required parameters are set.
- Quit is the "END-OF-INPUT" keyword used to exit from the program and to terminate the custom alphabet defining mode.
- Score defines the scoring method used to obtain the alignment(s).
- Sequence defines the input files from which sequences are to be read and aligned. Also used to define an output file containing the alignments.
- Show prints the selected scoring matrix and the gap penalties.
- Title defines a descriptive label written on the program's output.
- Translate allows the user to specify DNA to protein translation.
- Width allows the user to change the program's alignment width (useful if you are using a 132 column printer or terminal).
- Zz is a no-operation command that allows the user to comment the input file.
Generally, the order in which the commands are entered into the program is unimportant. However, there are a few exceptions. The "ALPHABET" command must precede the "SCORE" command. Both the "ALPHABET" and "SCORE" commands must precede the "SHOW" and "MATCH" commands.
The Help Command
The HELP command is intended to provide someone who has entered the program accidentally with enough information to allow them to exit gracefully. The help command DOES NOT access a built-in help facility.
USAGE: "HELP"
The Score Command: How to choose a scoring method
The SCORE command is used to select a scoring method for the two sequences. (This is used to select the VALUE table discussed earlier.) There are four different types of scoring methods that the user can choose from:
Vector method - Choose among X1, X2 and MATCH.
Table method - For proteins you may choose among PAM40, PAM80,
PAM120, PAM200, PAM250, PAM320, PROPERTIES,
PET250, MUTMTX, STRUCTURE, BLOSUM35, BLOSUM45,
BLOSUM62, BLOSUM80 and BLOSUM100. For nucleic
acids you may select from DNAPAM20, DNAPAM20TT,
DNAPAM30, DNAPAM47, DNAPAM50, DNAPAM50TT,
DNAPAM65, DNAPAM85, DNAPAM85TT and DNAPAM110.
With any of these matrices you have the option of
changing the default gap and newgap penalties.
User-defined vector - A scoring table where all matches
are given an equal score, and all
mismatches are given an equal penalty.
User-defined matrix - The user inputs the entire scoring
matrix for the alphabet selected.
**** You MUST select a scoring method! ****
Users who are comparing sequences will probably want to use the table scoring method with a variety of scoring tables. If PAM matrices are chosen, the general rule is that lower PAM matrices will bring out strong but short matches, while higher PAM matrices, such as the PAM320 matrix, will bring out long but weak matches. A good analysis of the appropriate Dayhoff PAM matrix for each individual case can be found in the article "Amino Acid Substitution Matrices from an Information Theoretic Perspective" by Stephen Altschul ("Journal of Molecular Biology", 1991, vol. 219, pp. 555-565). He suggests that for effective database searching, many matrices should be used. If three database searches are performed, Altschul suggests using the PAM 40, PAM 120, and PAM 250 matrices. For pairwise alignments, he suggests using either the PAM 80 in conjunction with the PAM 250 or the PAM 120 paired with the PAM 320 matrix. More recent articles have suggested that other matrices, including the MUTMTX and the BLOSUM matrices, might be better choices (see Henikoff and Henikoff 1993, PROTEINS: Structure Function and Genetics 17:49-61; Vogt et al. 1995, J. Mol. Biol. 249:816-831; and Pearson 1995, Protein Science 4:1145-1160).
Vector Method
To choose a default scoring vector, simply enter the SCORE command followed by the default vector name. Currently the following scoring vectors are available:
- X1
- scoring table with match=1.0, mismatch=-0.9, gap=-2.0, newgap=0.0, and cutoff=5.9.
- X2
- scoring table with match=1.0, mismatch=0.0, gap=-1.0, newgap=-2.5, and cutoff=5.9.
- MATCH
- scoring table with match=1.0, mismatch=-1000.0, gap=-1000.0, newgap=-0.0, and cutoff=1.0.
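To make the vector method concrete, the sketch below stores the three predefined vectors in a Python dictionary and uses one of them to score a gap-free, position-by-position comparison. The parameter values are those listed above; the dictionary layout and function names are assumptions made for the illustration, not MAXSEGS internals.

```python
# Parameter values copied from the vector descriptions in this manual.
VECTORS = {
    "X1": {"match": 1.0, "mismatch": -0.9, "gap": -2.0, "newgap": 0.0, "cutoff": 5.9},
    "X2": {"match": 1.0, "mismatch": 0.0, "gap": -1.0, "newgap": -2.5, "cutoff": 5.9},
    "MATCH": {"match": 1.0, "mismatch": -1000.0, "gap": -1000.0, "newgap": 0.0, "cutoff": 1.0},
}


def value(vector_name, x, y):
    """VALUE(x, y) under a scoring vector: one score for any match, one for any mismatch."""
    vector = VECTORS[vector_name]
    return vector["match"] if x == y else vector["mismatch"]


def ungapped_score(vector_name, a, b):
    """Sum of VALUE over aligned positions, with no gaps inserted."""
    return sum(value(vector_name, x, y) for x, y in zip(a, b))


print(ungapped_score("X2", "ACGT", "ACGA"))  # 3.0
```

With the MATCH vector the very large mismatch and gap penalties mean that, in effect, only runs of exactly matching characters can score above the cutoff.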
The Table Method
To choose a default scoring table, simply enter the SCORE command followed by the default table name. You can also select your own GAP and NEWGAP (open gap) penalties by entering them after the table name (for example, PAM250, gap=-5.0, newgap=-5.0). Currently, the following scoring tables are available:
Dayhoff PAM series
PAM Matrix based on the work of Dayhoff et al. 1978, Atlas of Protein Sequence and Structure, 5 suppl. 3: 345-352.
- PAM40
- Dayhoff PAM-40 based similarity scoring matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-30.0
- PAM80
- Dayhoff PAM-80 based similarity scoring matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-22.0
- PAM120
- Dayhoff PAM-120 based similarity scoring matrix (shown in the appendix). According to Vogt et al. 1995, appropriate values for pairwise alignments with this matrix are gap=-1.3 and newgap=-5.5 for local alignments and gap=-1.4 and newgap=-6.0 for global alignments. Default values are gap=-1.0 and newgap=-16.0
- PAM200
- Dayhoff PAM-200 based similarity scoring matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-12.0
- PAM250
- Dayhoff PAM-250 based similarity scoring matrix (shown in the appendix). Pearson 1995 illustrates that using ln/ln scaling this matrix performs best for database searches when using gap=-2.0 and newgap=-10.0. For unscaled scores best results are obtained using gap=-2.0 and newgap=-16.0. According to Vogt et al. 1995, appropriate values for pairwise alignments with this matrix are gap=-1.3 and newgap=-6.0 for local alignments and gap=-0.5 and newgap=-11.0 for global alignments. Default values are gap=-1.0 and newgap=-16.0
- PAM320
- Dayhoff PAM-320 based similarity scoring matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-8.0.
BLOSUM series
BLOSUM Matrix based on the work of Henikoff and Henikoff 1992. Proc. Natl. Acad. Sci. USA 89: 10915 - 10919
- BLOSUM35
- Henikoff & Henikoff BLOSUM 35 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-10.0.
- BLOSUM45
- Henikoff & Henikoff BLOSUM 45 matrix (shown in the appendix). Pearson 1995 illustrates that using ln/ln scaling this matrix performs best for database searches when using gap=-1.0 and newgap=-12.0. For unscaled scores best results are obtained using gap=-2.0 and newgap=-12.0. Default values are gap=-1.0 and newgap=-10.0.
- BLOSUM62
- Henikoff & Henikoff BLOSUM 62 matrix (shown in the appendix). Pearson 1995 illustrates that using ln/ln scaling this matrix performs best for database searches when using gap=-1.0 and newgap=-6.0. For unscaled scores best results are obtained using gap=-4.0 and newgap=-6.0. According to Vogt et al. 1995, appropriate values for pairwise alignments with this matrix are gap=-0.5 and newgap=-8.0 for local alignments and gap=-0.9 and newgap=-7.5 for global alignments. Default values are gap=-1.0 and newgap=-12.0.
- BLOSUM80
- Henikoff & Henikoff BLOSUM 80 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-16.0.
- BLOSUM100
- Henikoff & Henikoff BLOSUM 100 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-20.0.
DNA PAM series
DNA PAM Matrix based on the work of States et al. 1991, Methods: A companion to methods in Enzymology 3:66-70.
- DNAPAM20
- DNA PAM 20 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-22.0
- DNAPAM20TT
- DNA PAM 20 matrix, incorporating transitions and transversions (shown in the appendix). Default values are gap=-1.0 and newgap=-42.0
- DNAPAM30
- DNA PAM 30 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-18.0
- DNAPAM47
- DNA PAM 47 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-8.0
- DNAPAM50
- DNA PAM 50 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-14.0
- DNAPAM50TT
- DNA PAM 50 matrix, incorporating transitions and transversions (shown in the appendix). Default values are gap=-1.0 and newgap=-26.0
- DNAPAM65
- DNA PAM 65 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-4.0
- DNAPAM85
- DNA PAM 85 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-8.0
- DNAPAM85TT
- DNA PAM 85 matrix, incorporating transitions and transversions (shown in the appendix). Default values are gap=-1.0 and newgap=-2.0
- DNAPAM110
- DNA PAM 110 matrix (shown in the appendix). Default values are gap=-1.0 and newgap=-2.0
Other Matrices
- STRUCTURE
- Structure genetic similarity scoring matrix (shown in the appendix). Default values are gap=-2.0 and newgap=0.0.
- PROPERTIES
- Properties similarity scoring matrix (shown in the appendix). Default values are gap=-3.0 and newgap=0.0
- MUTMTX
- The Gonnet-Cohen-Benner mutation matrix (shown in the appendix). Pearson 1995 shows that this matrix performs well for database searches using ln/ln scaling with gap=-10.0 and newgap=-140.0. For unscaled scores gap=-20.0 and newgap=-120.0 perform better. According to Vogt et al. 1995, appropriate values for pairwise alignments with this matrix are gap=-4.0 and newgap=-75.0 for local alignments and gap=-2.0 and newgap=-140.0 for global alignments. The default values are gap=-16.0 and newgap=-206.0.
- PET250
- The 1991 Pairwise Exchange Table (PET) matrix at 250 PAM (shown in the appendix). According to Vogt et al. 1995, appropriate values for pairwise alignments with this matrix are gap=-0.4 and newgap=-10.5 for local alignments and gap=-1.5 and newgap=-9.0 for global alignments. The default values are gap=-4.0 and newgap=0.0
The User Defined Vector
To select a user-defined vector scoring scheme, simply enter the vector parameters (and parameter values) after the SCORE command. The vector scoring parameters are:
- MATCH
- assign this score when two letters match.
- MISMATCH
- assign this score when two letters do not match.
- GAP
- assign this score when a gap is extended in length.
- NEWGAP
- assign this score (in addition to the gap score) when a new gap is opened.
Note: The values associated with these parameters should be entered as REAL (floating point) numbers.
The User Defined Matrix
To use a user-defined scoring matrix, simply enter the word "MATRIX" after the SCORE keyword. The user should enter the GAP and NEWGAP penalties, followed by only the lower triangular portion of the desired scoring matrix. The upper triangular portion of the matrix will be generated automatically by symmetry. You may insert as many spaces as needed between numbers to line up the columns of the scoring matrix. An example of a scoring matrix for the alphabet {A},{C},{G},{T},{N} is given below.
MAXSEGS> SCORE MATRIX
***** Using "MATRIX" scoring scheme.
Please enter the GAP penality: -4.0
Please enter the NEWGAP penality: -40.0
Please enter the matrix for the appropriate pair.
Only enter the lower triangular part of the matrix.
The Upper portion of the matrix is filled in
automatically.
A C G T N
A 10.0
C -1.0 20.0
G -2.0 -5.0 30.0
T -3.0 -6.0 -8.0 40.0
N -4.0 -7.0 -9.0 -10.0 50.0
USAGE: "SCORE <vector_name>"
"SCORE <table_name>"
"SCORE <table_name>,GAP=<REAL>,NEWGAP=<REAL>"
"SCORE GAP=<REAL>,NEWGAP=<REAL>,MATCH=<REAL>,MISMATCH=<REAL>"
"SCORE MATRIX"
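The way a lower triangular matrix determines the full scoring matrix by symmetry can be sketched in Python as follows, using the numbers from the SCORE MATRIX example above. The function name and the dictionary representation are assumptions for the illustration; the manual does not describe how MAXSEGS stores the matrix internally.

```python
def expand_lower_triangle(letters, rows):
    """Build a full symmetric score lookup from lower triangular rows."""
    full = {}
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError("row %d must contain %d scores" % (i + 1, i + 1))
        for j, score in enumerate(row):
            # Store each score under both orderings of the letter pair.
            full[letters[i], letters[j]] = score
            full[letters[j], letters[i]] = score
    return full


letters = ["A", "C", "G", "T", "N"]
rows = [
    [10.0],
    [-1.0, 20.0],
    [-2.0, -5.0, 30.0],
    [-3.0, -6.0, -8.0, 40.0],
    [-4.0, -7.0, -9.0, -10.0, 50.0],
]
scores = expand_lower_triangle(letters, rows)
print(scores["A", "T"], scores["T", "A"])  # -3.0 -3.0
```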
The Echo Command
The ECHO command toggles command line echoing. This command is most useful when the program is running in batch mode.
Usage: "ECHO"
The Translate Command: Perform DNA to Protein Translation
The translate command can be used to convert a DNA library into a protein library in any of the six possible reading frames. Reading frames are specified by a six digit integer number. The first three digits specify the forward frames, while the last three specify the reverse, complemented frames:
XXXXXX
||||||__Third reverse complemented frame
|||||
|||||___Second reverse complemented frame
||||
||||____First reverse complemented frame
|||
|||_____Third forward frame
||
||______Second forward frame
|
|_______First forward frame
The number 1 is used to indicate that the frame should be translated, while a 0 is used to indicate that a frame should not be translated. For example:
111000 - translate the forward frames one, two, and three.
100000 - translate only forward frame one.
000100 - translate only reverse, complemented frame one.
111111 - translate all frames.
Usage: "TRANSLATE 111000"
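The six-digit frame specification can be decoded as in the Python sketch below, which also splits a DNA string into codons for a chosen frame. This is an illustration only: the function names are hypothetical, and the assumption that reverse frame n starts at position n of the reverse complement is the common convention, not something this manual states.

```python
FRAME_NAMES = [
    ("forward", 1), ("forward", 2), ("forward", 3),
    ("reverse", 1), ("reverse", 2), ("reverse", 3),
]
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def selected_frames(mask):
    """Return the frames switched on by a six-digit mask such as '111000'."""
    if len(mask) != 6 or set(mask) - {"0", "1"}:
        raise ValueError("mask must be six digits, each 0 or 1")
    return [name for name, bit in zip(FRAME_NAMES, mask) if bit == "1"]


def frame_codons(dna, strand, number):
    """Split dna into complete codons for the given strand and frame number."""
    if strand == "reverse":
        dna = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    start = number - 1
    return [dna[i:i + 3] for i in range(start, len(dna) - 2, 3)]


print(selected_frames("000100"))            # [('reverse', 1)]
print(frame_codons("ATGGCC", "reverse", 1))  # ['GGC', 'CAT']
```

Translating each codon into an amino acid would then be a table lookup against the genetic code, which is omitted here.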
The Title Command: Labeling your Output
The TITLE command allows the user to write a title to the output stream. This command is used primarily for the user's identification purposes. To label your output, simply enter the TITLE command followed by the label. The label should be less than 125 characters. Spaces are kept, and the title does not have to be in quotes.
USAGE: "TITLE
The Alphabet Command: How to select the proper Alphabet
The ALPHABET command is used to set the sequence alphabet to either one of several default alphabets or a user-defined "set" alphabet. You must choose an alphabet before you issue the "MATCH" command. The default alphabets available are:
- PROTEIN
- Alphabet suitable for protein sequences (23 characters)
- NUCLEIC
- Alphabet suitable for nucleic acid sequences (5 characters)
- AMBIGUOUS
- Alphabet suitable for nucleic acid sequences (15 characters)
Default Alphabets
The protein alphabet is defined as: (where the notation "{ }" is used as set notation)
{A,a}, {B,b}, {C,c}, {D,d}, {E,e}, {F,f}, {G,g},
{H,h}, {I,i}, {K,k}, {L,l}, {M,m}, {N,n}, {P,p},
{Q,q}, {R,r}, {S,s}, {T,t}, {V,v}, {W,w}, {X,x},
{Y,y}, {Z,z}
This notation means that every letter between the "{}" has the same meaning. For example, {A,a} means that the lowercase "a" and the uppercase "A" are treated as equivalent letters.
The nucleic alphabet is defined as:
{A,a}, {C,c}, {G,g}, {N,X,n,x}, {T,U,t,u}
The ambiguous alphabet is defined as:
{A,a}, {B,b}, {C,c}, {D,d}, {G,g}, {H,h}, {K,k},
{M,m}, {N,X,n,x}, {R,r}, {S,s}, {T,U,t,u}, {V,v},
{W,w}, {Y,y}
User Defined Alphabets
In addition to being able to select among several default alphabets, the user can also define a custom alphabet. The user can define several letters which appear in the sequence data as equivalent and represented in the output as a single letter. If the user enters the ALPHABET command without a parameter, he or she must specify the alphabet followed by the keyword QUIT. For example, if one would want to define a purine/pyrimidine alphabet R,Y over the letters A,G,C,T,U one would enter the following:
MAXSEGS> ALPHABET
Please enter the alphabet (form A=B,C,D).
Use only one "=" per line, enter QUIT as the only
word on the line to end the alphabet-entering mode.
R=A,G
Y=C,T,U
QUIT
MAXSEGS>
The letters A and G are now both equivalent to "R". T, U, and C are now equivalent to "Y".
The following is an example of how to define a neutral-polar (P), neutral-nonpolar (N), acidic (A), and basic (B) alphabet.
MAXSEGS> ALPHABET
Please enter the alphabet (form A=B,C,D).
Use only one "=" per line, enter QUIT as the only
word on the line to end the alphabet-entering mode.
P=S,T,Y,W,N,Q,C
N=G,A,V,I,L,F,P,M
A=D,E
B=K,R,H
QUIT
MAXSEGS>
Usage: "ALPHABET <pre-defined-name>"
"ALPHABET"
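The effect of a user-defined alphabet can be sketched in Python as follows: each definition line maps its member letters onto one representative letter, and sequences are then read through that mapping. The function names are hypothetical, and leaving unlisted letters unchanged is an assumption of this sketch, since the manual does not say how MAXSEGS treats them.

```python
def parse_alphabet(lines):
    """Turn lines such as 'R=A,G' into a member-to-representative mapping."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if line.upper() == "QUIT":
            break  # QUIT ends the alphabet-entering mode
        if line.count("=") != 1:
            raise ValueError("use exactly one '=' per line")
        representative, members = line.split("=")
        for member in members.split(","):
            # Only the listed members are mapped; a representative letter
            # may itself be a member of another class (P and A below).
            mapping[member.strip()] = representative.strip()
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet (unlisted letters kept)."""
    return "".join(mapping.get(ch, ch) for ch in sequence)


purine_pyrimidine = parse_alphabet(["R=A,G", "Y=C,T,U", "QUIT"])
print(reduce_sequence("GATTACA", purine_pyrimidine))  # RRYYRYR
```

In the second example above, the letters P and A are used both as class names and as amino acids belonging to class N, which is why the sketch maps only the members on the right-hand side of each "=".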
The Width Command
The WIDTH command is used to change the output width. By default, the output width for sequence alignments is set to 80 characters. The WIDTH command can be used to set the output width between 40 and 132 characters. To change the output width, enter the WIDTH command followed by the output width.
Usage:"WIDTH=<INTEGER>"
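A rough Python sketch of what an output width means for an alignment is given below: two aligned strings are cut into blocks no wider than the chosen width. The function name is hypothetical, and the sketch ignores any labels or position numbers that the actual MAXSEGS output may place on each line.

```python
def wrap_alignment(top, bottom, width=80):
    """Cut two aligned strings of equal length into blocks of at most width columns."""
    if not 40 <= width <= 132:
        raise ValueError("width must be between 40 and 132")
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    return [(top[i:i + width], bottom[i:i + width])
            for i in range(0, len(top), width)]


blocks = wrap_alignment("A" * 100, "A" * 100, width=40)
print([len(first) for first, _ in blocks])  # [40, 40, 20]
```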
The Limit Command: Limiting the volume of alignments
The LIMIT command is used to define the number of sequence alignments to be retrieved and displayed. The LIMIT command takes up to two parameters:
- CUTOFF=X
- Causes the program to display only alignments that have a similarity score of "X" or greater.
- NUMBER=Y
- Causes the program not to display any more than "Y" best alignments for any pair of sequences, regardless of how high the additional best sub-alignments scored. This parameter does not limit the total number of alignments in the output to "Y" alignments.
Note: If you use any limit parameter, alignments will be printed until any of the above limits is reached.
Usage: "LIMIT CUTOFF=<INTEGER>,NUMBER=<INTEGER>,QUALITY=<REAL>"
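The combined effect of CUTOFF and NUMBER on the alignments found for one pair of sequences can be sketched in Python as follows. The function name and the representation of alignments as bare scores are assumptions for the illustration.

```python
def apply_limits(scores, cutoff=None, number=None):
    """Scores that would be displayed for ONE pair of sequences, best first."""
    kept = sorted(scores, reverse=True)
    if cutoff is not None:
        kept = [score for score in kept if score >= cutoff]
    if number is not None:
        kept = kept[:number]
    return kept


# Four sub-alignments were found for a pair; CUTOFF=50 and NUMBER=2 keep two.
print(apply_limits([72, 48, 95, 51], cutoff=50, number=2))  # [95, 72]
```

Because NUMBER applies per pair of sequences, a search against a library of many sequences can still produce far more than NUMBER alignments in total.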
The Match Command: Finding the Optimal Alignments
The MATCH command is used only after an alphabet and a scoring routine have been defined. The MATCH command causes the program to compute the similarity scores for the given sequences and to display the alignments (up to the limits imposed by the LIMIT command, or by the limits associated with the scoring method selected.) The MATCH command has two distinctive modes: One allowing for the selection of individual sequences from the sequence files (PICK mode) and one allowing the user to compare all the sequences (ALL mode).
Either mode can accept the following optional parameters:
- LOCAL
- Align sequences using the local alignment algorithm of Waterman and Eggert 1987, J. Mol. Biol. 197: 723-728.
- GLOBAL
- Align sequences using a quasi-global sequence alignment algorithm similar to Sellers, Proc. Natl. Acad. Sci. USA.
- SCALE
- Rescale alignment scores using ln/ln scaling proposed by Pearson 1995, Protein Science 4:1145-1160.
The "ALL" mode can take an optional parameter:
- ALL
- Parameter used if the user wishes to align all of the query sequences with all of the library sequences. This is the default mode.
- PAIRWISE
- Parameter used if the library file and the query file have exactly the same sequences in them. This reduces the number of comparisons that need to be done by a factor of 2.
The "PICK" mode must utilize two parameters:
- LIBRARY=<INDEX>
- Select the sequence in the library file referred to by <INDEX>. The index is either the Locus name (for a GenBank sequence), Sequence identifier (for an EMBL or SWISS-PROT sequence) or the sequence name (for NBRF sequences). This parameter is required to use the pick mode.
- QUERY=<INDEX>
- Select the sequence in the query file referred to by <INDEX>. The index is either the Locus name (for a GenBank sequence), Sequence identifier (for an EMBL or SWISS-PROT sequence) or the sequence name (for NBRF sequences). This parameter is required to use the pick mode.
Usage: "MATCH ALL"
"MATCH LOCAL SCALE ALL"
"MATCH PAIRWISE"
"MATCH LIBRARY=<INDEX>,QUERY=<INDEX>"
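The saving offered by PAIRWISE when the query and library files hold the same sequences can be illustrated by counting comparisons in Python. The function names are hypothetical, and the sketch assumes that PAIRWISE skips both the mirrored comparison (B against A once A against B is done) and the comparison of a sequence with itself; the manual states only that the work is roughly halved.

```python
from itertools import combinations, product


def all_mode_pairs(query, library):
    """Every query sequence against every library sequence."""
    return list(product(query, library))


def pairwise_mode_pairs(names):
    """Each unordered pair of distinct sequences exactly once."""
    return list(combinations(names, 2))


names = ["seq1", "seq2", "seq3", "seq4"]
print(len(all_mode_pairs(names, names)))  # 16
print(len(pairwise_mode_pairs(names)))    # 6
```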
The Sequence Command: How to select Sequences
The SEQUENCE command is used to indicate what files the query and library sequences are to be taken from. It is also used to indicate what file name the results should be written to. The following parameters have been incorporated with the SEQUENCE command:
- LIBRARY
- keyword which indicates that the library sequence(s) are to be taken from the named file.
- QUERY
- keyword which indicates that the query sequence(s) are to be taken from the named file.
- RESULTS
- keyword which indicates that the program results (the alignment(s)) are to be written to this named file.
The program is designed to search sequence files in the NBRF-PIR, GenBank, EMBL, Swiss-Prot, FASTA(*), and GCG individual sequence formats(*).
(*) Support for the FASTA and GCG formats is new in this release. We do not normally keep the databases in FASTA format, so this routine has not been thoroughly tested. We would appreciate receiving reports of any problems, and any bug fixes, that you may have for these input routines. Please send the reports/fixes to biomed@psc.edu
Usage: "SEQUENCE LIBRARY=<file>, QUERY=<file>, RESULT=<file>"
The Quit Command
The QUIT command is used to terminate the program and to return the user to the operating system level. The QUIT command is equivalent to the "END-OF-FILE" marker, and entering the "END-OF-FILE" character will terminate the program as well.
Usage: "QUIT"
Sample Program Input
The following is sample program input to the MAXSEGS program. The first line places a title on the output. The second line tells the program that the alignments should not have more than 80 characters per line. The third line selects the protein alphabet. The fourth line selects the scoring method; in this case, the Dayhoff PAM120 matrix has been selected, with a gap extension penalty of -8 and a gap open penalty of 0. The fifth line limits the amount of output: the program will produce no more than the 2 best sub-alignments between any pair of sequences, and an alignment must score 50 or higher to be produced. The sixth line tells the program that our library of sequences is in the file "LIBRARY" while our query sequence is in the file "QUERY", and that we want the alignments written to the file "RESULT". The seventh line indicates that we want all of the sequences in the query file matched with all of the sequences in the library file. The last line ends the program.
TITLE Rattle snake sequence (PDB code=1PP2R) vs swiss-prot
WIDTH = 80
ALPHABET PROTEIN
SCORE PAM120, GAP= -8, NEWGAP= 0
LIMIT CUTOFF= 50 NUMBER= 2
SEQUENCE LIBRARY=LIBRARY QUERY=QUERY RESULT=RESULT
MATCH ALL
QUIT
References
- Waterman and Eggert 1987, J. Mol. Biol. 197: 723-728.
- Altschul 1991, J. Mol. Biol. 219: 555-565.
- Henikoff and Henikoff 1993, PROTEINS: Structure Function and Genetics 17:49-61
- Vogt, Etzold and Argos 1995, J. Mol. Biol, 249: 816-831.
- Pearson 1995, Protein Science 4: 1145-1160.
- Dayhoff, Schwartz, and Orcutt 1978, Atlas of Protein Sequence and Structure, 5 suppl. 3: 345-352.
- States, Gish and Altschul 1991, Methods: A companion to methods in Enzymology 3: 66-70.
- Henikoff and Henikoff 1992. Proc. Natl. Acad. Sci. USA 89: 10915 - 10919
- Gonnet, Cohen, and Benner 1992, Science 256: 1443-1445
- Jones, Taylor, and Thornton 1992, Comput. Appl. Biosci. 8: 275-282.
Appendices
- Dayhoff PAM 40 Matrix
- Dayhoff PAM 80 Matrix
- Dayhoff PAM 120 Matrix
- Dayhoff PAM 200 Matrix
- Dayhoff PAM 250 Matrix
- Dayhoff PAM 320 Matrix
- BLOSUM 35 Matrix
- BLOSUM 45 Matrix
- BLOSUM 62 Matrix
- BLOSUM 80 Matrix
- BLOSUM 100 Matrix
- DNA PAM 20 Matrix
- DNA PAM 20 Transition/Transversion Matrix
- DNA PAM 30 Matrix
- DNA PAM 47 Matrix
- DNA PAM 50 Matrix
- DNA PAM 50 Transition/Transversion Matrix
- DNA PAM 65 Matrix
- DNA PAM 85 Matrix
- DNA PAM 85 Transition/Transversion Matrix
- DNA PAM 110 Matrix
- Structure-Genetic Scoring Matrix
- Properties Scoring Matrix
- 1991 Pairwise Exchange Table (PET) at 250 PAM
- Gonnet Mutation Matrix
- Nucleotides
- Amino Acids
Dayhoff PAM 40 Matrix
A R N D C Q E G H I L K
A 6 -6 -3 -3 -6 -4 -2 -1 -6 -4 -5 -6
R -6 8 -5 -9 -7 -1 -8 -9 -1 -5 -7 1
N -3 -5 7 2 -9 -3 -1 -2 1 -4 -7 0
D -3 -9 2 7 -12 -2 3 -2 -3 -6 -11 -4
C -6 -7 -9 -12 10 -12 -12 -10 -6 -6 -13 -12
Q -4 -1 -3 -2 -12 8 2 -6 1 -6 -4 -2
E -2 -8 -1 3 -12 2 7 -3 -4 -5 -8 -4
G -1 -9 -2 -2 -10 -6 -3 6 -9 -9 -8 -6
H -6 -1 1 -3 -6 1 -4 -9 9 -9 -6 -5
I -4 -5 -4 -6 -6 -6 -5 -9 -9 8 -1 -5
L -5 -7 -7 -11 -13 -4 -8 -8 -6 -1 7 -7
K -6 1 0 -4 -12 -2 -4 -6 -5 -5 -7 6
M -4 -4 -7 -10 -12 -3 -8 -9 -9 0 1 -1
F -7 -7 -7 -12 -11 -11 -12 -8 -5 -2 -2 -12
P -1 -3 -5 -6 -7 -2 -4 -5 -3 -8 -6 -5
S 0 -2 0 -3 -2 -4 -3 -1 -5 -6 -8 -3
T 0 -5 -1 -4 -6 -5 -5 -5 -6 -2 -6 -2
W -12 -1 -9 -14 -14 -12 -15 -13 -9 -12 -10 -10
Y -7 -11 -4 -10 -3 -10 -7 -12 -3 -6 -6 -10
V -2 -6 -6 -7 -5 -6 -6 -5 -6 2 -2 -7
B -2 -5 7 7 -10 -1 3 -1 0 -4 -7 -1
Z -1 -2 -1 3 -11 7 7 -3 1 -4 -4 -2
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -4 -7 -1 0 0 -12 -7 -2 -2 -1 -1
R -4 -7 -3 -2 -5 -1 -11 -6 -5 -2 -1
N -7 -7 -5 0 -1 -9 -4 -6 7 -1 -1
D -10 -12 -6 -3 -4 -14 -10 -7 7 3 -1
C -12 -11 -7 -2 -6 -14 -3 -5 -10 -11 -1
Q -3 -11 -2 -4 -5 -12 -10 -6 -1 7 -1
E -8 -12 -4 -3 -5 -15 -7 -6 3 7 -1
G -9 -8 -5 -1 -5 -13 -12 -5 -1 -3 -1
H -9 -5 -3 -5 -6 -9 -3 -6 0 1 -1
I 0 -2 -8 -6 -2 -12 -6 2 -4 -4 -1
L 1 -2 -6 -8 -6 -10 -6 -2 -7 -4 -1
K -1 -12 -5 -3 -2 -10 -10 -7 -1 -2 -1
M 11 -4 -8 -5 -3 -12 -10 -1 -7 -4 -1
F -4 9 -10 -6 -7 -3 2 -7 -7 -11 -1
P -8 -10 8 -1 -3 -12 -12 -5 -4 -2 -1
S -5 -6 -1 6 1 -4 -6 -5 0 -3 -1
T -3 -7 -3 1 7 -11 -6 -2 -1 -4 -1
W -12 -3 -12 -4 -11 13 -3 -15 -10 -12 -1
Y -10 2 -12 -6 -6 -3 9 -6 -4 -7 -1
V -1 -7 -5 -5 -2 -15 -6 7 -6 -5 -1
B -7 -7 -4 0 -1 -10 -4 -6 8 2 -1
Z -4 -11 -2 -3 -4 -12 -7 -5 2 8 -1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Dayhoff PAM 80 Matrix
A R N D C Q E G H I L K
A 4 -4 -1 -1 -4 -2 -1 0 -4 -2 -4 -4
R -4 7 -2 -5 -5 0 -4 -6 0 -3 -5 2
N -1 -2 6 3 -6 -1 0 -1 2 -3 -5 0
D -1 -5 3 6 -9 0 4 -1 -1 -4 -7 -2
C -4 -5 -6 -9 9 -9 -9 -7 -5 -4 -9 -9
Q -2 0 -1 0 -9 7 2 -4 2 -4 -3 -1
E -1 -4 0 4 -9 2 6 -2 -2 -3 -5 -2
G 0 -6 -1 -1 -7 -4 -2 6 -6 -6 -6 -4
H -4 0 2 -1 -5 2 -2 -6 8 -6 -4 -3
I -2 -3 -3 -4 -4 -4 -3 -6 -6 7 1 -3
L -4 -5 -5 -7 -9 -3 -5 -6 -4 1 6 -5
K -4 2 0 -2 -9 -1 -2 -4 -3 -3 -5 6
M -3 -2 -4 -6 -8 -2 -5 -6 -6 1 2 0
F -5 -5 -5 -8 -8 -8 -8 -6 -3 0 0 -8
P 0 -2 -3 -4 -4 -1 -2 -3 -2 -5 -4 -3
S 1 -1 1 -1 -1 -3 -2 0 -3 -4 -5 -2
T 1 -3 0 -2 -4 -3 -3 -2 -4 -1 -4 -1
W -8 0 -6 -10 -10 -8 -11 -10 -6 -9 -7 -7
Y -5 -8 -2 -7 -2 -7 -5 -8 -1 -4 -4 -7
V 0 -4 -4 -5 -3 -4 -4 -3 -4 3 0 -5
B 0 -3 5 6 -6 1 3 0 1 -3 -5 1
Z 0 0 1 3 -8 6 6 -2 2 -3 -3 0
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -3 -5 0 1 1 -8 -5 0 0 0 -1
R -2 -5 -2 -1 -3 0 -8 -4 -3 0 -1
N -4 -5 -3 1 0 -6 -2 -4 5 1 -1
D -6 -8 -4 -1 -2 -10 -7 -5 6 3 -1
C -8 -8 -4 -1 -4 -10 -2 -3 -6 -8 -1
Q -2 -8 -1 -3 -3 -8 -7 -4 1 6 -1
E -5 -8 -2 -2 -3 -11 -5 -4 3 6 -1
G -6 -6 -3 0 -2 -10 -8 -3 0 -2 -1
H -6 -3 -2 -3 -4 -6 -1 -4 1 2 -1
I 1 0 -5 -4 -1 -9 -4 3 -3 -3 -1
L 2 0 -4 -5 -4 -7 -4 0 -5 -3 -1
K 0 -8 -3 -2 -1 -7 -7 -5 1 0 -1
M 9 -2 -5 -3 -2 -8 -6 1 -4 -2 -1
F -2 8 -7 -4 -5 -2 4 -4 -5 -7 -1
P -5 -7 7 0 -1 -8 -8 -3 -2 -1 -1
S -3 -4 0 4 2 -3 -4 -3 1 -1 -1
T -2 -5 -1 2 5 -8 -4 -1 0 -2 -1
W -8 -2 -8 -3 -8 13 -2 -11 -7 -8 -1
Y -6 4 -8 -4 -4 -2 9 -4 -3 -5 -1
V 1 -4 -3 -3 -1 -11 -4 6 -3 -3 -1
B -4 -5 -2 1 0 -7 -3 -3 7 3 -1
Z -2 -7 -1 -1 -2 -8 -5 -3 3 7 -1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Dayhoff PAM 120 Matrix
A R N D C Q E G H I L K
A 3 -3 0 0 -3 -1 0 1 -3 -1 -3 -2
R -3 6 -1 -3 -4 1 -3 -4 1 -2 -4 2
N 0 -1 4 2 -5 0 1 0 2 -2 -4 1
D 0 -3 2 5 -7 1 3 0 0 -3 -5 -1
C -3 -4 -5 -7 9 -7 -7 -5 -4 -3 -7 -7
Q -1 1 0 1 -7 6 2 -3 3 -3 -2 0
E 0 -3 1 3 -7 2 5 -1 -1 -3 -4 -1
G 1 -4 0 0 -5 -3 -1 5 -4 -4 -5 -3
H -3 1 2 0 -4 3 -1 -4 7 -4 -3 -2
I -1 -2 -2 -3 -3 -3 -3 -4 -4 6 1 -2
L -3 -4 -4 -5 -7 -2 -4 -5 -3 1 5 -4
K -2 2 1 -1 -7 0 -1 -3 -2 -2 -4 5
M -2 -1 -3 -4 -6 -1 -4 -4 -4 1 3 0
F -4 -4 -4 -7 -6 -6 -6 -5 -2 0 0 -6
P 1 -1 -2 -2 -3 0 -1 -2 -1 -3 -3 -2
S 1 -1 1 0 -1 -2 -1 1 -2 -2 -4 -1
T 1 -2 0 -1 -3 -2 -2 -1 -3 0 -3 -1
W -7 1 -5 -8 -8 -6 -8 -8 -5 -7 -5 -5
Y -4 -6 -2 -5 -1 -5 -4 -6 -1 -2 -3 -6
V 0 -3 -3 -3 -2 -3 -3 -2 -3 3 1 -4
B 0 -2 3 3 -6 0 2 0 1 -2 -4 0
Z 0 -1 0 2 -7 4 3 -2 1 -3 -3 0
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -2 -4 1 1 1 -7 -4 0 0 0 -1
R -1 -4 -1 -1 -2 1 -6 -3 -2 -1 -1
N -3 -4 -2 1 0 -5 -2 -3 3 0 -1
D -4 -7 -2 0 -1 -8 -5 -3 3 2 -1
C -6 -6 -3 -1 -3 -8 -1 -2 -6 -7 -1
Q -1 -6 0 -2 -2 -6 -5 -3 0 4 -1
E -4 -6 -1 -1 -2 -8 -4 -3 2 3 -1
G -4 -5 -2 1 -1 -8 -6 -2 0 -2 -1
H -4 -2 -1 -2 -3 -5 -1 -3 1 1 -1
I 1 0 -3 -2 0 -7 -2 3 -2 -3 -1
L 3 0 -3 -4 -3 -5 -3 1 -4 -3 -1
K 0 -6 -2 -1 -1 -5 -6 -4 0 0 -1
M 8 -1 -3 -2 -1 -7 -4 1 -3 -2 -1
F -1 8 -5 -3 -4 -1 4 -3 -5 -6 -1
P -3 -5 6 1 -1 -7 -6 -2 -2 0 -1
S -2 -3 1 3 2 -2 -3 -2 0 -1 -1
T -1 -4 -1 2 4 -6 -3 0 0 -2 -1
W -7 -1 -7 -2 -6 12 -1 -8 -6 -7 -1
Y -4 4 -6 -3 -3 -1 8 -3 -3 -4 -1
V 1 -3 -2 -2 0 -8 -3 5 -3 -3 -1
B -3 -5 -2 0 0 -6 -3 -3 3 1 -1
Z -2 -6 0 -1 -2 -7 -4 -3 1 3 -1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Dayhoff PAM 200 Matrix
A R N D C Q E G H I L K
A 2 -2 0 0 -2 -1 0 1 -1 -1 -2 -1
R -2 5 0 -1 -3 1 -1 -3 1 -2 -2 2
N 0 0 2 2 -3 0 1 0 1 -1 -2 1
D 0 -1 2 3 -4 1 3 0 0 -2 -3 0
C -2 -3 -3 -4 8 -4 -4 -3 -3 -2 -5 -4
Q -1 1 0 1 -4 4 2 -1 2 -2 -1 0
E 0 -1 1 3 -4 2 3 0 0 -2 -3 0
G 1 -3 0 0 -3 -1 0 4 -2 -2 -3 -2
H -1 1 1 0 -3 2 0 -2 5 -2 -2 0
I -1 -2 -1 -2 -2 -2 -2 -2 -2 4 2 -1
L -2 -2 -2 -3 -5 -1 -3 -3 -2 2 4 -2
K -1 2 1 0 -4 0 0 -2 0 -1 -2 4
M -1 0 -2 -2 -4 -1 -2 -3 -2 2 3 1
F -3 -3 -2 -4 -4 -4 -4 -3 -1 1 1 -4
P 1 0 0 -1 -2 0 0 -1 0 -2 -2 -1
S 1 0 1 0 0 -1 0 1 -1 -1 -2 0
T 1 -1 0 0 -2 -1 0 0 -1 0 -1 0
W -5 1 -4 -6 -6 -4 -6 -5 -3 -5 -4 -3
Y -3 -4 -1 -3 0 -3 -3 -4 0 -1 -1 -4
V 0 -2 -2 -2 -2 -2 -2 -1 -2 3 1 -2
B 1 0 3 4 -3 2 3 1 2 -1 -2 1
Z 1 1 2 3 -3 4 4 0 2 -1 -1 1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -1 -3 1 1 1 -5 -3 0 1 1 -1
R 0 -3 0 0 -1 1 -4 -2 0 1 -1
N -2 -2 0 1 0 -4 -1 -2 3 2 -1
D -2 -4 -1 0 0 -6 -3 -2 4 3 -1
C -4 -4 -2 0 -2 -6 0 -2 -3 -3 -1
Q -1 -4 0 -1 -1 -4 -3 -2 2 4 -1
E -2 -4 0 0 0 -6 -3 -2 3 4 -1
G -3 -3 -1 1 0 -5 -4 -1 1 0 -1
H -2 -1 0 -1 -1 -3 0 -2 2 2 -1
I 2 1 -2 -1 0 -5 -1 3 -1 -1 -1
L 3 1 -2 -2 -1 -4 -1 1 -2 -1 -1
K 1 -4 -1 0 0 -3 -4 -2 1 1 -1
M 5 0 -2 -1 0 -4 -2 1 -1 0 -1
F 0 7 -4 -2 -2 0 5 -1 -2 -3 -1
P -2 -4 5 1 0 -5 -4 -1 0 1 -1
S -1 -2 1 2 1 -2 -2 -1 1 1 -1
T 0 -2 0 1 3 -4 -2 0 1 0 -1
W -4 0 -5 -2 -4 12 0 -6 -3 -4 -1
Y -2 5 -4 -2 -2 0 7 -2 -1 -2 -1
V 1 -1 -1 -1 0 -6 -2 4 -1 -1 -1
B -1 -2 0 1 1 -3 -1 -1 4 4 -1
Z 0 -3 1 1 0 -4 -2 -1 4 5 -1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Dayhoff PAM 250 Matrix
A R N D C Q E G H I L K
A 2 -2 0 0 -2 0 0 1 -1 -1 -2 -1
R -2 6 0 -1 -4 1 -1 -3 2 -2 -3 3
N 0 0 2 2 -4 1 1 0 2 -2 -3 1
D 0 -1 2 4 -5 2 3 1 1 -2 -4 0
C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5
Q 0 1 1 2 -5 4 2 -1 3 -2 -2 1
E 0 -1 1 3 -5 2 4 0 1 -2 -3 0
G 1 -3 0 1 -3 -1 0 5 -2 -3 -4 -2
H -1 2 2 1 -3 3 1 -2 6 -2 -2 0
I -1 -2 -2 -2 -2 -2 -2 -3 -2 5 2 -2
L -2 -3 -3 -4 -6 -2 -3 -4 -2 2 6 -3
K -1 3 1 0 -5 1 0 -2 0 -2 -3 5
M -1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0
F -4 -4 -4 -6 -4 -5 -5 -5 -2 1 2 -5
P 1 0 -1 -1 -3 0 -1 -1 0 -2 -3 -1
S 1 0 1 0 0 -1 0 1 -1 -1 -3 0
T 1 -1 0 0 -2 -1 0 0 -1 0 -2 0
W -6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3
Y -3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4
V 0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2
B 0 -1 2 3 -4 1 2 0 1 -2 -3 1
Z 0 0 1 3 -5 3 3 -1 2 -2 -3 0
X 0 -1 0 -1 -3 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -1 -4 1 1 1 -6 -3 0 0 0 0
R 0 -4 0 0 -1 2 -4 -2 -1 0 -1
N -2 -4 -1 1 0 -4 -2 -2 2 1 0
D -3 -6 -1 0 0 -7 -4 -2 3 3 -1
C -5 -4 -3 0 -2 -8 0 -2 -4 -5 -3
Q -1 -5 0 -1 -1 -5 -4 -2 1 3 -1
E -2 -5 -1 0 0 -7 -4 -2 2 3 -1
G -3 -5 -1 1 0 -7 -5 -1 0 -1 -1
H -2 -2 0 -1 -1 -3 0 -2 1 2 -1
I 2 1 -2 -1 0 -5 -1 4 -2 -2 -1
L 4 2 -3 -3 -2 -2 -1 2 -3 -3 -1
K 0 -5 -1 0 0 -3 -4 -2 1 0 -1
M 6 0 -2 -2 -1 -4 -2 2 -2 -2 -1
F 0 9 -5 -3 -3 0 7 -1 -5 -5 -2
P -2 -5 6 1 0 -6 -5 -1 -1 0 -1
S -2 -3 1 2 1 -2 -3 -1 0 0 0
T -1 -3 0 1 3 -5 -3 0 0 -1 0
W -4 0 -6 -2 -5 17 0 -6 -5 -6 -4
Y -2 7 -5 -3 -3 0 10 -2 -3 -4 -2
V 2 -1 -1 -1 0 -6 -2 4 -2 -2 -1
B -2 -5 -1 0 0 -5 -3 -2 2 2 0
Z -2 -5 0 0 -1 -6 -4 -2 2 3 -1
X -1 -2 -1 0 0 -4 -2 -1 0 -1 -1
Dayhoff PAM 320 Matrix
A R N D C Q E G H I L K
A 1 -1 0 0 -1 0 0 1 -1 0 -1 0
R -1 3 0 0 -2 1 0 -1 1 -1 -1 2
N 0 0 1 1 -2 1 1 0 1 -1 -1 1
D 0 0 1 2 -3 1 2 1 0 -1 -2 0
C -1 -2 -2 -3 7 -3 -3 -2 -2 -1 -3 -3
Q 0 1 1 1 -3 2 1 0 1 -1 -1 1
E 0 0 1 2 -3 1 2 0 0 -1 -2 0
G 1 -1 0 1 -2 0 0 2 -1 -1 -2 -1
H -1 1 1 0 -2 1 0 -1 3 -1 -1 0
I 0 -1 -1 -1 -1 -1 -1 -1 -1 2 1 -1
L -1 -1 -1 -2 -3 -1 -2 -2 -1 1 3 -1
K 0 2 1 0 -3 1 0 -1 0 -1 -1 2
M -1 0 -1 -1 -3 0 -1 -1 -1 1 2 0
F -2 -2 -1 -3 -2 -2 -2 -2 -1 1 1 -2
P 1 0 0 0 -1 0 0 0 0 -1 -1 0
S 1 0 0 0 0 0 0 1 0 -1 -1 0
T 1 0 0 0 -1 0 0 0 -1 0 -1 0
W -3 1 -2 -4 -4 -3 -4 -4 -2 -3 -2 -2
Y -2 -2 -1 -2 0 -2 -2 -3 0 0 0 -3
V 0 -1 -1 -1 -1 -1 -1 -1 -1 2 1 -1
B 1 1 2 2 -1 2 2 1 2 0 -1 1
Z 1 1 2 2 -2 2 3 1 2 0 0 1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -1 -2 1 1 1 -3 -2 0 1 1 -1
R 0 -2 0 0 0 1 -2 -1 1 1 -1
N -1 -1 0 0 0 -2 -1 -1 2 2 -1
D -1 -3 0 0 0 -4 -2 -1 2 2 -1
C -3 -2 -1 0 -1 -4 0 -1 -1 -2 -1
Q 0 -2 0 0 0 -3 -2 -1 2 2 -1
E -1 -2 0 0 0 -4 -2 -1 2 3 -1
G -1 -2 0 1 0 -4 -3 -1 1 1 -1
H -1 -1 0 0 -1 -2 0 -1 2 2 -1
I 1 1 -1 -1 0 -3 0 2 0 0 -1
L 2 1 -1 -1 -1 -2 0 1 -1 0 -1
K 0 -2 0 0 0 -2 -3 -1 1 1 -1
M 3 0 -1 -1 0 -3 -1 1 0 0 -1
F 0 5 -2 -2 -1 1 4 0 -1 -1 -1
P -1 -2 3 1 0 -3 -2 -1 1 1 -1
S -1 -2 1 1 1 -1 -2 0 1 1 -1
T 0 -1 0 1 1 -3 -1 0 1 1 -1
W -3 1 -3 -1 -3 11 1 -4 -2 -2 -1
Y -1 4 -2 -2 -1 1 6 -1 -1 -1 -1
V 1 0 -1 0 0 -4 -1 2 0 0 -1
B 0 -1 1 1 1 -2 -1 0 3 3 -1
Z 0 -1 1 1 1 -2 -1 0 3 4 -1
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
BLOSUM 35 Matrix
A R N D C Q E G H I L K
A 5 -1 -1 -1 -2 0 -1 0 -2 -1 -2 0
R -1 8 -1 -1 -3 2 -1 -2 -1 -3 -2 2
N -1 -1 7 1 -1 1 -1 1 1 -1 -2 0
D -1 -1 1 8 -3 -1 2 -2 0 -3 -2 -1
C -2 -3 -1 -3 15 -3 -1 -3 -4 -4 -2 -2
Q 0 2 1 -1 -3 7 2 -2 -1 -2 -2 0
E -1 -1 -1 2 -1 2 6 -2 -1 -3 -1 1
G 0 -2 1 -2 -3 -2 -2 7 -2 -3 -3 -1
H -2 -1 1 0 -4 -1 -1 -2 12 -3 -2 -2
I -1 -3 -1 -3 -4 -2 -3 -3 -3 5 2 -2
L -2 -2 -2 -2 -2 -2 -1 -3 -2 2 5 -2
K 0 2 0 -1 -2 0 1 -1 -2 -2 -2 5
M 0 0 -1 -3 -4 -1 -2 -1 1 1 3 0
F -2 -1 -1 -3 -4 -4 -3 -3 -3 1 2 -1
P -2 -2 -2 -1 -4 0 0 -2 -1 -1 -3 0
S 1 -1 0 -1 -3 0 0 1 -1 -2 -2 0
T 0 -2 0 -1 -1 0 -1 -2 -2 -1 0 0
W -2 0 -2 -3 -5 -1 -1 -1 -4 -1 0 0
Y -1 0 -2 -2 -5 0 -1 -2 0 0 0 -1
V 0 -1 -2 -2 -2 -3 -2 -3 -4 4 2 -2
B -1 -1 4 5 -2 0 0 0 0 -2 -2 0
Z -1 0 0 1 -2 4 5 -2 -1 -3 -2 1
X 0 -1 0 -1 -2 -1 -1 -1 -1 0 0 0
M F P S T W Y V B Z X
A 0 -2 -2 1 0 -2 -1 0 -1 -1 0
R 0 -1 -2 -1 -2 0 0 -1 -1 0 -1
N -1 -1 -2 0 0 -2 -2 -2 4 0 0
D -3 -3 -1 -1 -1 -3 -2 -2 5 1 -1
C -4 -4 -4 -3 -1 -5 -5 -2 -2 -2 -2
Q -1 -4 0 0 0 -1 0 -3 0 4 -1
E -2 -3 0 0 -1 -1 -1 -2 0 5 -1
G -1 -3 -2 1 -2 -1 -2 -3 0 -2 -1
H 1 -3 -1 -1 -2 -4 0 -4 0 -1 -1
I 1 1 -1 -2 -1 -1 0 4 -2 -3 0
L 3 2 -3 -2 0 0 0 2 -2 -2 0
K 0 -1 0 0 0 0 -1 -2 0 1 0
M 6 0 -3 -1 0 1 0 1 -2 -2 0
F 0 8 -4 -1 -1 1 3 1 -2 -3 -1
P -3 -4 10 -2 0 -4 -3 -3 -1 0 -1
S -1 -1 -2 4 2 -2 -1 -1 0 0 0
T 0 -1 0 2 5 -2 -2 1 -1 -1 0
W 1 1 -4 -2 -2 16 3 -2 -3 -1 -1
Y 0 3 -3 -1 -2 3 8 0 -2 -1 -1
V 1 1 -3 -1 1 -2 0 5 -2 -2 0
B -2 -2 -1 0 -1 -3 -2 -2 5 0 -1
Z -2 -3 0 0 -1 -1 -1 -2 0 4 0
X 0 -1 -1 0 0 -1 -1 0 -1 0 -1
BLOSUM 45 Matrix
A R N D C Q E G H I L K
A 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -1 -1
R -2 7 0 -1 -3 1 0 -2 0 -3 -2 3
N -1 0 6 2 -2 0 0 0 1 -2 -3 0
D -2 -1 2 7 -3 0 2 -1 0 -4 -3 0
C -1 -3 -2 -3 12 -3 -3 -3 -3 -3 -2 -3
Q -1 1 0 0 -3 6 2 -2 1 -2 -2 1
E -1 0 0 2 -3 2 6 -2 0 -3 -2 1
G 0 -2 0 -1 -3 -2 -2 7 -2 -4 -3 -2
H -2 0 1 0 -3 1 0 -2 10 -3 -2 -1
I -1 -3 -2 -4 -3 -2 -3 -4 -3 5 2 -3
L -1 -2 -3 -3 -2 -2 -2 -3 -2 2 5 -3
K -1 3 0 0 -3 1 1 -2 -1 -3 -3 5
M -1 -1 -2 -3 -2 0 -2 -2 0 2 2 -1
F -2 -2 -2 -4 -2 -4 -3 -3 -2 0 1 -3
P -1 -2 -2 -1 -4 -1 0 -2 -2 -2 -3 -1
S 1 -1 1 0 -1 0 0 0 -1 -2 -3 -1
T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1
W -2 -2 -4 -4 -5 -2 -3 -2 -3 -2 -2 -2
Y -2 -1 -2 -2 -3 -1 -2 -3 2 0 0 -1
V 0 -2 -3 -3 -1 -3 -3 -3 -3 3 1 -2
B -1 -1 4 5 -2 0 1 -1 0 -3 -3 0
Z -1 0 0 1 -3 4 4 -2 0 -3 -2 1
X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1
M F P S T W Y V B Z X
A -1 -2 -1 1 0 -2 -2 0 -1 -1 0
R -1 -2 -2 -1 -1 -2 -1 -2 -1 0 -1
N -2 -2 -2 1 0 -4 -2 -3 4 0 -1
D -3 -4 -1 0 -1 -4 -2 -3 5 1 -1
C -2 -2 -4 -1 -1 -5 -3 -1 -2 -3 -2
Q 0 -4 -1 0 -1 -2 -1 -3 0 4 -1
E -2 -3 0 0 -1 -3 -2 -3 1 4 -1
G -2 -3 -2 0 -2 -2 -3 -3 -1 -2 -1
H 0 -2 -2 -1 -2 -3 2 -3 0 0 -1
I 2 0 -2 -2 -1 -2 0 3 -3 -3 -1
L 2 1 -3 -3 -1 -2 0 1 -3 -2 -1
K -1 -3 -1 -1 -1 -2 -1 -2 0 1 -1
M 6 0 -2 -2 -1 -2 0 1 -2 -1 -1
F 0 8 -3 -2 -1 1 3 0 -3 -3 -1
P -2 -3 9 -1 -1 -3 -3 -3 -2 -1 -1
S -2 -2 -1 4 2 -4 -2 -1 0 0 0
T -1 -1 -1 2 5 -3 -1 0 0 -1 0
W -2 1 -3 -4 -3 15 3 -3 -4 -2 -2
Y 0 3 -3 -2 -1 3 8 -1 -2 -2 -1
V 1 0 -3 -1 0 -3 -1 5 -3 -3 -1
B -2 -3 -2 0 0 -4 -2 -3 4 0 -1
Z -1 -3 -1 0 -1 -2 -2 -3 0 4 -1
X -1 -1 -1 0 0 -2 -1 -1 -1 -1 -1
BLOSUM 62 Matrix
A R N D C Q E G H I L K
A 6 -2 -2 -3 -1 -1 -1 0 -2 -2 -2 -1
R -2 8 -1 -2 -5 1 0 -3 0 -4 -3 3
N -2 -1 8 2 -4 0 0 -1 1 -5 -5 0
D -3 -2 2 9 -5 0 2 -2 -2 -5 -5 -1
C -1 -5 -4 -5 13 -4 -5 -4 -4 -2 -2 -5
Q -1 1 0 0 -4 8 3 -3 1 -4 -3 2
E -1 0 0 2 -5 3 7 -3 0 -5 -4 1
G 0 -3 -1 -2 -4 -3 -3 8 -3 -6 -5 -2
H -2 0 1 -2 -4 1 0 -3 11 -5 -4 -1
I -2 -4 -5 -5 -2 -4 -5 -6 -5 6 2 -4
L -2 -3 -5 -5 -2 -3 -4 -5 -4 2 6 -4
K -1 3 0 -1 -5 2 1 -2 -1 -4 -4 7
M -1 -2 -3 -5 -2 -1 -3 -4 -2 2 3 -2
F -3 -4 -4 -5 -4 -5 -5 -5 -2 0 1 -5
P -1 -3 -3 -2 -4 -2 -2 -3 -3 -4 -4 -2
S 2 -1 1 0 -1 0 0 0 -1 -4 -4 0
T 0 -2 0 -2 -1 -1 -1 -2 -3 -1 -2 -1
W -4 -4 -6 -6 -3 -3 -4 -4 -4 -4 -2 -4
Y -3 -3 -3 -5 -4 -2 -3 -5 3 -2 -2 -3
V 0 -4 -4 -5 -1 -3 -4 -5 -5 4 1 -3
B -2 -2 5 6 -5 0 1 -1 -1 -5 -5 -1
Z -1 0 0 1 -5 5 6 -3 0 -5 -4 1
X -1 -2 -2 -2 -3 -1 -1 -2 -2 -2 -2 -1
M F P S T W Y V B Z X
A -1 -3 -1 2 0 -4 -3 0 -2 -1 -1
R -2 -4 -3 -1 -2 -4 -3 -4 -2 0 -2
N -3 -4 -3 1 0 -6 -3 -4 5 0 -2
D -5 -5 -2 0 -2 -6 -5 -5 6 1 -2
C -2 -4 -4 -1 -1 -3 -4 -1 -5 -5 -3
Q -1 -5 -2 0 -1 -3 -2 -3 0 5 -1
E -3 -5 -2 0 -1 -4 -3 -4 1 6 -1
G -4 -5 -3 0 -2 -4 -5 -5 -1 -3 -2
H -2 -2 -3 -1 -3 -4 3 -5 -1 0 -2
I 2 0 -4 -4 -1 -4 -2 4 -5 -5 -2
L 3 1 -4 -4 -2 -2 -2 1 -5 -4 -2
K -2 -5 -2 0 -1 -4 -3 -3 -1 1 -1
M 8 0 -4 -2 -1 -2 -1 1 -4 -2 -1
F 0 9 -5 -4 -3 1 4 -1 -5 -5 -2
P -4 -5 11 -1 -2 -5 -4 -4 -3 -2 -2
S -2 -4 -1 6 2 -4 -3 -2 0 0 -1
T -1 -3 -2 2 7 -4 -2 0 -1 -1 -1
W -2 1 -5 -4 -4 16 3 -4 -6 -4 -3
Y -1 4 -4 -3 -2 3 10 -2 -4 -3 -2
V 1 -1 -4 -2 0 -4 -2 6 -5 -4 -1
B -4 -5 -3 0 -1 -6 -4 -5 5 0 -2
Z -2 -5 -2 0 -1 -4 -3 -4 0 5 -1
X -1 -2 -2 -1 -1 -3 -2 -1 -2 -1 -2
BLOSUM 80 Matrix
A R N D C Q E G H I L K
A 7 -3 -3 -3 -1 -2 -2 0 -3 -3 -3 -1
R -3 9 -1 -3 -6 1 -1 -4 0 -5 -4 3
N -3 -1 9 2 -5 0 -1 -1 1 -6 -6 0
D -3 -3 2 10 -7 -1 2 -3 -2 -7 -7 -2
C -1 -6 -5 -7 13 -5 -7 -6 -7 -2 -3 -6
Q -2 1 0 -1 -5 9 3 -4 1 -5 -4 2
E -2 -1 -1 2 -7 3 8 -4 0 -6 -6 1
G 0 -4 -1 -3 -6 -4 -4 9 -4 -7 -7 -3
H -3 0 1 -2 -7 1 0 -4 12 -6 -5 -1
I -3 -5 -6 -7 -2 -5 -6 -7 -6 7 2 -5
L -3 -4 -6 -7 -3 -4 -6 -7 -5 2 6 -4
K -1 3 0 -2 -6 2 1 -3 -1 -5 -4 8
M -2 -3 -4 -6 -3 -1 -4 -5 -4 2 3 -3
F -4 -5 -6 -6 -4 -5 -6 -6 -2 -1 0 -5
P -1 -3 -4 -3 -6 -3 -2 -5 -4 -5 -5 -2
S 2 -2 1 -1 -2 -1 -1 -1 -2 -4 -4 -1
T 0 -2 0 -2 -2 -1 -2 -3 -3 -2 -3 -1
W -5 -5 -7 -8 -5 -4 -6 -6 -4 -5 -4 -6
Y -4 -4 -4 -6 -5 -3 -5 -6 3 -3 -2 -4
V -1 -4 -5 -6 -2 -4 -4 -6 -5 4 1 -4
B -3 -2 5 6 -6 -1 1 -2 -1 -6 -7 -1
Z -2 0 -1 1 -7 5 6 -4 0 -6 -5 1
X -1 -2 -2 -3 -4 -2 -2 -3 -2 -2 -2 -2
M F P S T W Y V B Z X
A -2 -4 -1 2 0 -5 -4 -1 -3 -2 -1
R -3 -5 -3 -2 -2 -5 -4 -4 -2 0 -2
N -4 -6 -4 1 0 -7 -4 -5 5 -1 -2
D -6 -6 -3 -1 -2 -8 -6 -6 6 1 -3
C -3 -4 -6 -2 -2 -5 -5 -2 -6 -7 -4
Q -1 -5 -3 -1 -1 -4 -3 -4 -1 5 -2
E -4 -6 -2 -1 -2 -6 -5 -4 1 6 -2
G -5 -6 -5 -1 -3 -6 -6 -6 -2 -4 -3
H -4 -2 -4 -2 -3 -4 3 -5 -1 0 -2
I 2 -1 -5 -4 -2 -5 -3 4 -6 -6 -2
L 3 0 -5 -4 -3 -4 -2 1 -7 -5 -2
K -3 -5 -2 -1 -1 -6 -4 -4 -1 1 -2
M 9 0 -4 -3 -1 -3 -3 1 -5 -3 -2
F 0 10 -6 -4 -4 0 4 -2 -6 -6 -3
P -4 -6 12 -2 -3 -7 -6 -4 -4 -2 -3
S -3 -4 -2 7 2 -6 -3 -3 0 -1 -1
T -1 -4 -3 2 8 -5 -3 0 -1 -2 -1
W -3 0 -7 -6 -5 16 3 -5 -8 -5 -5
Y -3 4 -6 -3 -3 3 11 -3 -5 -4 -3
V 1 -2 -4 -3 0 -5 -3 7 -6 -4 -2
B -5 -6 -4 0 -1 -8 -5 -6 6 0 -3
Z -3 -6 -2 -1 -2 -5 -4 -4 0 6 -1
X -2 -3 -3 -1 -1 -5 -3 -2 -3 -1 -2
BLOSUM 100 Matrix
A R N D C Q E G H I L K
A 8 -3 -4 -5 -2 -2 -3 -1 -4 -4 -4 -2
R -3 10 -2 -5 -8 0 -2 -6 -1 -7 -6 3
N -4 -2 11 1 -5 -1 -2 -2 0 -7 -7 -1
D -5 -5 1 10 -8 -2 2 -4 -3 -8 -8 -3
C -2 -8 -5 -8 14 -7 -9 -7 -8 -3 -5 -8
Q -2 0 -1 -2 -7 11 2 -5 1 -6 -5 2
E -3 -2 -2 2 -9 2 10 -6 -2 -7 -7 0
G -1 -6 -2 -4 -7 -5 -6 9 -6 -9 -8 -5
H -4 -1 0 -3 -8 1 -2 -6 13 -7 -6 -3
I -4 -7 -7 -8 -3 -6 -7 -9 -7 8 2 -6
L -4 -6 -7 -8 -5 -5 -7 -8 -6 2 8 -6
K -2 3 -1 -3 -8 2 0 -5 -3 -6 -6 10
M -3 -4 -5 -8 -4 -2 -5 -7 -5 1 3 -4
F -5 -6 -7 -8 -4 -6 -8 -8 -4 -2 0 -6
P -2 -5 -5 -5 -8 -4 -4 -6 -5 -7 -7 -3
S 1 -3 0 -2 -3 -2 -2 -2 -3 -5 -6 -2
T -1 -3 -1 -4 -3 -3 -3 -5 -4 -3 -4 -3
W -6 -7 -8 -10 -7 -5 -8 -7 -5 -6 -5 -8
Y -5 -5 -5 -7 -6 -4 -7 -8 1 -4 -4 -5
V -2 -6 -7 -8 -3 -5 -5 -8 -7 4 0 -5
B -4 -4 5 6 -7 -2 0 -3 -2 -8 -8 -2
Z -2 -1 -2 0 -8 5 7 -5 -1 -7 -6 0
X -2 -3 -3 -4 -5 -2 -3 -4 -4 -3 -3 -3
M F P S T W Y V B Z X
A -3 -5 -2 1 -1 -6 -5 -2 -4 -2 -2
R -4 -6 -5 -3 -3 -7 -5 -6 -4 -1 -3
N -5 -7 -5 0 -1 -8 -5 -7 5 -2 -3
D -8 -8 -5 -2 -4 -10 -7 -8 6 0 -4
C -4 -4 -8 -3 -3 -7 -6 -3 -7 -8 -5
Q -2 -6 -4 -2 -3 -5 -4 -5 -2 5 -2
E -5 -8 -4 -2 -3 -8 -7 -5 0 7 -3
G -7 -8 -6 -2 -5 -7 -8 -8 -3 -5 -4
H -5 -4 -5 -3 -4 -5 1 -7 -2 -1 -4
I 1 -2 -7 -5 -3 -6 -4 4 -8 -7 -3
L 3 0 -7 -6 -4 -5 -4 0 -8 -6 -3
K -4 -6 -3 -2 -3 -8 -5 -5 -2 0 -3
M 12 -1 -5 -4 -2 -4 -5 0 -7 -4 -3
F -1 11 -7 -5 -5 0 4 -3 -7 -7 -4
P -5 -7 12 -3 -4 -8 -7 -6 -5 -4 -4
S -4 -5 -3 9 2 -7 -5 -4 -1 -2 -2
T -2 -5 -4 2 9 -7 -5 -1 -2 -3 -2
W -4 0 -8 -7 -7 17 2 -5 -9 -7 -6
Y -5 4 -7 -5 -5 2 12 -5 -6 -6 -4
V 0 -3 -6 -4 -1 -5 -5 8 -7 -5 -3
B -7 -7 -5 -1 -2 -9 -6 -7 6 0 -4
Z -4 -7 -4 -2 -3 -7 -6 -5 0 6 -2
X -3 -4 -4 -2 -2 -6 -4 -3 -4 -2 -3
DNA PAM 20 Matrix
A C G T X
A 9 -11 -11 -11 -11
C -11 9 -11 -11 -11
G -11 -11 9 -11 -11
T -11 -11 -11 9 -11
X -11 -11 -11 -11 9
DNA PAM 20 Transition/Transversion Matrix
A C G T X
A 13 -21 -10 -21 -21
C -21 13 -21 -10 -21
G -10 -21 13 -21 -21
T -21 -10 -21 13 -21
X -21 -21 -21 -21 13
DNA PAM 30 Matrix
A C G T X
A 9 -9 -9 -9 -9
C -9 9 -9 -9 -9
G -9 -9 9 -9 -9
T -9 -9 -9 9 -9
X -9 -9 -9 -9 9
DNA PAM 47 Matrix
A C G T X
A 5 -4 -4 -4 -4
C -4 5 -4 -4 -4
G -4 -4 5 -4 -4
T -4 -4 -4 5 -4
X -4 -4 -4 -4 5
DNA PAM 50 Matrix
A C G T X
A 9 -7 -7 -7 -7
C -7 9 -7 -7 -7
G -7 -7 9 -7 -7
T -7 -7 -7 9 -7
X -7 -7 -7 -7 9
DNA PAM 50 Transition/Transversion Matrix
A C G T X
A 11 -13 -3 -13 -13
C -13 11 -13 -3 -13
G -3 -13 11 -13 -13
T -13 -3 -13 11 -13
X -13 -13 -13 -13 11
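The transition/transversion tables score a purine-purine (A/G) or pyrimidine-pyrimidine (C/T) substitution differently from every other mismatch. As a minimal sketch in Python (not part of MAXSEGS; the helper names build_ts_tv_matrix and ungapped_score are hypothetical), the DNA PAM 50 Transition/Transversion table above can be rebuilt from its three distinct values and used to score two aligned segments of equal length without gaps:

```python
# Illustrative sketch only. MAXSEGS is not written in Python, and the
# helper names below are assumptions made for this example.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def build_ts_tv_matrix(match, transition, transversion):
    """Return a dict-of-dicts score table over the alphabet A, C, G, T, X."""
    alphabet = "ACGTX"
    table = {}
    for a in alphabet:
        table[a] = {}
        for b in alphabet:
            if a == b:
                score = match
            elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
                score = transition      # A<->G or C<->T
            else:
                score = transversion    # all other pairs, including X
            table[a][b] = score
    return table


def ungapped_score(table, seq1, seq2):
    """Sum the pairwise scores of two equal-length, gap-free segments."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have the same length")
    return sum(table[a][b] for a, b in zip(seq1, seq2))


# Values taken from the DNA PAM 50 Transition/Transversion table.
PAM50_TS_TV = build_ts_tv_matrix(match=11, transition=-3, transversion=-13)

# A/G is a transition (-3), C/C and G/G match (11 each), T/A is a
# transversion (-13), so the total is 6.
print(ungapped_score(PAM50_TS_TV, "ACGT", "GCGA"))  # prints 6
```

The same builder reproduces the PAM 20 and PAM 85 transition/transversion tables when called with (13, -10, -21) and (1, 0, -1) respectively.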
DNA PAM 65 Matrix
A C G T X
A 3 -2 -2 -2 -2
C -2 3 -2 -2 -2
G -2 -2 3 -2 -2
T -2 -2 -2 3 -2
X -2 -2 -2 -2 3
DNA PAM 85 Matrix
A C G T X
A 7 -4 -4 -4 -4
C -4 7 -4 -4 -4
G -4 -4 7 -4 -4
T -4 -4 -4 7 -4
X -4 -4 -4 -4 7
DNA PAM 85 Transition/Transversion Matrix
A C G T X
A 1 -1 0 -1 -1
C -1 1 -1 0 -1
G 0 -1 1 -1 -1
T -1 0 -1 1 -1
X -1 -1 -1 -1 1
DNA PAM 110 Matrix
A C G T X
A 2 -1 -1 -1 -1
C -1 2 -1 -1 -1
G -1 -1 2 -1 -1
T -1 -1 -1 2 -1
X -1 -1 -1 -1 2
Structure-Genetic Scoring Matrix
A R N D C Q E G H I L K
A 0 3 -1 -1 2 2 -1 -1 3 0 0 -1
R 0 0 2 -2 0 0 0 1 0 3 -2 0
N 0 1 3 1 2 -1 0 1 2 2 0 -1
D 1 1 2 0 -1 -1 0 -1 2 1 1 1
C 0 1 1 0 0 2 -1 0 0 0 2 1
Q 1 2 -1 0 2 -1 -1 1 0 0 -1 0
E 2 0 -1 -2 0 -2 1 1 1 2 -1 2
G 0 -1 1 0 -1 1 3 0 0 1 1 -2
H 0 2 -2 1 1 1 0 -1 1 3 2 0
I 3 -1 -1 0 1 1 1 1 3 -1 0 0
L 0 -1 0 2 1 -1 0 -1 0 0 1 0
K -1 0 1 0 -1 -1 2 2 -1 1 0 0
M 1 3 0 1 0 2 2 1 1 0 0 0
F 1 2 -2 1 0 2 1 1 0 0 0 -1
P 1 -2 0 0 2 3 1 0 0 1 0 -1
S 0 1 1 0 3 1 0 0 0 3 -1 -1
T 2 0 3 -2 1 0 0 0 0 2 -2 1
W 0 3 3 0 0 0 0 0 -1 1 0 -1
Y 0 1 1 0 0 3 2 0 -1 -2 0 -2
V 1 1 0 0 2 1 2 -1 0 2 -1 -1
B 1 0 0 1 0 1 1 0 0 2 -1 0
Z 0 0 0 1 1 2 0 -1 -1 0 -1 2
X 0 2 0 1 3 1 2 -1 0 1 2 2
M F P S T W Y V B Z X
A 0 2 3 -1 1 1 1 2 1 0 0
R 1 2 0 0 1 1 1 1 0 0 0
N -1 0 1 1 -2 -1 1 0 0 0 0
D 1 0 0 1 -1 0 0 0 1 0 0
C 3 1 2 1 0 0 0 2 1 0 0
Q 2 0 1 1 0 0 2 1 1 0 0
E -1 1 1 0 0 3 0 1 1 0 0
G -2 1 0 0 0 0 3 3 0 0 0
H 0 0 0 0 2 0 3 -2 1 0 0
I 0 0 3 0 1 1 0 3 1 0 0
L 0 3 1 -2 0 0 2 3 1 0 0
K 3 1 2 -2 1 0 2 1 1 0 0
M 1 3 0 1 0 2 2 1 1 0 0
F 0 1 0 -1 -1 2 2 -1 1 0 0
P 0 2 1 -1 0 -1 0 0 1 0 0
S 0 1 1 1 1 3 -1 0 0 0 0
T 1 1 0 -1 1 3 2 0 0 0 0
W 1 3 0 0 1 1 -2 -2 1 0 0
Y 1 1 1 2 -1 2 -1 1 1 0 0
V 1 0 0 -1 0 2 0 1 1 0 0
B 0 0 2 1 3 1 2 1 0 0 0
Z 1 1 1 1 0 0 1 -1 0 0 0
X 0 -1 -1 0 1 1 -2 -1 1 0 0
Properties Scoring Matrix
A R N D C Q E G H I L K
A 1 2 0 1 2 -2 -2 -1 4 0 1 3
R 1 2 1 -1 0 0 1 1 0 3 0 1
N -1 3 3 0 0 -1 1 3 3 2 -2 0
D 2 1 -1 -1 1 -1 -1 0 2 0 -1 2
C 2 1 -1 2 1 1 2 1 0 -1 1 2
Q -2 -1 -2 2 1 0 1 -3 -1 1 1 2
E 2 0 -2 0 2 -1 1 -2 1 1 1 2
G -1 0 0 3 1 -1 2 0 1 1 -1 0
H -1 0 -1 2 1 0 2 2 1 3 0 1
I 3 1 -2 2 2 0 0 2 3 -1 1 2
L 0 0 0 1 2 0 0 -2 -1 0 1 0
K -1 2 1 3 2 0 2 -1 0 1 0 0
M -1 3 1 1 2 -2 0 0 1 0 0 1
F 0 3 -1 1 0 2 -1 0 0 0 2 -1
P 0 1 -1 3 2 4 1 0 0 -1 0 0
S 0 1 -1 1 4 0 0 0 1 3 1 -2
T 0 1 4 0 2 0 0 1 -1 0 -1 2
W -2 4 2 1 0 0 -2 -1 0 0 3 1
Y 1 2 0 0 0 4 2 0 -2 0 2 -1
V 2 2 0 0 -2 -2 -1 -2 2 1 0 1
B 0 0 0 0 2 1 -1 2 1 1 2 1
Z 0 0 3 2 1 -1 -1 1 -1 -1 0 2
X 0 0 -1 3 3 0 0 -1 1 3 3 2
M F P S T W Y V B Z X
A 0 1 2 0 4 0 0 2 1 0 0
R -1 1 0 0 0 0 -1 1 0 0 0
N 1 -1 0 2 0 1 1 0 0 0 0
D 3 -1 0 2 1 1 0 0 1 0 0
C 3 1 2 1 1 0 0 -1 0 0 0
Q 1 1 1 2 0 0 1 2 2 0 0
E -1 0 1 0 0 2 1 2 0 0 0
G 0 0 0 0 0 -2 4 2 1 0 0
H 1 0 0 0 0 1 4 0 2 0 0
I 0 0 2 0 1 -1 1 4 0 0 0
L 0 2 0 1 -1 3 2 4 1 0 0
K 1 0 3 -1 1 0 2 -1 0 0 0
M -1 3 1 1 2 -2 0 0 1 0 0
F 2 1 3 2 0 2 -1 0 1 0 0
P 0 1 2 0 0 -2 -1 0 1 0 0
S 2 2 0 0 2 3 -1 1 2 0 0
T 1 0 2 2 1 3 0 1 1 0 0
W -1 2 0 1 1 -1 0 0 0 0 0
Y 1 -2 1 1 1 2 -1 0 1 0 0
V -3 -1 1 1 2 1 1 1 2 0 0
B 0 -1 1 2 3 1 2 1 1 0 0
Z 0 -1 2 3 -1 0 2 1 1 0 0
X -2 0 1 -1 0 2 0 1 1 0 0
1991 Pairwise Exchange Table (PET) at 250 PAM
A R N D C Q E G H I L K
A 2 -1 0 0 -1 -1 -1 1 -2 0 -1 -1
R -1 5 0 -1 -1 2 0 0 2 -3 -3 4
N 0 0 3 2 -1 0 1 0 1 -2 -3 1
D 0 -1 2 5 -3 1 4 1 0 -3 -4 0
C -1 -1 -1 -3 11 -3 -4 -1 0 -2 -3 -3
Q -1 2 0 1 -3 5 2 -1 2 -3 -2 2
E -1 0 1 4 -4 2 5 0 0 -3 -4 1
G 1 0 0 1 -1 -1 0 5 -2 -3 -4 -1
H -2 2 1 0 0 2 0 -2 6 -3 -2 1
I 0 -3 -2 -3 -2 -3 -3 -3 -3 4 2 -3
L -1 -3 -3 -4 -3 -2 -4 -4 -2 2 5 -3
K -1 4 1 0 -3 2 1 -1 1 -3 -3 5
M -1 -2 -2 -3 -2 -2 -3 -3 -2 3 3 -2
F -3 -4 -3 -5 0 -4 -5 -5 0 0 2 -5
P 1 -1 -1 -2 -2 0 -2 -1 0 -2 0 -2
S 1 -1 1 0 1 -1 -1 1 -1 -1 -2 -1
T 2 -1 1 -1 -1 -1 -1 -1 -1 1 -1 -1
W -4 0 -5 -5 1 -3 -5 -2 -3 -4 -2 -3
Y -3 -2 -1 -2 2 -2 -4 -4 4 -2 -1 -3
V 1 -3 -2 -2 -2 -3 -2 -2 -3 4 2 -3
B 0 -1 3 4 -2 1 3 1 1 -3 -4 1
Z -1 1 1 3 -2 4 4 -1 1 -3 -3 2
X -1 -1 -1 -1 -1 -1 -1 -1 0 -1 -1 -1
M F P S T W Y V B Z X
A -1 -3 1 1 2 -4 -3 1 0 -1 -1
R -2 -4 -1 -1 -1 0 -2 -3 -1 1 -1
N -2 -3 -1 1 1 -5 -1 -2 3 1 -1
D -3 -5 -2 0 -1 -5 -2 -2 4 3 -1
C -2 0 -2 1 -1 1 2 -2 -2 -4 -1
Q -2 -4 0 -1 -1 -3 -2 -3 1 4 -1
E -3 -5 -2 -1 -1 -5 -4 -2 3 4 -1
G -3 -5 -1 1 -1 -2 -4 -2 1 -1 -1
H -2 0 0 -1 -1 -3 4 -3 1 1 0
I 3 0 -2 -1 1 -4 -2 4 -3 -3 -1
L 3 2 0 -2 -1 -2 -1 2 -4 -3 -1
K -2 -5 -2 -1 -1 -3 -3 -3 1 2 -1
M 6 0 -2 -1 0 -3 -2 2 -3 -3 -1
F 0 8 -3 -2 -2 -1 5 0 -4 -5 -1
P -2 -3 6 1 1 -4 -3 -1 -2 -1 -1
S -1 -2 1 2 1 -3 -1 -1 1 -1 -1
T 0 -2 1 1 2 -4 -3 0 0 -1 -1
W -3 -1 -4 -3 -4 15 0 -3 -5 -4 -2
Y -2 5 -3 -1 -3 0 9 -3 -2 -3 -1
V 2 0 -1 -1 0 -3 -3 4 -2 -3 -1
B -3 -4 -2 1 0 -5 -2 -2 3 3 -1
Z -3 -5 -1 -1 -1 -4 -3 -3 3 3 -1
X -1 -1 -1 -1 -1 -2 -1 -1 -1 -1 -1
Gonnet Mutation Matrix
A R N D C Q E G H I L K
A 24 -6 -3 -3 5 -2 0 5 -8 -8 -12 -4
R -6 47 3 -3 -22 15 4 -10 6 -24 -22 27
N -3 3 38 22 -18 7 9 4 12 -28 -30 8
D -3 -3 22 47 -32 9 27 1 4 -38 -40 5
C 5 -22 -18 -32 115 -24 -30 -20 -13 -11 -15 -28
Q -2 15 7 9 -24 27 17 -10 12 -19 -16 15
E 0 4 9 27 -30 17 36 -8 4 -27 -28 12
G 5 -10 4 1 -20 -10 -8 66 -14 -45 -44 -11
H -8 6 12 4 -13 12 4 -14 60 -22 -19 6
I -8 -24 -28 -38 -11 -19 -27 -45 -22 40 28 -21
L -12 -22 -30 -40 -15 -16 -28 -44 -19 28 40 -21
K -4 27 8 5 -28 15 12 -11 6 -21 -21 32
M -7 -17 -22 -30 -9 -10 -20 -35 -13 25 28 -14
F -23 -32 -31 -45 -8 -26 -39 -52 -1 10 20 -33
P 3 -9 -9 -7 -31 -2 -5 -16 -11 -26 -23 -6
S 11 -2 9 5 1 2 2 4 -2 -18 -21 1
T 6 -2 5 0 -5 0 -1 -11 -3 -6 -13 1
W -36 -16 -36 -52 -10 -27 -43 -40 -8 -18 -7 -35
Y -22 -18 -14 -28 -5 -17 -27 -40 22 -7 0 -21
V 1 -20 -22 -29 0 -15 -19 -33 -20 31 18 -17
B -3 0 30 35 -25 8 18 3 8 -33 -35 7
Z -1 10 8 18 -27 22 27 -9 8 -23 -22 14
X -4 -5 -5 -9 -8 -3 -7 -15 0 -9 -9 -5
M F P S T W Y V B Z X
A -7 -23 3 11 6 -36 -22 1 -3 -1 -4
R -17 -32 -9 -2 -2 -16 -18 -20 0 10 -5
N -22 -31 -9 9 5 -36 -14 -22 30 8 -5
D -30 -45 -7 5 0 -52 -28 -29 35 18 -9
C -9 -8 -31 1 -5 -10 -5 0 -25 -27 -8
Q -10 -26 -2 2 0 -27 -17 -15 8 22 -3
E -20 -39 -5 2 -1 -43 -27 -19 18 27 -7
G -35 -52 -16 4 -11 -40 -40 -33 3 -9 -15
H -13 -1 -11 -2 -3 -8 22 -20 8 8 0
I 25 10 -26 -18 -6 -18 -7 31 -33 -23 -9
L 28 20 -23 -21 -13 -7 0 18 -35 -22 -9
K -14 -33 -6 1 1 -35 -21 -17 7 14 -5
M 43 16 -24 -14 -6 -10 -2 16 -26 -15 -5
F 16 7 -38 -28 -22 36 51 1 -38 -33 -12
P -24 -38 76 4 1 -50 -31 -18 -8 -4 -11
S -14 -28 4 22 15 -33 -19 -10 7 2 -4
T -6 -22 1 15 25 -35 -19 0 3 -1 -4
W -10 36 -50 -33 -35 142 41 -26 -44 -35 -13
Y -2 51 -31 -19 -19 41 78 -11 -21 -22 -4
V 16 1 -18 -10 0 -26 -11 34 -26 -17 -7
B -26 -38 -8 7 3 -44 -21 -26 32 57 -14
Z -15 -33 -4 2 -1 -35 -22 -17 57 24 -10
X -5 -12 -11 -4 -4 -13 -4 -7 -14 -10 -7
Nucleotides
NAME CODE MEANING
Adenine A A
Cytosine C C
Guanine G G
Thymine/Uracil T/U T/U
M A or C
R A or G
W A or T/U
S C or G
Y C or T/U
K G or T/U
V A or C or G
H A or C or T/U
D A or G or T/U
B C or G or T/U
X/N A or C or G or T/U
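The ambiguity codes in the table above can be read as sets of possible bases. The following Python sketch (an illustration only, not MAXSEGS code; the names AMBIGUITY, expand and compatible are hypothetical) encodes the table and tests whether two codes could stand for the same base. It assumes that U is treated as equivalent to T, as the combined T/U entries suggest.

```python
# Illustrative sketch only; names are assumptions made for this example.
# Each code maps to the bases it may represent. U is folded into T.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def expand(code):
    """Return the set of bases a nucleotide code may stand for."""
    return set(AMBIGUITY[code.upper()])


def compatible(code1, code2):
    """True if the two codes share at least one possible base."""
    return bool(expand(code1) & expand(code2))


print(sorted(expand("H")))   # prints ['A', 'C', 'T']
print(compatible("R", "Y"))  # prints False (purine vs pyrimidine)
print(compatible("R", "D"))  # prints True (both allow A or G)
```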
Amino Acids
NAME 3-LETTER CODE 1-LETTER CODE
Alanine Ala A
Cysteine Cys C
Aspartic Acid Asp D
Glutamic Acid Glu E
Phenylalanine Phe F
Glycine Gly G
Histidine His H
Isoleucine Ile I
Lysine Lys K
Leucine Leu L
Methionine Met M
Asparagine Asn N
Proline Pro P
Glutamine Gln Q
Arginine Arg R
Serine Ser S
Threonine Thr T
Valine Val V
Tryptophan Trp W
Tyrosine Tyr Y
Aspartic Acid/Asparagine Asp,Asn B
Glutamic Acid/Glutamine Glu,Gln Z
Unknown Xxx X
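Sequences written with three-letter codes must be converted to the one-letter codes used by the scoring matrices. A minimal Python sketch of that conversion, based on the table above, follows. It is an illustration only and not MAXSEGS code; the names THREE_TO_ONE and to_one_letter are hypothetical, and mapping any unrecognized residue to X is an assumption of this example. The ambiguous codes B and Z are left out because the table lists them with two three-letter codes each.

```python
# Illustrative sketch only; names are assumptions made for this example.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
    "Xxx": "X",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes to a one-letter string.

    Case is ignored; anything not in the table becomes X (unknown).
    """
    return "".join(THREE_TO_ONE.get(r.capitalize(), "X") for r in residues)


print(to_one_letter(["Met", "LYS", "trp", "Sec"]))  # prints MKWX
```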
Pittsburgh Supercomputing Center National Resource for Biomedical Supercomputing
an NIH Supported Resource Center
300 S. Craig St. Pittsburgh, PA 15213. Phone: 412-268-4960, Email: biomed@psc.edu
Please send suggestions for improving this code and error reports to biomed@psc.edu.
biomed-www@psc.edu (last updated: 2/17/98)
© 1998 Pittsburgh Supercomputing Center