PROSITE

A dictionary of protein sites and patterns

PROSITE is a method for determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites and patterns, formulated so that appropriate computational tools can rapidly and reliably identify which known protein family (if any) a new sequence belongs to.

In some cases the sequence of an unknown protein is too distantly related to any protein of known structure for the resemblance to be detected by overall sequence alignment. The protein can still be identified by the occurrence in its sequence of a particular cluster of residue types, variously known as a pattern, motif, signature, or fingerprint. These motifs arise because specific regions of a protein are subject to particular structural requirements, for example because they are important for its binding properties or its enzymatic activity. These requirements impose very tight constraints on the evolution of those small but important portions of a protein sequence.
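To illustrate how such a pattern can be matched against a sequence, here is a minimal Python sketch that translates PROSITE-style pattern syntax into a regular expression. It is not the PROSITE or EMBOSS implementation. The function names prosite_to_regex and find_sites are hypothetical, the test sequence is made up, and only the basic syntax elements are handled: "-" separators, "x" wildcards, [..] alternatives, {..} exclusions, (n) and (n,m) repeats, and the "<" and ">" terminal anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration; covers only the basic syntax.
    """
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        # Single letters and [..] alternatives are already valid regex.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A lookahead lets finditer report overlapping occurrences.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern, used here purely as an example.
    print(find_sites("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))
```

Short patterns such as this one match by chance in many unrelated sequences, so a hit from a sketch like this suggests a candidate site and does not establish family membership.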

Some protein families, as well as some functional or structural domains, cannot be detected using patterns because of their extreme sequence divergence; techniques based on weight matrices (also known as profiles) allow the detection of such proteins or domains.
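The idea behind a weight matrix can be sketched in a few lines of Python. This is a simplified illustration and not the PROSITE profile format: it assumes an ungapped alignment, a uniform background frequency for the 20 standard amino acids, and a fixed pseudocount, and it ignores the gap penalties that real profiles carry. The function names and the toy alignment are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(aligned, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned sequences.

    Assumes a uniform background distribution (an illustrative simplification).
    """
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for col in range(len(aligned[0])):
        # Start every residue at the pseudocount so unseen residues
        # receive a finite (negative) score rather than minus infinity.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        matrix.append(
            {aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS}
        )
    return matrix


def best_window(matrix, sequence):
    """Return (1-based start, score) of the highest-scoring window, or None.

    The sequence must contain only the 20 standard amino acid letters.
    """
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if best is None or score > best[1]:
            best = (start + 1, score)
    return best


if __name__ == "__main__":
    # Toy alignment, made up for this example.
    matrix = build_weight_matrix(["GKST", "GKTT", "GRST", "AKST"])
    print(best_window(matrix, "MMMGKSTLLL"))
```

Unlike a pattern, which either matches or does not, the matrix gives every window a graded score, so a divergent family member can still score above a chosen threshold even when no single position is strictly conserved.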

Installed on: the Opteron cluster.

Usage:

The EMBOSS software package can be used to access PROSITE data.

The native PROSITE data can be made available upon request. To request the data, contact PSC User Services.

See also: