The Protein Family Tree
How can the flood of human genomic data
be gainfully employed? The first step is using sequence data
to determine 3-D protein structure. But how can scientists
wade efficiently through this mass of information to solve the
hundreds of thousands of protein structures that remain unsolved? A
series of software tools, some of them developed at PSC,
makes it possible to search databases and classify proteins
into family groups that reflect the evolutionary relationships
that select for protein function.
PSC and University of Pittsburgh
scientists have exhaustively analyzed the relationships
among a family of enzymes called aldehyde dehydrogenases (ALDH).
Found in nearly every living thing, ALDH in mammals protects
the body from toxic compounds. In this graphical
representation of the ALDH molecular structure (top, right),
colors represent amino-acid groups that are "highly
conserved" - they remain essentially the same in nearly all
forms of the enzyme across species.
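The notion of a "highly conserved" position can be illustrated with a
short sketch. It assumes the sequences have already been aligned, and
it scores each alignment column by the fraction of sequences that share
the most common residue. The function names, the 0.9 threshold, and the
toy sequences are hypothetical and chosen for illustration only; this
is not the PSC software, and the sequences are not real ALDH data.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column by the fraction of sequences
    that carry the column's most common residue (gaps never count)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Divide by the number of sequences, so gaps lower the score.
        scores.append(top_count / len(column))
    return scores


def conserved_positions(alignment, threshold=0.9):
    """Return 0-based column indices whose score meets the threshold."""
    return [i for i, score in enumerate(column_conservation(alignment))
            if score >= threshold]


# Made-up aligned sequences, for illustration only.
toy_alignment = [
    "GACWE",
    "GSCWD",
    "GTCWE",
    "GACW-",
]

print(conserved_positions(toy_alignment))  # columns 0, 2 and 3 are identical in all four
```

Real analyses typically use more refined scores that weight chemically
similar residues and correct for closely related sequences, but the
column-by-column comparison is the same basic idea.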
The logic of evolution holds that
conserved residues are important in structure and function,
and PSC scientists have developed algorithms that use
conserved residues to identify sequence elements in related
proteins that can predict crucial features of 3-D structure
and function.
Glutathione S-transferase (GST) is a very large
enzyme family that protects cells from chemical toxicity.
Using sophisticated sequence-alignment techniques, PSC
researchers have classified GSTs into six subfamilies. This
graphical representation of GST structure (bottom, right)
identifies molecular features that distinguish the
subfamilies and predict the specificity of their biological