The Protein Family Tree

How can the flood of human genomic data be put to productive use? A first step is using sequence data to determine 3-D protein structure. But how can scientists work efficiently through this mass of information to solve the structures of the hundreds of thousands of proteins that remain unsolved? A series of software tools, some of them developed at PSC, makes it possible to search databases and classify proteins into family groups that reflect the evolutionary relationships underlying protein function.

PSC and University of Pittsburgh scientists have exhaustively analyzed the relationships within a family of enzymes called aldehyde dehydrogenase (ALDH). Found in nearly every living thing, ALDH protects the mammalian body from toxic compounds. In this graphical representation of the ALDH molecular structure (top, right), colors represent amino-acid groups that are "highly conserved": they remain essentially the same in nearly all forms of the enzyme, whatever the species.
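One simple way to make "highly conserved" concrete is to score each column of a multiple sequence alignment by how often its most common residue appears. The Python sketch below illustrates that idea only; it is not the method the PSC and Pittsburgh researchers used. The function names, the 0.9 default threshold, and the toy sequences are assumptions invented for the example and are not real ALDH data.

```python
from collections import Counter


def column_conservation(alignment):
    """For each alignment column, return the fraction of sequences that
    carry the column's most common residue (gaps '-' never count as a match)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # column contains only gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_positions(alignment, threshold=0.9):
    """Return 0-based column indices whose score meets the threshold.
    The 0.9 default is an arbitrary, illustrative cutoff."""
    scores = column_conservation(alignment)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Made-up aligned sequences, for illustration only.
toy_alignment = [
    "MGCEK",
    "MGCDK",
    "MACEK",
    "MGC-R",
]

print(conserved_positions(toy_alignment))
```

In this toy alignment only the first and third columns are identical in every sequence, so they are the only ones that pass the default threshold. Real analyses usually weight substitutions by chemical similarity rather than requiring exact identity.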

The logic of evolution holds that conserved residues are important to structure and function. PSC scientists have developed algorithms that use conserved residues to identify sequence elements in related proteins, and those elements can predict crucial features of 3-D structure and function.

Glutathione S-transferase (GST) is a very large enzyme family that protects cells from chemical toxicity. Using sophisticated sequence-alignment techniques, PSC researchers have classified GSTs into six subfamilies. This graphical representation of GST structure (bottom, right) identifies molecular features that distinguish the subfamilies and predict the specificity of their biological function.
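As a rough illustration of how an alignment can expose features that distinguish subfamilies, the Python sketch below looks for columns where every subfamily is internally invariant but at least two subfamilies carry different residues. It is a simplified stand-in, not the PSC researchers' algorithm. The function name, the subfamily labels "A", "B" and "C", and the sequences are hypothetical and are not real GST data.

```python
def subfamily_specific_positions(subfamilies):
    """Given a dict mapping a subfamily label to its aligned sequences,
    return 0-based column indices where each subfamily shows exactly one
    residue (no gaps) and at least two subfamilies differ from each other."""
    all_seqs = [seq for seqs in subfamilies.values() for seq in seqs]
    length = len(all_seqs[0])
    if any(len(seq) != length for seq in all_seqs):
        raise ValueError("all aligned sequences must have the same length")

    positions = []
    for i in range(length):
        consensus = []
        for seqs in subfamilies.values():
            residues = {seq[i] for seq in seqs}
            if len(residues) != 1 or "-" in residues:
                break  # this subfamily varies here, so skip the column
            consensus.append(next(iter(residues)))
        else:
            # Every subfamily was invariant; keep the column only if
            # the subfamilies do not all share the same residue.
            if len(set(consensus)) > 1:
                positions.append(i)
    return positions


# Hypothetical subfamilies with made-up aligned sequences.
toy_subfamilies = {
    "A": ["MKYLA", "MKYIA"],
    "B": ["MRWLA", "MRWVA"],
    "C": ["MKFLA", "MKFLA"],
}

print(subfamily_specific_positions(toy_subfamilies))
```

Here the second and third columns separate the toy subfamilies, while the first and last are conserved across all of them and the fourth varies within a subfamily. Requiring strict invariance is a deliberate simplification; a practical tool would tolerate some variation within each group.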
