Structure of Proteins and DNA
Protein in Profiles
A new approach to sketching the outlines of protein structure.

You could say that hormones are part of the food chain. Think of it this way: to live, we have to digest food. To digest food, we need enzymes. To release digestive enzymes from the pancreas, we need hormones, which go from the small intestine to the pancreas and say, in biochemical language: "Food's here, time for work." One of the hormones that delivers this wake-up call is cholecystokinin, better known to biologists as CCK.

In recent work, Grace Rosenquist of the University of California, Davis, and Hugh Nicholas of Pittsburgh Supercomputing Center took a novel approach to answering some questions about CCK, with surprising results. CCK is a peptide hormone: a chain of amino acids, and therefore a protein. Until fairly recently, biologists thought peptide hormones were a special brand of protein. Unlike other proteins, it was believed, their ability to deliver messages and coordinate activity among different kinds of cells doesn't depend on having a discernible shape.

For most proteins, function follows from form. Hemoglobin, for instance, has intricate twists and folds that create pockets for its iron-containing heme groups, and its ability to transport oxygen in the blood depends on its folded shape. Peptide hormones, however, lack the helices, sheets, turns and combinations of these 3-D features that structural biologists call secondary and tertiary structure. They have primary structure (a linear sequence of amino acids) and that's about it, or so structural biologists thought. "The feeling has been that these peptide chains just flop around randomly in the bloodstream," says Rosenquist.

The work of Rosenquist and Nicholas suggests that for CCK there's more to it than that. They looked at CCK using profile analysis, a standard method for analyzing a protein's sequence of amino-acids, but they applied it in a radically new way. Their results, produced with number-crunching help from the CRAY C90 and software developed at Pittsburgh Supercomputing Center, offer credible evidence that a sequence of human CCK, CCK-58, has structure — two closely packed helices.

This finding offers a viable explanation for unexpected CCK activity observed in experiments. "CCK-58 shows binding activity different from other forms of CCK," says Rosenquist, "and that made us think there's a structural difference." Beyond what this means for CCK, their research method itself turns over a new leaf for structural biologists. It suggests that profile analysis could cut the time and expense involved in determining the structure of proteins, grinding work that is one of the important ongoing tasks of molecular biology.

Hints at Structure

CCK hormones help digest fat, carbohydrates and proteins. When these foods are in our small intestine, cells in the intestinal wall release CCK into the bloodstream, and the hormone travels to the pancreas, where it stimulates secretion of digestive enzymes. CCK also travels to the gallbladder, causing it to contract and release bile into the small intestine. CCK is present at relatively high concentrations in the brain as well, where, researchers speculate, it plays a role in appetite control.

Laboratory work has revealed three main active forms of CCK that differ by the length of their amino-acid chain: CCK-8, CCK-33 and CCK-58. All three have the same eight amino-acid sequence at one end. This part of the peptide is its binding site, the part that latches on to receptors in the pancreas, for instance, to initiate enzyme release.

Because all three have the same binding site, the normal expectation would be that the larger forms, which on average take longer to travel from place to place, would be less biologically active. Experiments injecting CCK into rats have shown, however, that CCK-58 is more potent than the shorter forms at releasing certain enzymes.

"The change in activity doesn't correlate with how large the peptide is," says Nicholas. "For some reason, in certain circumstances CCK-58 is less active than you'd expect, and in other circumstances it's more active. This indicates there's probably some structure, either enhancing or inhibiting its interaction."

It's one thing to surmise that a protein has structure, another to figure out with certainty what that structure is. Traditional structure determination methods (X-ray crystallography and, more recently, NMR) involve a painstaking iterative process that can take months, and sometimes years, to solve the riddle of a single protein. Rosenquist and Nicholas knew that few, if any, structural biologists would be willing to invest the time to determine the structure of CCK-58 without more convincing evidence.

Protein Carpentry

In 1989, when Rosenquist attended a sequence-analysis workshop in Pittsburgh led by Nicholas, she didn't expect it would lead to a new way to survey the territory of unmapped protein structure. "I learned some things about profile analysis," says Rosenquist, "but I never thought I'd have much occasion to use it."

The PSC workshop was the genesis of a productive collaboration, joining Rosenquist's knowledge of hormone biology with Nicholas's expertise in computational methods. Rosenquist had published extensively on CCK and knew the experimental work on this peptide family. Conversations with Nicholas led to a working hypothesis of helical structure in CCK-58 along with a plan to test the hypothesis using sequence-analysis methods.

This graphic, from a PSC educational video, illustrates helix-helix packing, showing a helix-turn-helix motif in the protein lysozyme. Sidechains (green) of one helix (blue) define a ridge that fits into the groove defined by the sidechains (yellow) of the other helix (red). Other sidechains (cyan & orange) participate in the helix-helix contact but not in the ridge-groove interaction.

"Traditional sequence-analysis tools gave us faint hints that a couple regions were at least possibly helical," says Nicholas. Statistical studies of known protein structures provide data on what amino acids are likely indicators of helices. Helices near each other along the same stretch of peptide often link by folding together like interlaced fingers. This "helix-turn-helix" pattern is one example of how protein structure reflects the distinct chemical properties associated with the "side-chains" of each amino acid.

Some amino-acids, explains Nicholas, have side-chains that like to avoid water. Often, water-avoiding (hydrophobic) side-chains of one helix pack together in the core of the helical interface with like-minded side-chains from the other, creating a shelter from the cell's watery environment. "CCK-58 showed these kinds of signals," says Nicholas, "the right periodicity and frequency of amino-acids, consistent with packed helical structure."
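The "periodicity" Nicholas mentions can be made concrete. One common measure, swapped in here for illustration because the article does not name the traditional tools that were used, is the hydrophobic moment: treat each residue's hydropathy as a vector pointing outward from the helix axis, rotate 100 degrees per residue (3.6 residues per turn in an ideal alpha helix), and add the vectors. The sketch below assumes the Kyte-Doolittle hydropathy scale. The function name and both toy segments are hypothetical and are not CCK-58 sequence.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Per-residue length of the vector sum of hydropathy around the helix axis."""
    x = y = 0.0
    for i, aa in enumerate(segment):
        angle = math.radians(i * degrees_per_residue)
        x += KYTE_DOOLITTLE[aa] * math.cos(angle)
        y += KYTE_DOOLITTLE[aa] * math.sin(angle)
    return math.hypot(x, y) / len(segment)


# Hypothetical toy segments, invented for illustration (not from CCK-58).
amphipathic = "LKKLLKKLLKKL"  # hydrophobic residues recur every 3-4 positions
scrambled = "LLLLLLKKKKKK"    # same composition, no helical periodicity

print(hydrophobic_moment(amphipathic), hydrophobic_moment(scrambled))
```

A large moment means the hydrophobic residues line up on one face of the helix, which is the kind of signal that would be consistent with a packing surface. The two toy segments have identical composition, so any difference in their moments comes from residue order alone.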

The Diamond Patch
Three graphical representations of the diamond-shaped pattern of hydrophobic amino-acid sidechains for one helix of a packed helical structure (helix four of the protein thermolysin). The diamond shows most clearly in the alignment of beta-carbon atoms (middle), the atoms closest to the helix. This pattern represents a +4 helix, so called due to the number of amino acids between those forming two parallel ridges (relative positions 0 & 4 and 3 & 7). The space-filling representation (left) shows the full conformation of the sidechains, which form a groove between the parallel ridge surfaces. The bond representation (right) shows internal orientation of sidechain structure.

To go beyond these hints, Rosenquist and Nicholas turned to research by Cambridge (England) structural biologist Cyrus Chothia, who in the 1970s and early 1980s deduced a set of rules describing how two helices pack together. Working with a set of eight packed-helices (from five proteins), Chothia showed that hydrophobic side-chains on each helix form a diamond-shaped pattern that aligns the side-chains in ridges separated by parallel grooves. The ridges of one helix fit into the grooves of the other, forming a tight joint analogous to tongue-and-groove carpentry.
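A little arithmetic shows why the diamond-patch positions named in the figure caption (relative positions 0, 3, 4 and 7 for a +4 helix) end up on the same face of a helix. The sketch below assumes an ideal alpha helix with 3.6 residues per turn, or 100 degrees of rotation per residue. The function and variable names are illustrative, not part of the researchers' software.

```python
DEGREES_PER_RESIDUE = 100.0  # assumption: ideal alpha helix, 3.6 residues per turn


def angular_offset(i):
    """Angle of residue i around the helix axis, folded into (-180, 180]."""
    angle = (i * DEGREES_PER_RESIDUE) % 360.0
    return angle - 360.0 if angle > 180.0 else angle


patch = [0, 3, 4, 7]  # relative positions given in the caption for a +4 helix
offsets = {i: angular_offset(i) for i in patch}
spread = max(offsets.values()) - min(offsets.values())

print(offsets)                # {0: 0.0, 3: -60.0, 4: 40.0, 7: -20.0}
print(spread)                 # 100.0
print(angular_offset(2))      # -160.0, nearly opposite the patch
```

All four patch positions fall within a 100-degree arc, so their side-chains point the same way, while a residue at relative position 2 sits almost on the opposite side of the helix.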

Profiles of Structure

How does CCK-58 compare with the eight packed-helices of Chothia's study? Do the amino-acid sequences as a group give any indication of structural relationship?

Profile analysis is a powerful method often used to analyze differences and similarities in amino-acid sequence among proteins related to each other by evolution. Such methods reveal, for instance, that the muscle protein myoglobin and the iron-carrying protein hemoglobin have the same evolutionary lineage. Similarly, relationships between species can be inferred from their protein sequences; this kind of evidence tells us, for instance, that humans diverged from African apes five million years ago.

Nicholas and Rosenquist decided to try a novel approach, applying profile analysis to look for structural relationships among proteins that have no evolutionary relationship. The first step was to align and analyze the helical sequences of the eight known packed-helices. "If there's enough selection for specific amino-acids in this packing geometry," says Nicholas, "then profile analysis ought to work. It will indicate that particular amino-acids are favored, and we could use those statistics to predict the likelihood of packing in other sequences."
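In its simplest form, a sequence profile is a table with one column per aligned position, recording how strongly each amino acid is favored there. The following is a minimal sketch of that idea, not the researchers' actual procedure: it assumes a uniform background frequency and a simple add-one pseudocount so that residues never seen in a small sample still get a finite score. The aligned segments are invented stand-ins for real packed-helix sequences, and `build_profile` is a hypothetical name.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_segments, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(aligned_segments[0])
    if any(len(seg) != length for seg in aligned_segments):
        raise ValueError("segments must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(aligned_segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seg[col] for seg in aligned_segments)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


# Hypothetical aligned segments, invented for illustration only.
segments = ["LKELLKKA", "LEKLIKEA", "IKELLRKA", "LKDLVKKA"]
profile = build_profile(segments)

print(round(profile[0]["L"], 2), round(profile[0]["K"], 2))
```

Positive scores mark residues seen more often than the background would predict at that position, and negative scores mark residues that are disfavored. With only a handful of sequences the pseudocount dominates, which is one reason a small sample limits how much a profile can say.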

The preliminary analysis showed promise, to say the least. The researchers built sequence-based profiles from Chothia's helix-pairs and, applying a classic strategy called leave-one-out analysis, tested whether profiles built from seven of the helix-pairs could predict helical structure in the remaining one. The answer was yes. The sequence-based profiles correctly picked the first side-chain in the "packing diamond" of both helices for all eight helix-pairs.
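The leave-one-out strategy can be sketched in a few lines: hold out one example, build the statistics from the rest, and check whether they locate the known start position in the held-out sequence. This is a sketch under stated assumptions, not the published method. It uses a plain agreement count as the score, and the sequences and start positions are invented toy data, constructed so that the segments resemble one another.

```python
from collections import Counter


def column_counts(segments):
    """Count residues in each column of equal-length aligned segments."""
    return [Counter(col) for col in zip(*segments)]


def score(window, counts, n):
    """Average fraction of training segments that agree with each residue."""
    return sum(c[aa] for aa, c in zip(window, counts)) / (n * len(window))


def best_start(sequence, counts, n, width):
    """Start index of the highest-scoring window in the sequence."""
    starts = range(len(sequence) - width + 1)
    return max(starts, key=lambda s: score(sequence[s:s + width], counts, n))


# Hypothetical data: (full sequence, known start of the packing segment).
examples = [
    ("GSPLKELLKKAGT", 3),
    ("TTLEKLIKEAPGS", 2),
    ("PGGSIKELLRKAT", 4),
    ("SLKDLVKKAGPTG", 1),
]
WIDTH = 8

hits = 0
for held_out, (sequence, true_start) in enumerate(examples):
    # Train only on the segments from the other examples.
    training = [
        seq[s:s + WIDTH]
        for i, (seq, s) in enumerate(examples)
        if i != held_out
    ]
    counts = column_counts(training)
    if best_start(sequence, counts, len(training), WIDTH) == true_start:
        hits += 1

print(f"{hits} of {len(examples)} held-out starts recovered")
```

Because the held-out sequence never contributes to the statistics used to score it, a correct prediction reflects a pattern shared across the other examples rather than memorization of the answer.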

But the predictions were even better than this. Chothia's work showed that helices choose their partners carefully; one pattern of hydrophobic side-chains (called +3 helices) always packs with another (+4 helices). The sequence-based profiles put together by Nicholas and Rosenquist picked all the +3 helices as +3s and, likewise, all the +4s as +4s. "Considering the size of our sample, only eight helix-pairs," says Rosenquist, "we were surprised these profiles could be this selective. This means there's a definite pattern of hydrophobic, hydrophilic amino-acids in these helices."

Profile-SS Scores Showing Location of Possible Helices
Profile-SS scores every position in the CCK sequence, using the likelihoods established by sequence-based profiles from Chothia's set of eight packed-helix pairs along with other methods. The score at each position reflects the likelihood of helical structure in that region. Using data from all the profiles, the analysis selects the 13th amino-acid (leucine) and the 31st (arginine) as the first position of two "diamond patch" packing regions. "These two regions [arrows] show a distinct possibility of helical structure," says Nicholas. "One of the things we find encouraging is the periodic peaks separated by three or four positions, which is the periodicity of the helix. One of the predictions you would make is that after a good fit you would move up one turn of the helix and see another good fit, and then it drops off on both sides. That's what we think we're seeing here."

Encouraged by this high degree of selectivity, the researchers turned to a profile-analysis program, Profile-SS, developed by Nicholas and colleagues at PSC, to apply the constructed helical profiles to CCK-58. "This is the mathematical part of the analysis," says Nicholas. "You have to project how likely it is that other amino-acids can appear at a particular position based on the sample we start with."

The resulting scores, developed by several different methods, predict two regions of CCK-58 with a high likelihood of helical structure. The highest scores occur at positions three to four amino-acids apart, consistent with one turn of a helix. "We're seeing successive regions that fit well with the diamond pattern," says Nicholas. "Our hypothesis is that these two regions curl back on each other, and that may provide a core for CCK-58 to collapse around, rather than just flopping around in solution, which could explain its anomalous behavior."
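The spacing check described here is easy to express: find the local peaks in a per-position score trace and see whether consecutive peaks sit three or four residues apart. The sketch below uses invented scores purely for illustration. They are not Profile-SS output, and the function names and threshold are hypothetical.

```python
def local_peaks(scores, threshold):
    """Indices whose score meets the threshold and exceeds both neighbors."""
    return [
        i for i in range(1, len(scores) - 1)
        if scores[i] >= threshold
        and scores[i] > scores[i - 1]
        and scores[i] > scores[i + 1]
    ]


def helical_spacing(peaks):
    """Gaps between consecutive peaks, and whether each gap is 3 or 4."""
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return gaps, [gap in (3, 4) for gap in gaps]


# Hypothetical per-position scores, invented for illustration only.
scores = [0.1, 0.2, 0.9, 0.3, 0.2, 0.4, 0.8, 0.3, 0.1, 0.7, 0.2, 0.1]

peaks = local_peaks(scores, threshold=0.5)
gaps, helical = helical_spacing(peaks)

print(peaks)    # [2, 6, 9]
print(gaps)     # [4, 3]
print(helical)  # [True, True]
```

Gaps of three or four match the 3.6-residue turn of an ideal alpha helix, so a run of such gaps is what a good fit repeating one turn up the helix would look like.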

Rough Sketches of Protein Structure

The researchers see their work as making a credible case for packed-helical structure in CCK-58, but its persuasiveness is inherently limited by the small size of the sample, only eight helix-pairs. Rosenquist is working with a team of students to identify other proteins of known structure that have the same ridge-and-groove packed-helices. "It's a slow process," says Rosenquist, "because we have to look carefully at each helix pair." The plan is to create a reliable dataset of 50 to 60 structures. So far they have identified about 30 with assurance.

If the enlarged dataset confirms the early results, Rosenquist and Nicholas will have established a strong basis for structural biologists to go forward with a full structural determination of CCK-58. "Instead of looking at all possible interactions in all possible configurations," says Nicholas, "profile analysis could bring this down to a short list of elements that interact with each other in a specific geometry. It becomes a much more tractable problem."

The real promise of their work, the researchers recognize, goes beyond CCK-58. If profile analysis can sketch the rough outlines of structure, as their work suggests, it could save countless years of work on undetermined proteins. "The exciting prospect," says Rosenquist, "is that you can take just a primary sequence and derive information about structure. You can predict what part of the protein is going to be not just helical, but also packed, and packed in a certain way. And if it works for helix-packing, then we can look at other kinds of structure. This kind of knowledge saves time, and time is money."

Researchers: Grace Rosenquist, University of California, Davis; Hugh Nicholas, PSC.
Hardware: CRAY C90
Related Material on the Web:
Section of Neurobiology, Physiology and Behavior at UC-Davis
Projects in Scientific Computing, PSC's annual research report.
The Biomedical Supercomputing Initiative at PSC
© Pittsburgh Supercomputing Center (PSC)
Revised: June 30, 1998