All Flesh Is Grass?
Blacklight Helps Researchers Untangle Genome of Wheat Progenitor
Wheat, a species of grass, provides more protein for human consumption—more flesh—than any other plant. Globally, we harvest 725 million metric tons of wheat every year. It’s a fundamental human food source, found in many common items including breads, pastas and cereals. But wheat is a complicated plant, a hybrid of several progenitor species.
WHY IT'S IMPORTANT
Bread wheat has six sets of chromosomes, compared with humans’ two, and far more repetition in its genome, the collection of genes and other DNA that directs an organism’s biology. Engineering improved disease resistance, drought tolerance or increased yields into wheat is difficult partly because the size and repetitiveness of wheat DNA make it hard to assemble an accurate, complete genome for the grain.
“Assembling a genome is like putting together a jigsaw puzzle. With goatgrass, it was like putting together a jigsaw puzzle with lots of blue sky and clouds.”—Michael Schatz, Cold Spring Harbor Laboratory
When a species’ DNA sequence has multiple copies of nearly identical genes (top, red boxes), there’s a danger that the computer will mistake two of them for the same gene (bottom, two red boxes stacked), and the final DNA sequence assembly will be missing one of those genes.
HOW BLACKLIGHT AND XSEDE HELPED
Michael Schatz and colleagues at Cold Spring Harbor Laboratory, New York, have studied wheat in part by determining the DNA sequence of Aegilops tauschii (Ae. tauschii), or goatgrass. This grass is one of the species our ancestors bred to create modern wheat. To determine the DNA sequence of a genome, researchers must first split it into short DNA fragments, because current DNA sequencers cannot read more than a few hundred nucleotides at a time from the many billions in the genome. Then they match overlaps between these short sequences to put them back together in their original order. But species like goatgrass and wheat, with lots of sequence repetition, can “fool” the matching process, causing genes to escape detection.
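The split-and-match idea, and the way repetition can fool it, can be illustrated with a toy example. The Python sketch below is only an illustration under simplifying assumptions: it uses a naive greedy overlap merge, not the method used by ALLPATHS-LG, and the function names (shred, overlap, greedy_assemble), the read length and the short made-up sequences are all hypothetical. It cuts a sequence into overlapping 8-letter reads and repeatedly joins the pair of reads with the longest suffix-to-prefix match.

```python
def shred(genome, read_len=8, step=4):
    """Cut a sequence into fixed-length reads that overlap by read_len - step."""
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Naive greedy assembly: keep merging the pair with the longest overlap."""
    # Identical reads cannot be told apart, so they are treated as one read.
    reads = list(dict.fromkeys(reads))
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return reads  # no overlaps left; what remains are the contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    # Made-up sequence with no repeated stretches: assembly recovers it whole.
    unique = "ATGCCGTAACTGGAGTTCAC"
    print(greedy_assemble(shred(unique)))

    # Made-up sequence containing two copies of the "gene" CGTAACTG.
    repeated = "ATGCCGTAACTGGAGTCGTAACTGTCAC"
    print(greedy_assemble(shred(repeated)))
```

In this toy run the repeat-free sequence comes back as a single piece, while the sequence with two copies of CGTAACTG comes back as two pieces, and the longer piece contains that "gene" only once. Real assemblers use far more sophisticated methods, but the failure is the same in kind: identical reads from different copies of a repeat cannot be told apart.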
XSEDE Extended Collaborative Support Service and Novel and Innovative Projects Program staff at PSC helped Schatz modify the gold-standard ALLPATHS-LG genome assembly software to handle big, repetitive genomes. Thanks to an XSEDE allocation on PSC’s Blacklight supercomputer and its huge shared memory, the software could now assess many millions of potential ways of assembling small DNA fragments at once, without traveling back and forth to data storage. This greatly sped up the computation. Schatz’s work on Blacklight detected at least 230 genes that had been missed in earlier attempts to assemble the goatgrass genome.