Pittsburgh Supercomputing Center Scientists Patent Software for Protecting Supercomputing Results Against System Failures

PITTSBURGH, April 8, 2013 — Scientists at Pittsburgh Supercomputing Center (PSC) have patented ZEST, a piece of software that takes a rapid “snapshot” of a supercomputer’s calculations as it works. ZEST greatly speeds the ability to store complex calculations as a hedge against a system failure, saving precious supercomputing time and slowing calculations down far less than current methods.

PSC co-inventors of ZEST included Paul Nowoczynski, Jason Sommerfield, Nathan Stone, and Jared Yanovich.

Just as we all hit “save” as we work, scientists carrying out vast computations such as those required for detailed weather predictions or earthquake science need to periodically store — “checkpoint” — the machine’s computational state. In the case of a system malfunction, this allows them to avoid having to start from the beginning after hours or days of work.

The problem, according to J. Ray Scott, Director of Systems and Operations at PSC, is that retrieving and storing these data takes time away from calculation, and that time is carefully rationed among researchers using highly in-demand supercomputers. In fact, he adds, over the last seven years the memory available in the largest machines has increased about 25-fold, while the capacity for storing and retrieving the contents of that memory has increased only about four-fold.
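To put those two growth figures side by side, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the time to checkpoint a machine's full memory scales as memory size divided by storage throughput; the function name is hypothetical and nothing here comes from PSC's own measurements.

```python
def relative_checkpoint_time(memory_growth, io_growth):
    """Factor by which a full-memory checkpoint slows down.

    Assumption (not from the release): checkpoint time is proportional
    to memory size divided by storage throughput.
    """
    return memory_growth / io_growth


# Figures quoted by Scott: memory up about 25-fold, storage and
# retrieval capacity up only about four-fold.
slowdown = relative_checkpoint_time(25, 4)
print(slowdown)  # 6.25
```

Under that assumption, saving the full state of today's largest machines would take roughly six times longer than it did seven years earlier.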

“If you have a large job, checkpointing the run often means writing out tens of terabytes of data” — enough to fill about a thousand new iPads, Scott says. “This takes a nontrivial amount of time. The whole time you’re doing the checkpoint, you’re not using the computer.”

The ZEST software works by tightly managing the supercomputer’s disk drives, continuously routing checkpoint storage to disks that aren’t being used for computation.

“Every disk drive holds up its hand and says, ‘I can take these data now,’” Scott explains. This “pull-based model” ensures the checkpointing conflicts as little as possible with the computer’s own use of the drives. “You’re always writing to whomever’s the most available.”

ZEST is far more efficient than current methods, which “push” data to disks whether or not they’re ready to receive it. ZEST has demonstrated 90 percent of the theoretical maximum efficiency of writing data to drives; currently available commercial systems have efficiencies of 25 percent or less.
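The difference between "push" and "pull" can be illustrated with a toy model. The sketch below is not ZEST's implementation; it is a minimal simulation, with hypothetical function names and made-up disk speeds, in which one of four disks is busy with other work and therefore writes more slowly. The push scheduler hands chunks to disks in a fixed rotation, while the pull scheduler gives the next chunk to whichever disk becomes free first.

```python
import heapq


def push_round_robin(chunk_count, seconds_per_chunk):
    """Static 'push': chunk i always goes to disk i % n, ready or not."""
    n = len(seconds_per_chunk)
    finish = [0.0] * n  # time at which each disk finishes its backlog
    for i in range(chunk_count):
        disk = i % n
        finish[disk] += seconds_per_chunk[disk]
    return max(finish)  # the checkpoint ends when the slowest disk ends


def pull_when_idle(chunk_count, seconds_per_chunk):
    """'Pull': whichever disk becomes free first takes the next chunk."""
    # Heap entries are (time the disk is next free, disk index).
    free_at = [(0.0, disk) for disk in range(len(seconds_per_chunk))]
    heapq.heapify(free_at)
    done = 0.0
    for _ in range(chunk_count):
        start, disk = heapq.heappop(free_at)
        end = start + seconds_per_chunk[disk]
        done = max(done, end)
        heapq.heappush(free_at, (end, disk))
    return done


# Illustrative numbers only: three idle disks take 1 second per chunk,
# one busy disk takes 4 seconds per chunk.
speeds = [1.0, 1.0, 1.0, 4.0]
print(push_round_robin(12, speeds))  # 12.0
print(pull_when_idle(12, speeds))    # 4.0
```

In this toy case the pushed checkpoint waits on the busy disk, which is forced to accept a full quarter of the chunks, while the pulled checkpoint finishes as soon as the idle disks have absorbed most of the work.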

About PSC: Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon University and the University of Pittsburgh together with Westinghouse Electric Company. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry, and is a partner in the National Science Foundation XSEDE program.

# # # 
