Zest

Overview

The Zest file system is a patented, highly scalable parallel file system designed for maximum efficiency with write-intensive application workloads such as checkpointing.

The name “Zest” reflects the system’s preference for writing to the outermost cylinders of its storage disks, where write bandwidth is highest.

View a graph of disk write bandwidth vs. cylinder location

The following techniques are used to maximize scalability:

Parity, which protects data against disk failure, is calculated directly on the compute resource that generates the data.
A non-deterministic data placement strategy is used in I/O server selection based upon each server’s eagerness to receive data.
A second non-deterministic data placement strategy is used once an I/O server has been selected. The disks that receive client data are chosen according to each disk’s eagerness to receive data, provided that no two members of a parity group end up on the same disk.
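
The techniques above can be illustrated with a minimal sketch. It is not Zest's actual implementation: the use of byte-wise XOR as the parity function, the representation of "eagerness" as a positive numeric weight per disk, and the names xor_parity and place_parity_group are all assumptions made for illustration. The sketch computes parity on the client side and then picks a distinct disk for each member of a parity group, favoring more eager disks.

```python
import random
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """Return the byte-wise XOR of equal-length blocks.

    Assumption: simple XOR parity; the source only says "parity".
    """
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(xor, column) for column in zip(*blocks))


def place_parity_group(members, eagerness, rng=random):
    """Map each parity-group member index to a distinct disk.

    `eagerness` is a hypothetical dict of disk name -> positive weight.
    Disks are drawn at random in proportion to their weight, and a
    chosen disk is removed so no two members share a disk.
    """
    available = dict(eagerness)
    if len(members) > len(available):
        raise ValueError("not enough disks for the parity group")
    placement = {}
    for index in range(len(members)):
        disks = list(available)
        weights = [available[d] for d in disks]
        disk = rng.choices(disks, weights=weights, k=1)[0]
        placement[index] = disk
        del available[disk]  # enforce the one-member-per-disk rule
    return placement


# Usage: three data blocks plus one parity block, spread over four disks.
data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_parity(data)
group = data + [parity]
layout = place_parity_group(group, {"d0": 3, "d1": 1, "d2": 2, "d3": 5})
```

With XOR parity, any single lost block can be rebuilt by XOR-ing the surviving members of the group, which is why keeping group members on separate disks matters.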

Acknowledgements/Publications

SC07 Most innovative HPC storage technology Readers’ Choice Award Recipient
SC07 Storage Challenge “Zest: The Maximum Reliable TBytes/sec/$ for Petascale Systems” [PDF] slides [PDF]
Petascale Data Storage Workshop ’08 “Zest: Checkpoint Storage System for Large Supercomputers” [PDF] slides [PDF]

Deployments

Zest has been outfitted to work with Blacklight, PSC’s shared-memory SGI UV system.

Future Work

Zest offers no direct read(2) support. Data processed by Zest must currently be staged to a third-party file system (such as Lustre) before it can be accessed in a POSIX read(2) fashion. Adding a metadata server (MDS) would remove the need for that third-party file system.

Because Zest’s non-deterministic data placement strategy maximizes efficiency at the cost of predictable layout, the MDS would track which chunks of data reside on which I/O servers.
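
As a rough sketch of the bookkeeping such an MDS might perform, the example below keeps a per-file map from chunk offset to the I/O server holding that chunk and answers range lookups for a read. Since this is future work, nothing here describes an existing Zest component: the ChunkMap class, its record and locate methods, and the server and file names are hypothetical.

```python
from collections import defaultdict


class ChunkMap:
    """Hypothetical in-memory chunk index for a metadata server."""

    def __init__(self):
        # file_id -> {chunk_offset: (server, chunk_length)}
        self._chunks = defaultdict(dict)

    def record(self, file_id, offset, length, server):
        """Note that `server` holds bytes [offset, offset + length)."""
        self._chunks[file_id][offset] = (server, length)

    def locate(self, file_id, offset, length):
        """Return (server, chunk_offset, chunk_length) for every chunk
        overlapping the requested byte range, ordered by offset."""
        end = offset + length
        hits = []
        for chunk_off, (server, chunk_len) in sorted(
            self._chunks.get(file_id, {}).items()
        ):
            if chunk_off < end and chunk_off + chunk_len > offset:
                hits.append((server, chunk_off, chunk_len))
        return hits


# Usage: chunks of one checkpoint file landed on different servers.
mds = ChunkMap()
mds.record("ckpt.0", 0, 4, "ios2")
mds.record("ckpt.0", 4, 4, "ios0")
mds.record("ckpt.0", 8, 4, "ios1")
servers_for_read = mds.locate("ckpt.0", 2, 4)
```

A reader would contact each returned server for its part of the range, which is the step that the staged copy on a third-party file system currently stands in for.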

Similar/Influenced Work

PLFS

Contact Information

The PSC Advanced Systems group can be reached at advsys@psc.edu.