The Zest file system is a patented, highly scalable parallel file system designed for maximum efficiency with write-intensive application workloads such as checkpointing.
The name “Zest” reflects the file system’s preference for writing to the outermost cylinders of its storage disks.
The following techniques are used to maximize scalability:
- Parity for data protection in case of disk failure is calculated directly at the compute resource generating the data.
- I/O servers are selected with a non-deterministic data placement strategy based on each server’s eagerness to receive data.
- After an I/O server has been selected, a second non-deterministic placement strategy chooses the disks that receive client data. Disks are chosen based on each disk’s eagerness to receive data, with the constraint that no two members of a parity group may end up on the same disk.
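The techniques above can be illustrated with a minimal Python sketch. It is not Zest's implementation: the text does not specify the parity scheme, how eagerness is measured, or how the non-deterministic choice is made. The sketch assumes simple XOR parity computed on the client, a numeric eagerness score per disk, and a weighted random pick as a stand-in for the placement strategy. All function names (`xor_parity`, `pick_by_eagerness`, `place_parity_group`) are hypothetical.

```python
import random


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized data blocks.

    Assumption: single XOR parity; the source does not name the scheme.
    """
    if not blocks:
        raise ValueError("a parity group needs at least one data block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks in a parity group must have the same size")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def pick_by_eagerness(eagerness, exclude=frozenset(), rng=random):
    """Pick one target at random, weighted by its (hypothetical) eagerness score."""
    candidates = [t for t in eagerness if t not in exclude and eagerness[t] > 0]
    if not candidates:
        raise RuntimeError("no eligible target left")
    weights = [eagerness[t] for t in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def place_parity_group(blocks, disk_eagerness, rng=random):
    """Assign each data block and the parity block to a distinct disk."""
    members = list(blocks) + [xor_parity(blocks)]  # parity computed client-side
    used = set()
    placement = []
    for member in members:
        disk = pick_by_eagerness(disk_eagerness, exclude=used, rng=rng)
        used.add(disk)  # enforce: no two group members on the same disk
        placement.append((disk, member))
    return placement
```

The same `pick_by_eagerness` helper could be applied one level up to choose an I/O server before choosing disks. Because the result depends on random choices, the placement of a given block cannot be computed from its file offset, which is why some form of location tracking is needed to read the data back.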
Zest has received the following recognition and has been presented at the following venues:
- SC07 Readers’ Choice Award recipient: Most Innovative HPC Storage Technology
- SC07 Storage Challenge: "Zest: The Maximum Reliable TBytes/sec/$ for Petascale Systems" [PDF], slides [PDF]
- Petascale Data Storage Workshop '08: "Zest: Checkpoint Storage System for Large Supercomputers" [PDF], slides [PDF]
Zest has been outfitted to work with the following PSC machines:
- Blacklight, a shared memory SGI UV system
- BigBen, a Cray XT3
- Pople, an SGI Altix shared memory NUMA system
Zest offers no direct read(2) support. Data written through Zest must currently be staged to a third-party file system (such as Lustre) before it can be accessed in a POSIX read(2) fashion. Removing this dependency would require a metadata server (MDS).
The MDS would track which chunks of data reside on which I/O servers, since the non-deterministic data placement strategy that Zest uses to maximize efficiency means chunk locations cannot be derived from file offsets alone.
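As a rough sketch of the bookkeeping such an MDS would need, the Python example below keeps a per-file list of chunk locations and answers which servers hold a given byte range. Zest has no such component according to the text, so everything here is hypothetical: the `Chunk` and `ChunkMap` names, the in-memory storage, and the example paths and server names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    offset: int   # byte offset of the chunk within the file
    length: int   # chunk size in bytes
    server: str   # I/O server that received the chunk


class ChunkMap:
    """Hypothetical in-memory map from file path to chunk locations."""

    def __init__(self):
        self._chunks = defaultdict(list)

    def record(self, path, offset, length, server):
        """Remember that a chunk of `path` landed on `server`."""
        self._chunks[path].append(Chunk(offset, length, server))

    def locate(self, path, offset, length):
        """Return the chunks overlapping [offset, offset + length), ordered by offset."""
        end = offset + length
        hits = [
            c for c in self._chunks.get(path, [])
            if c.offset < end and offset < c.offset + c.length
        ]
        return sorted(hits, key=lambda c: c.offset)
```

A read of an arbitrary byte range would first call `locate` and then fetch each returned chunk from its server. A linear scan is used here for brevity; a real metadata service would more likely need an interval index and persistent storage.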