IOTest
Last updated: June 2005

13-jun-2005 Eagle/UK14, scratcha1, scratcha2, 16 servers, 2-way stripe
08-jun-2005 Eagle/UK1, scratcha1, scratcha2, 16 servers, 2-way stripe
23-oct-2004 Eagle, local, single node, 3-way stripe
11-aug-2004 Eagle, local, single process, 3-way stripe
24-apr-2004 Eagle, scratcha2, 16 servers, 1-way stripe
06-apr-2004 Eagle, scratch1, scratchb1, scratcha2 (2-way stripe), and local
19-mar-2004 Eagle, scratcha2, 16 servers, 16-way stripe
16-mar-2004 Eagle, scratchb1 and scratchb2, kernel patches, new cables, reboots
03-mar-2004 Eagle, new $SCRATCH configurations, 2-way stripe all scratch fs
19-jan-2004 Kite/UK1, all scratch fs including final scratchB1 and scratchB2 configurations
14-jan-2004 Kite/UK1, all scratch fs including original scratchB configurations

Introduction

IOTest is a flexible I/O benchmarking tool used to test the performance of parallel filesystems. It can be configured to write, read, or write and read an arbitrary number of 64-bit values with single or multiple write/read operations.

Open, write/read, and close operations are timed. Minimums and maximums are reported. All times are in units of seconds.

Bandwidth is estimated by dividing the total bytes transferred by the total time. Minimum and maximum rates are reported. All rates are in millions of bytes per second. Note that the total times used to compute bandwidth are NOT the sum of the minimum or maximum open + write/read + close times, but rather the minimum or maximum total times (i.e., a separate timer is used).

An iteration count is supported. It allows the same test to be performed multiple times. Iterations are performed internally, so a single set of timer outputs covering the accumulated time and data size is generated.

An option to run sweeps is also provided. Testing always begins with the node and process counts specified for the job.
If the sweep option is enabled, then after saving the original processes-per-node (ppn) count, the current node count is divided by 2, the new process count is set to the product of the new node count and ppn, and the next test is initiated. This pattern is repeated until the node count reaches 0. Sweeps are driven externally by the job script. Therefore, each sweep generates a set of timer outputs.

Arguments to the IOTest program are as follows:

NR     number of 64-bit values per write
NC     number of writes
NITER  number of iterations
IREAD  0 for write, 1 for read, 2 for write then read

A job submission script is also provided. It supports the sweeps feature. Job configurations are noted for each test, e.g. queue, filesystem, walltime, sweeps, node count, and process count.

scratch Details

The initial configuration of the scratchB filesystems was tested between 12-jan-2004 and 14-jan-2004. The scratchB1 and scratchB3 filesystems each used 8 servers of cluster B; scratchB2 was based on 16 servers. The servers underlying these scratch spaces overlapped, i.e., the eight servers associated with scratchB1 and the eight servers of scratchB3 together comprised the 16 servers of scratchB2. The block size for scratchB1 was 1 MB, while scratchB2 and scratchB3 were configured with a block size of 2 MB.

A second configuration was tested 19-jan-2004. The scratchB1 and scratchB3 filesystems had been merged. The new scratchB1 and scratchB2 filesystems were now both based on 16 servers (all of B cluster) with no overlap. The block size of both systems was reduced to 256 KB (previously 1 MB and 2 MB respectively). The change in B2 block size yielded marginal improvements in every category, leading to the conclusion that the smaller block size was beneficial for the test case. More noticeable improvements were evident for B1, but as the server count had doubled, this was expected.
Write performance increased by about 50% and read performance approximately doubled.

With the Eagle upgrade, the server count for both $SCRATCH file systems (scratch1 and scratch2) was increased from 4 to 8. As a result, performance improved across the board by roughly a factor of two. All of the scratch file systems (scratch1, scratch2, scratchb1, and scratchb2) were configured to use 2-way striping (previously, stripe dimensions had been equal to the server count).

All of the scratch file systems were recabled between March 10 and 17. A couple of kernel patches were also applied. No other configuration changes were made. Therefore, the performance problems observed for scratchb1 and scratchb2 in early March were resolved by some combination of the patches, reboots, and recabling work.

Comments

Since many programs wait until all processes have completed an I/O operation, data corresponding to the maximum times (minimum rates) might be regarded as more significant.

Buffering effects are apparent in many cases. Variation between minimum and maximum times can be quite large.

The time required to open and close files at scale is surprisingly high. For the scratchB filesystems, worst-case open+close times for 2048 files approached 25% of the total test time in one case.

Cleanups of scratch space are also expensive at scale. The time required to remove 2048 files from the scratchB2 and scratchB3 filesystems exceeded the time remaining after the tests had completed (about 13 minutes). Informal observations by Systems Support staff for a case involving millions of files and a set of 4 "rm" processes operating in parallel indicated a sustainable rate of about 3 deletions per second. Files remaining upon expiration of the scratchB jobs suggest that around 1750 files were removed in about 13 minutes, which translates to about 2.25 deletions per second. At this rate, the aforementioned cleanups would require about 15 minutes to completely remove all 2048 test files.
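
The cleanup estimate above is simple arithmetic, and it can be checked with a few lines of Python. This is only a sketch of the calculation: the function names `deletion_rate` and `cleanup_minutes` are hypothetical, and the inputs are the rounded figures quoted in the text (1750 files, 13 minutes, 2048 files, 2.25 deletions per second).

```python
def deletion_rate(files_removed, seconds):
    """Average number of deletions per second over the interval."""
    return files_removed / seconds


def cleanup_minutes(total_files, rate_per_second):
    """Minutes needed to remove total_files at a steady deletion rate."""
    return total_files / rate_per_second / 60.0


# Figures quoted in the text (rounded there, so results are approximate).
observed_rate = deletion_rate(1750, 13 * 60)   # 1750 files in about 13 minutes
full_cleanup = cleanup_minutes(2048, 2.25)     # all 2048 files at the quoted rate

print(f"observed rate: {observed_rate:.2f} deletions per second")
print(f"full cleanup:  {full_cleanup:.1f} minutes")
```

With these inputs the rate comes out slightly under the quoted 2.25 deletions per second and the full cleanup slightly over 15 minutes, which is consistent with the "about" figures given.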
Needless to say, the cleanup step will be eliminated from future tests at scale.

For the noted file size (200 MB per process), the best performance achieved by a scratch filesystem was about 1.5 GB/s for reads (scratch7) and writes (scratchB2). The decline in recent scratch7 measurements was likely caused by scratch7 operating at near capacity when these tests were performed.
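
Two pieces of the procedure described in the Introduction lend themselves to a short illustration: the bandwidth estimate and the sweep schedule. The Python sketch below is not taken from IOTest or its job script. The function names are hypothetical, the `nprocs` multiplier assumes that total bytes are aggregated over all processes, the sweep assumes integer division and a process count that is a multiple of the node count, and the example values (25,000,000 values per write, timings of 0.5 s and 1.0 s, 16 nodes with 32 processes) are made up for illustration.

```python
BYTES_PER_VALUE = 8  # IOTest transfers 64-bit values


def bytes_transferred(nr, nc, niter=1, nprocs=1):
    """Total bytes for NR values per write, NC writes, NITER iterations.

    Multiplying by nprocs is an assumption about how totals are aggregated.
    """
    return nr * nc * niter * nprocs * BYTES_PER_VALUE


def rates_mb_per_s(total_bytes, min_total_time, max_total_time):
    """Return (minimum rate, maximum rate) in millions of bytes per second.

    The minimum rate corresponds to the maximum total time, and vice versa.
    """
    min_rate = total_bytes / max_total_time / 1e6
    max_rate = total_bytes / min_total_time / 1e6
    return min_rate, max_rate


def sweep_schedule(nodes, procs):
    """List the (node count, process count) pairs a sweep would visit."""
    ppn = procs // nodes           # processes per node, saved once up front
    schedule = [(nodes, procs)]    # testing begins with the job's own counts
    nodes //= 2
    while nodes > 0:               # stop once the node count reaches 0
        schedule.append((nodes, nodes * ppn))
        nodes //= 2
    return schedule


# Hypothetical example: one write of 25,000,000 values is 200 million bytes.
size = bytes_transferred(nr=25_000_000, nc=1)
min_rate, max_rate = rates_mb_per_s(size, min_total_time=0.5, max_total_time=1.0)
steps = sweep_schedule(nodes=16, procs=32)

print(size, min_rate, max_rate)
print(steps)
```

Note that the rates are computed from separately measured total times, as the Introduction specifies, rather than from sums of the open, write/read, and close timings.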