Ben
Ben has been decommissioned.
See the migration document for guidance on migrating from ben to bigben, and read the bigben document for additional information on computing on bigben.
Contact PSC User Services with any questions.
Additional Information
Storing Files
File repositories
File repositories are file storage spaces which are not directly connected to a frontend or compute node. You cannot, for example, open a file that resides in a file repository from inside a program. You must use explicit file copy commands to move files to and from the repository. You currently have one file repository available to you on ben: golem.psc.edu.
Golem is an SGI Origin 300. It uses a combination tape-and-disk archival system.
Transfer files to and from golem interactively before and after your batch job runs, so your job does not tie up compute nodes while performing file transfers.
If you need to store a file to golem that is 2 Tbytes or larger please first contact User Services so that special arrangements can be made to store your file.
Files can be stored and retrieved from golem using one of these methods: tcscp, far, kftp, gridftp, scp or sftp.
tcscp
The tcscp program can be used to transfer files between golem and your ben home directory or $SCRATCH. To transfer files between golem and $LOCAL on your compute nodes you must use either $SCRATCH or $HOME as intermediate locations.
far
The far program can be used to transfer files between golem and your ben home directory or $SCRATCH. To transfer files between golem and $LOCAL on your compute nodes you must use either $SCRATCH or $HOME as intermediate locations.
We recommend using tcscp rather than far for transferring files between $SCRATCH or $HOME and golem. If you prefer the far interface you can indicate that you want far to actually use tcscp for file transfers by placing the line
set transport = "tcscp"
in the file .far.conf in your home directory.
kftp, gridftp, scp, sftp
You can use kftp, gridftp, scp or sftp to transfer files between golem and your remote machine. We recommend against using sftp. See the golem Web page for more information.
Transferring Files
You can use kftp or scp to copy files between ben and a remote machine. The far program transfers files from ben to golem, the archival system.
Both kftp and scp are secure file transfer methods. But while kftp encrypts the authentication information only, scp encrypts both the authentication information and the data being transferred. For this reason, kftp is generally faster than scp.
Kftp
Ben is running both a Kerberos 5 (K5) client and server. Thus, if your local machine has K5 client/server software installed, you can use kftp to transfer files to and from ben whether you are logged into ben or your local machine. The examples below assume that you are logged into your local machine.
Before you can use kftp to transfer files, you must authenticate yourself to ben. To do this use the kinit command.
kinit username@PSC.EDU
For 'username' substitute your PSC userid. PSC.EDU is PSC's Kerberos realm name.
After you enter this command you are prompted for your PSC Kerberos password, which is the password you use to login to ben.
Once you are authenticated you can use the kftp command to actually perform your file transfers.
kftp ben.psc.edu
The kftp command functions like the ftp command.
To transfer files into or from $SCRATCH you will need to specify the full pathname without using any variables.
You should verify that the Kerberos commands operate on your local system as described here. Some installations of Kerberized ftp differ in their implementation.
Man pages for kinit and kftp are available on ben.
A Unix kftp client is available at http://www.pdc.kth.se/heimdal. A Windows kftp client is available at http://web.mit.edu/network/kerberos-form.html.
Scp, discussed below, and sftp can also be used for ben file transfers, but kftp will generally be much faster. We recommend against using sftp.
Scp
The secure copy program, scp, can be used to transfer files between your remote machine and your ben home directory or your $SCRATCH directory. You cannot use scp to copy files between $LOCAL and your remote machine. To get remote files to $LOCAL you must go through your ben home directory or $SCRATCH.
The format for the scp command is
scp source-filename target-filename
where the filename on the remote system, whether it is the target or the source, must be specified as
username@system:filename
For example, to copy a file to your $SCRATCH directory on ben when you are logged in to your remote system use a command such as
scp file username@ben.psc.edu:/usr/scratch/n/username/file
If you are logged in to ben and you want to copy over a file from your home system to $SCRATCH, use a command such as
scp username@remote-system:file /usr/scratch/n/username/file
The first time you use scp to or from ben, you will receive a message similar to
Host key not found from list of known hosts. Are you sure you want to continue connecting?
Answer 'yes' to make the connection. You should not receive this message on subsequent connections.
You will be prompted next for your password on the remote system.
You may be able to improve the scp transfer rate by choosing the blowfish encryption method rather than using the default. To do this, type:
scp -c blowfish file username@remote-system:file
For more information on the scp command, see the scp man page.
Scp is part of the ssh distribution. PSC provides a list of sites that distribute ssh.
We strongly recommend that you use kftp rather than scp if kftp is available.
Far
You can use the far program to move files between your ben home directory or your $SCRATCH directory and golem, PSC's file archiver.
Tar
Whether you are transferring files between ben and golem or between ben and your remote system, if you have many files--1000 or more--it is much more efficient to tar them up into one tar file and then transfer this single tar file, especially if they are small files, 64 Kbytes or smaller.
Tru64 tar--located at /bin/tar--can only create tar files up to 8 Gbytes. Gnu tar--located at /usr/psc/gnu/bin/tar--can create files larger than 8 Gbytes. However, a file created by Gnu tar that is larger than 8 Gbytes cannot be read by Tru64 tar.
You should first contact User Services if you are going to create a tar file that is 50 Gbytes or larger. You should move your tar file to golem or to your remote system as soon as you can after you create it.
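As a rough illustration of the bundling step, the sketch below uses Python's standard tarfile module to pack a directory of small files into a single archive before transfer. Python is used here only for illustration and is not a PSC-provided tool; the directory and archive names are placeholders, not paths that exist on ben.

```python
import os
import tarfile
import tempfile

def bundle_directory(src_dir, tar_path):
    """Pack every file under src_dir into one uncompressed tar file.

    Returns the number of regular files added.
    """
    count = 0
    with tarfile.open(tar_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive unpacks cleanly.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count

if __name__ == "__main__":
    # Demo with throwaway files; "many_small_files" is a placeholder name.
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "many_small_files")
        os.makedirs(src)
        for i in range(1000):
            with open(os.path.join(src, f"part{i:04d}.dat"), "w") as f:
                f.write("x" * 64)
        n = bundle_directory(src, os.path.join(work, "bundle.tar"))
        print(f"bundled {n} files into one tar file")
```

The single resulting file is what you would then hand to tcscp, kftp or scp, instead of transferring each small file separately.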
System drain
System drains occur when a test time is scheduled for ben. Check the ben status page to see if a test time is scheduled for ben.
Comments on system queues
Send email to remarks@psc.edu if you have complaints about your job turnaround or if you need special scheduling considerations to meet a project deadline.
Batch access
Qsub
Sample job script
Submitting your script for execution
Other qsub options
Executing a script with prun
Multiple simultaneous parallel executions in a single job
Chaining Jobs
File I/O
Tcscp and golem
Tcscp can also be used to copy files between $HOME or $SCRATCH and golem. In fact, we recommend that you use tcscp rather than far for these types of file transfer. For example, the command
tcscp golem:input.dat $SCRATCH/input.dat
will copy input.dat from your golem home directory to $SCRATCH. You should copy your files between golem and $HOME or $SCRATCH before or after you run your batch jobs.
If you prefer the far interface to the tcscp interface you can indicate that you want far to actually use tcscp for its file transfers by placing the line
set transport = "tcscp"
in the file .far.conf in your home directory.
Interactive Access With the Qsub -I Option
The -I (capital "i") option to qsub can be used to provide a form of interactive access on ben.
Interactive jobs in the debug queue
To run an interactive job in the debug queue, use the -I option to qsub together with the -q debug option. A sample command for running in the debug queue is
qsub -I -q debug -l rmsnodes=2:8 -l walltime=30:00
This command requests 8 processors on 2 nodes for thirty minutes. See the discussion above of the debug queue for information on debug queue limits.
After you enter this command the system will respond
qsub: waiting for job 349.ben.psc.edu to start
If there are free nodes available in the debug queue the system will respond
qsub: job 349.ben.psc.edu ready
pbsmom: Successful (0) in mom_set_limits, rmsnodes = 10/2.8
pbsmom: Successful (0) in mom_set_limits, rmspartition = production
This means that your compute nodes have been allocated to you. If more than 5 minutes elapse with no response, there are probably not enough nodes free in the debug queue to run your job. Your qsub will wait until it is satisfied. Type ^C to end your qsub request if you don't want to wait, and try your debug job at a later time. You can use the qstat command to see how many jobs are running in the debug queue and how many nodes they are using.
Once nodes are allocated to you, you will receive a command prompt. You can now enter commands and the stdout and stderr from those commands will be directed to your screen. You should enter commands as if they were in a batch script.
However, even though you are running interactively you must use a prun command to run your executable on your compute nodes.
prun -N ${RMS_NODES} -n ${RMS_PROCS} ./myrunscript
Use input redirection to get stdin input to your prun executable.
If you just enter the name of your executable without a prun your executable will not run on your compute nodes. Executables should only be run on compute nodes. Thus, you should never enter just the name of your executable at the command prompt.
Once you are finished with your interactive job type ^D to end your job.
^D
qsub: job 349.ben completed
When you use qsub -I you are charged for the total time you hold your compute nodes whether you compute or not. Thus, as soon as you are done with your prun commands you should type ^D.
If you have difficulties getting compute nodes to run your interactive debug jobs please send email to remarks@psc.edu.
Interactive jobs in the batch queue
You may want to run an interactive job that is too large for the debug queue. In these cases use the -I option to qsub but not the -q debug option.
In this situation your interactive job will run in the batch queue. It is subject to the queue limits and policies of the batch queue and must compete with other jobs in the batch queue for compute nodes.
Thus, it can be difficult for an interactive job in the batch queue to get scheduled. Two favorable times to try are during a system drain, if you can adjust your walltime request so that your job finishes before the drain ends, or, if your interactive job is large enough, right after a system drain.
If you wait more than 5 minutes after your qsub -I command and have not received a system prompt there are probably not enough free nodes to allocate to your job. Your qsub request will wait until it is satisfied. You should type ^C to end your qsub command if you do not want to wait.
The nodestat and rinfo commands can be used to tell the number of free nodes before you submit your qsub request. If you choose to wait you can use the -M and -m options to qsub to send email when your qsub request has been met and your job has started.
Once you have your compute nodes you can enter commands interactively, just as was discussed above with interactive jobs in the debug queue. You will enter your commands as if they were in a batch script, except that the stdout and stderr will be directed to your screen.
Since your interactive jobs in the batch queue will probably be longer than your interactive debug queue jobs, it is even more imperative that you type ^D when you are finished processing. Otherwise you will be charged for the total time you hold your nodes.
If you have difficulties getting access to compute nodes to run your interactive jobs in the batch queue please send email to remarks@psc.edu.
Monitoring and Killing Jobs
Qstat
The qstat command is used to display the status of the PBS queue. It includes running and queued jobs.
The -f and -a options to qstat provide you with more extensive status listings. If your job is in a special state, the comment field, which is visible when you use the -f option, will often display important information about the state of your job. The -Q and -q options show the status of the batch queues and the value of many queue limits. The -f option when used with -Q for a specific queue displays additional information about queue limits.
See man qstat for more details.
Nodestat
The nodestat command gives the number of nodes currently available for scheduling.
Backlog
The backlog command shows the number of jobs running, queued and held, and the total number of node-hours for the jobs in each category. It also sums the jobs in these three states, listing an estimate of the total backlog on ben in machine days.
Qest
The qest command gives a projection of how long a job will wait to begin execution. It uses historical data for jobs of a similar size and wallclock time to estimate the wait in the queue. Usage is:
qest <cpus> <walltime>
where cpus is the number of processors requested and walltime is the time requested in hours.
Please note that this is an estimate only. The mix of jobs in the queue at any given time is unique and the ultimate determinant of when a job is executed.
Rinfo
The rinfo command displays node usage and availability info. The default output contains four sections: information on the machine, partitions, resource requests and active jobs.
The machine section identifies the machine name and active configuration. The partition section shows the number of CPUs in use, the total number of CPUs, partition status, time the partition has been active, time limits on jobs, and the node names for each partition in the active configuration.
The resource portion indicates the resources allocated to users, number of CPUs in that resource, status, how long the resource has been held, the user and the nodes included in the resource.
The jobs section shows the number of CPUs, status, how long the job has been running, user and nodes used for each job. It includes both running and queued jobs.
There are options which change the information displayed by rinfo. See the man page for details.
Qdel
The qdel command is used to kill queued and running jobs.
Qalter
The qalter command is used to alter queue options for queued and running jobs.
Pmm
Issued from a login node, the pmm command will display the status of all the nodes in the system. Type pmm -help for a fuller description.
Scientific and Mathematical Packages
If you would like a third-party package to be installed on ben please send your request to remarks@psc.edu.
Programming Tools
Pixie
Pixie is a tool which can be used to profile single processor performance of a code. See the man page for pixie for more information.
Hiprof
Hiprof is a tool which can be used to profile single processor performance of a code. See the man page for hiprof for more information.
Atom
Atom is a suite of tools which can be used to profile single processor performance of a code, including generating Mflop rates. It can also be used to measure scaling performance for parallel codes. See the man page for atom for more information.
Debugging Tips
Debugging strategy
Your first few runs should be on a small version of your problem. Your first run should not be for your largest problem size. It is easier to find code problems if you are using fewer nodes. This strategy should be followed even if you are porting a working code from another system.
Also, you should use the debug queue for your debugging runs. Never run a debugging run on ben's front end. You should always run a ben program using qsub and prun.
Compiler options
Several compiler options can be useful to you when you are debugging your code. If you use the -g option when compiling, the error messages the system provides when your code fails will probably be more informative. For example, you will probably be given the source code line number of the source code statement that caused the failure. Once you have a production version of your code do not use the -g option. If you do your code will run slower.
The -check_bounds compiler option will cause your code to report if it exceeds an array bounds while it is running. The -warn argument_checking option will cause your running program to report any mismatched procedure arguments.
The -g and -check_bounds options are available in Fortran and C, but the -warn argument_checking option is only available in Fortran.
Core files
Core files may be useful to you when debugging your code. Both TotalView and ladebug can be used to work with core files. In addition, if you uncover a system bug we may need the core file from your execution.
The default behavior of the system is to delete any core files when your job ends. To make sure your core file is saved after your job completes you must set the environment variable RMS_KEEP_CORE to 1 in your batch script before you run your executable. If you then provide us with the PBS jobid of your failed job we will be able to retrieve the core file.
Variable initialization
Alpha processors do not automatically initialize variables, unlike many other processors. This is a common cause of program failures when you port a code to ben from another platform where it worked because the processor was automatically initializing your variables.
By default the HP Fortran compiler will warn you at compile time if you use an uninitialized variable. However, the Fortran compiler does not catch every case where this happens, and for the cases it does catch it only issues a warning. It does not initialize the variable.
The -trapuv option to the C compiler causes all uninitialized stack variables to get a value that will cause your program to fail if they are used without another value being placed in them. Stack variables include local automatic variables and function arguments. Thus, the -trapuv option can be used to detect certain types of uninitialized variables in C.
Exception handling
The HP compilers may handle exceptions in a manner different from the other platforms on which you may have worked or from which you are porting a code. For example, if you use the -check underflow option to the f90 compiler you will receive a runtime warning if there is an integer underflow. Otherwise your code will run through an integer underflow. Similarly, if you use the -check overflow option to f90 a floating point overflow will cause your code to fail. Otherwise your code will run through a floating point overflow. On the other hand, codes that generate "inexact results," such as NaN or Infinity, on other platforms will cause a floating point exception on ben. You can mimic the former type of behavior on ben if you use the -fpe3 option to f90.
The HP C compiler, by default, will run through all exceptions, except integer and floating point underflow and cases where results are affected by roundoff. If you want your C code to run through all exceptions you can use the -speculate all option to cc.
Little Endian versus Big Endian
The data bytes in a binary floating point number or a binary integer can be stored in a different order on different machines. Ben is a Little Endian machine, which means that the low-order byte of a number is stored in memory with the lowest address for that number while the high-order byte is stored in the highest address for that number. The data bytes are stored in the reverse order on a Big Endian machine.
If your machine has Tcl installed you can tell whether the machine is Little Endian or Big Endian by issuing the command
echo 'puts $tcl_platform(byteOrder)' | tclsh
Most Little Endian machines, including ben, have facilities for reading and writing Big Endian files. The reverse is often not the case for Big Endian machines.
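To make the byte ordering concrete, here is a small sketch in Python (used only for illustration; it is not part of the ben software described here). It packs the same 32-bit integer in both byte orders and reports the byte order of the machine it runs on. The value 0x01020304 is an arbitrary choice that makes each byte easy to spot.

```python
import struct
import sys

value = 0x01020304  # 32-bit integer with four distinct bytes

little = struct.pack("<I", value)  # low-order byte at the lowest address
big = struct.pack(">I", value)     # high-order byte at the lowest address

print("little endian bytes:", little.hex())  # 04030201
print("big endian bytes:   ", big.hex())     # 01020304

# The two layouts are byte-for-byte reverses of each other.
assert little == big[::-1]

# Reading big endian data with an explicit byte order gives the same value on any host.
assert struct.unpack(">I", big)[0] == value

print("this machine is", sys.byteorder, "endian")
```

A binary file written on a Big Endian machine holds the second layout, which is why it has to be read with an explicit conversion on a Little Endian machine such as ben.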
If you are using Fortran on ben there are three methods you can use to read and write Big Endian files. First, if you do not want to change your source code or do not have access to the source code, you can set the environment variable FORT_CONVERTXX to 'BIG_ENDIAN' for each Fortran unit number that you want to use to read or write in Big Endian format. For 'XX' you substitute the unit number. For example, the command
setenv FORT_CONVERT15 BIG_ENDIAN
will cause unit 15 to process files in Big Endian format.
Second, if you want all your Fortran unit numbers to process data in Big Endian format you can use the -convert big_endian option to f90. Other values for -convert, such as cray or ibm, can be used to specify more precisely how you want your binary data to be handled. See the f90 man page for more information about the -convert option.
Finally, you can add the parameter
convert='big_endian'
to the open statements for those files you want to read or write as Big Endian.
There are no equivalent facilities available for C.
Optimization
Mflops
To improve the performance of your ben code you should begin with single-processor performance. To measure your single-processor performance you should compute the Mflops rate for your code.
To compute your code's Mflops rate follow these steps.
- Compile your code.
f90 -fast -o yourprog yourprog.f90 -lmpi -lelan
- Generate a list of the routines in your code.
nm -g yourprog | perl -e'while (<STDIN>) {@fields = split(" ", $_); \
if ($fields[4] eq "T") {print "$fields[0]\n";}}' > routines
- Edit file routines to put the name of your main program first. With a C program or a Fortran program without a program statement the main program will be called 'main'. If a Fortran program has a program statement the main program name will be the name in the program statement with an '_' appended to it. You can also delete from file routines any routines for which you do not want to determine an Mflops rate.
- Load the atom tools module.
module load atom
- Instrument your code so that it will generate timings and operation counts.
cat routines | atom -tool timer5 yourprog
cat routines | atom -tool flop2 yourprog
The first command will generate an executable named yourprog.timer5. This will generate timing data. The second command will generate an executable named yourprog.flop2. This will generate operation counts. You cannot generate instrumented codes on the scratch file system.
- Run both instrumented executables on a single processor.
prun -N 1 -n 1 ./yourprog.timer5
prun -N 1 -n 1 ./yourprog.flop2
The first executable will generate a file named tprof.out.XXX.YYY, where 'XXX' is derived from your job's compute node name and 'YYY' is derived from your job's process id. This file contains timing data for each of your routines. The second executable will generate a file named fprof.out.WWW.ZZZ, where 'WWW' is derived from your job's compute node and 'ZZZ' is derived from your job's process id. This file contains operation counts for each of your instrumented routines. You can use the information in these two files to determine an Mflops rate for your entire code and for each of your routines by dividing the operation counts by execution time.
An Mflops rate above 270 is acceptable on ben. If your code performs below that number you should invest some effort in improving its single-processor performance.
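The division itself can be sketched as follows. The layout of the tprof.out and fprof.out files is not described here, so this Python sketch takes the operation counts and times as numbers copied in by hand. The routine names and values are made up for illustration and are not measurements from ben.

```python
def mflops(operation_count, seconds):
    """Return millions of floating point operations per second."""
    if seconds <= 0:
        raise ValueError("execution time must be positive")
    return operation_count / seconds / 1.0e6

# Hypothetical per-routine numbers: routine name -> (operation count, seconds).
measurements = {
    "main": (9.0e9, 30.0),
    "solve_": (6.0e9, 12.0),
    "update_": (2.0e9, 10.0),
}

THRESHOLD = 270.0  # acceptable single-processor rate quoted in the text

for routine, (ops, secs) in measurements.items():
    rate = mflops(ops, secs)
    verdict = "ok" if rate > THRESHOLD else "needs tuning"
    print(f"{routine:10s} {rate:8.1f} Mflops  {verdict}")
```

With these made-up inputs the first two routines come out above 270 Mflops and the third comes out at 200 Mflops, which would mark it as the place to start tuning.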
Cache performance
Poor cache performance is the most common cause of poor single-processor performance on ben.
To calculate the cache miss rate for your code follow the steps below.
- Compile your code.
f90 -fast -o yourprog yourprog.f90 -lmpi -lelan
- Load the atom tools module.
module load atom
- Use atom to instrument your code so that it will generate cache miss data.
atom -tool cache yourprog
This generates an executable named yourprog.cache. You cannot generate instrumented codes on the scratch file systems.
- Execute yourprog.cache on a single processor.
prun -N 1 -n 1 ./yourprog.cache
This will generate a file named cache.out, which contains the cache miss rate for your program.
A cache miss rate of greater than 10% is poor. If your code has a cache miss rate of greater than 10% you should use the standard methods for improving cache performance. They are described in http://oscinfo.osc.edu/training/perftunmic (see the link under "Handouts" at the bottom of the page) and http://www.psc.edu/training/WVU/Performance_Optimization.ppt.
Compilers and compiler options
The HP compilers, in general, generate more efficient code than the Gnu compilers. The HP compilers provide various compiler options to you to try to improve the single-processor performance of your code. The most likely candidates for both f90 and cc are
- -fast
- -arch host
- -tune host
- -unroll 8
- various levels of the -O option
The f90 options -transform_loops, -pipeline and -assume_buffered_io may also be useful. You should read the f90 and cc man pages to see what other options may improve the performance of your code.
We have found the -fast option to be most useful. The -fast option actually sets some of the other options. The f90 and cc man pages will tell you which ones.
However, be aware that some of the options, including the -fast option and the various levels of the -O option, may actually make the performance of your code worse. In any event, compiler options are not likely to have a significant impact on your code's performance.
Single statement performance measurement
When troubleshooting your code's single-processor performance it can occasionally be useful to determine how much time single statements are consuming. You can use the pixie tool to generate this information.
To generate single statement performance measurements follow the steps below.
- Compile your program with the -g option.
f90 -fast -g -o yourprog yourprog.f90 -lmpi -lelan
- Execute your program on a single processor using the pixie tool.
prun -N 1 -n 1 pixie -display ./yourprog > pixie.out
The file pixie.out will contain timing data for each statement in your code. The pixie line number is the same as your source code line number.
If you want to generate output at this level of granularity for only a section of your code you can use the -sigdump option to pixie to reset the collection of timing data. See the pixie man page for more details on using -sigdump.
Memory usage
If your code uses all of the memory that a node makes available to user codes and then has to use virtual memory, it can slow down by a factor of ten. Your code has about 3.5 Gbytes available to it per node. For the best performance we recommend that your code use no more than 3.0 Gbytes of main memory per node and that you never use virtual memory.
To measure how much memory your code is using per node you can use the system routine to execute the command "ps xl" on each of your nodes. For example, in Fortran the call would be
call system("ps xl")
Only one processor per node needs to execute this command since it will report on the memory used by all of the processors on a node.
This command will report on the virtual memory and resident memory used by each processor. The sum of the resident memory values is the number that you want to keep under 3.0 Gbytes. The OS is very aggressive about paging out memory pages that are not in use, so the virtual memory size will often be much larger than the resident memory size.
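The check against the 3.0 Gbyte recommendation is a simple sum. The Python sketch below assumes you have already read the resident sizes off the ps output by hand. The column layout of "ps xl" on ben is not described here, so the units (Kbytes), the binary Gbyte conversion, and the sample values are all assumptions made for illustration.

```python
KBYTES_PER_GBYTE = 1024 * 1024  # assumption: binary Gbytes
RECOMMENDED_LIMIT_GBYTES = 3.0  # per-node figure recommended in the text

def node_resident_gbytes(resident_kbytes):
    """Sum per-process resident sizes (Kbytes) and convert to Gbytes."""
    return sum(resident_kbytes) / KBYTES_PER_GBYTE

def within_recommended_limit(resident_kbytes):
    """True if the summed resident memory stays at or under the limit."""
    return node_resident_gbytes(resident_kbytes) <= RECOMMENDED_LIMIT_GBYTES

# Hypothetical resident sizes for the four processes on one node.
sizes = [786432, 786432, 786432, 524288]
total = node_resident_gbytes(sizes)
print(f"resident memory on node: {total:.2f} Gbytes")
print("within 3.0 Gbytes:", within_recommended_limit(sizes))
```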
Three out of four processors
If the time between synchronizations in your code is small--250 milliseconds or less--then your code may perform better if you use three out of four processors per node rather than all four processors. This will free up the fourth processor for the system to handle system events. To use three processors per node set your value for -n in your prun to three times the value you specified for -N.
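As a small worked example of the arithmetic, the Python sketch below builds the prun command line for a given node count. The function name, the node count of 8 and the executable name are illustrative only.

```python
def prun_command(nodes, procs_per_node, executable):
    """Build a prun command line using procs_per_node processors on each node."""
    if not 1 <= procs_per_node <= 4:
        raise ValueError("a node has four processors")
    total_procs = nodes * procs_per_node  # value passed to -n
    return f"prun -N {nodes} -n {total_procs} {executable}"

print(prun_command(8, 3, "./yourprog"))  # prun -N 8 -n 24 ./yourprog
print(prun_command(8, 4, "./yourprog"))  # prun -N 8 -n 32 ./yourprog
```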
IO
We cannot yet make a general statement about whether your code should use $LOCAL or $SCRATCH for IO. We are continually working to improve both options. You should try both options, if possible, and see which option works better for you.
As much of your IO as possible should be unformatted IO. When using $SCRATCH you should use as large a buffersize as possible for your IO. Make your IO statements read and write as much data as possible and use the default values for blocksize in your open statements.
If you use $LOCAL and tcscp you should use the {rank} feature of tcscp.
MPI optimization
To make your code perform better you should use non-blocking MPI sends and receives. You should also post your receives prior to the corresponding sends.
Scalability
Another important factor in your code's performance is how well it scales. To determine how well it scales use the /bin/time command to time your pruns using increasing numbers of processors, while keeping the amount of work done per processor constant. As you increase the number of processors your execution time should decrease in a nearly linear fashion if your code scales well.
If your code does not scale well then an examination of the timings of your MPI calls can help you determine what the problem with your code is.
To collect timing data on your MPI calls follow the steps below.
- Compile your program.
f90 -fast -o yourprog yourprog.f90 /usr/lib/libmpi.a -lelan -lelanctrl
- Generate a list of the routines in your code.
nm -g yourprog | perl -e'while (<STDIN>) {@fields = split(" ", $_); \
if ($fields[4] eq "T") {print "$fields[0]\n";}}' > routines
- Edit file routines to put the name of your main program first. With a C program or a Fortran program without a program statement the main program will be called 'main'. If a Fortran program has a program statement the main program name will be the name in the program statement with an '_' appended to it. You must insert the file /usr/local/packages/Atom/mpicalls.t into file routines in order to collect timing data on your MPI calls. You can also delete from file routines any routines for which you do not want to collect timing data.
- Load the atom tools module.
module load atom
- Instrument your code so that it will generate timings.
cat routines | atom -tool timer5 yourprog
This will generate an executable named yourprog.timer5. You cannot generate an instrumented code on the scratch file systems.
- Run yourprog.timer5 on an increasing number of processors, keeping the tprof.out.XXX.YYY files for each run.
You can evaluate the MPI timing data across your runs to help you determine what might be causing your code to scale poorly. If the MPI_Barrier time increases as the number of processors increases, you might have a load imbalance problem. You should redistribute the work across your nodes. If the MPI_Send, MPI_Recv or MPI_Reduce times increase as the number of processors increases, try to restructure your code to reduce the amount of communication between nodes.
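One way to organize that comparison is sketched below in Python. The layout of the tprof.out files is not described here, so the per-call times are assumed to have been copied in by hand. The numbers are invented for illustration, and the 10% growth tolerance is an arbitrary assumption rather than a PSC recommendation.

```python
def growing_calls(timings, tolerance=1.10):
    """Return MPI call names whose time grows from the smallest run to the largest.

    timings maps processor count -> {MPI call name: seconds}.
    A call is flagged when its time in the largest run exceeds its time in
    the smallest run by more than the tolerance factor.
    """
    counts = sorted(timings)
    first, last = timings[counts[0]], timings[counts[-1]]
    return sorted(
        name for name in first
        if name in last and last[name] > tolerance * first[name]
    )

# Hypothetical times (seconds) for three runs on 4, 8 and 16 processors.
runs = {
    4:  {"MPI_Barrier": 1.0, "MPI_Send": 2.0, "MPI_Reduce": 0.5},
    8:  {"MPI_Barrier": 2.5, "MPI_Send": 2.0, "MPI_Reduce": 0.5},
    16: {"MPI_Barrier": 6.0, "MPI_Send": 2.1, "MPI_Reduce": 0.5},
}

for name in growing_calls(runs):
    hint = ("possible load imbalance" if name == "MPI_Barrier"
            else "consider reducing communication")
    print(f"{name}: time grows with processor count ({hint})")
```

With these invented numbers only MPI_Barrier is flagged, which under the guidance above would point toward a load imbalance rather than excess communication.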
Reporting a Problem
You have several options for reporting problems on ben.
- You can call the User Services Hotline at 1-800-221-1641 from 9:00 a.m. until 8:00 p.m., Eastern time, on weekdays, and 9:00 a.m. until 4:00 p.m., Eastern time, on Saturdays.
- You can send email to remarks@psc.edu.
Other Documentation
- There are man pages for most system commands, including qsub, qstat, qdel and qalter. The pbs man page provides an overview of PBS. There are also man pages for the MPI routines. You must use the standard capitalization for the MPI routine when you use the man command.
- The home page for HP is http://www.hp.com/. Documentation specific to Tru64 can be found at http://h30097.www3.hp.com.
Acknowledgement in Publications
PSC requests that a copy of any publication (preprint or reprint) resulting from research done on PSC systems be sent to the PSC Allocation Coordinator.
Publications resulting from work done on ben should include a credit similar to
"The computations were performed on the HPQ Alpha-SC system ben at the Pittsburgh Supercomputing Center."
NSF supported research:
"The above research was facilitated through an allocation of advanced computing resources by [name of Partnership], through the support of the National Science Foundation, the Commonwealth of Pennsylvania [if applicable], the University of XXX [if applicable], and other funding partners [if applicable]."
Commonwealth of Pennsylvania supported research:
"This research was supported in part by grant number _____________ from the Pittsburgh Supercomputing Center, supported by several federal agencies, the Commonwealth of Pennsylvania and private industry."