Rachel
rachel.psc.edu and jonas.psc.edu
- SYSTEM ARCHITECTURE AND CONFIGURATION
- STAY INFORMED
- ACCESS TO RACHEL
- Getting an allocation on rachel
- Multiple rachel grants
- Connecting to rachel
- Changing your rachel password
- Changing your login shell
- ACCOUNTING
- STORING FILES
- TRANSFERRING FILES
- COMPILERS
- COMMUNICATION LIBRARIES
- MODULE SOFTWARE
- RUNNING A JOB
- MONITORING AND KILLING JOBS
- SOFTWARE PACKAGES
- RACHEL AND THE TERAGRID
- REPORTING A PROBLEM
- ACKNOWLEDGEMENT IN PUBLICATIONS
System Architecture and Configuration
Rachel is a loosely-coupled pair of SMP machines. Each SMP machine has 64 EV7 processors running at 1.15 GHz and 256 Gbytes of shared memory. You log in to a front end node with 2 EV67 processors. Both the front end node and the SMP machines run the Tru64 Unix operating system.
Rachel is primarily intended to run applications with low to moderate parallelism that require large memory bandwidth and/or large shared memory.
Stay Informed
As a rachel user, you must stay informed of changes to the machine's environment. Refer to this document frequently. In addition, important system status information is posted on PSC's bboard Web page.
You will also periodically receive email from PSC with information about rachel. To ensure that you receive this email, make sure that your email forwarding is set properly by following the online instructions for setting your email forwarding.
Access to Rachel
Getting an allocation on rachel
There are two types of grants available on rachel: development grants and production grants. Development grants are appropriate as precursors to large requests or for work which will exploit the unique architectural capabilities of rachel, chiefly its large shared memory and memory bandwidth. Production grants are large awards for users with extensive computational requirements.
To apply for a production or a development grant on rachel you must fill out the online POPS proposal form. This form allows three types of request: Start-up, Medium and Large. Medium requests are for production grants of 30,001 to 500,000 Service Units (SUs), and Large requests are for production grants of more than 500,000 SUs. Use a Start-up request for development grants and for production grants of 30,000 or fewer SUs.
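The request-type thresholds above amount to a small decision function. The C sketch below is illustrative only: pops_request_type and its enum are hypothetical names, not part of POPS or any PSC tool.

```c
#include <stdbool.h>

/* POPS request types described in the text (hypothetical names). */
enum pops_request { POPS_START_UP, POPS_MEDIUM, POPS_LARGE };

/* Pick the POPS request type for a proposal.
   Development grants always use Start-up; production grants are
   classified by the number of Service Units (SUs) requested. */
enum pops_request pops_request_type(bool development, long sus)
{
    if (development || sus <= 30000)
        return POPS_START_UP;
    if (sus <= 500000)
        return POPS_MEDIUM;
    return POPS_LARGE;
}
```

For example, a production request for 30,001 SUs is a Medium request, while any development request is a Start-up request regardless of size.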
Multiple rachel grants
If you have more than one rachel grant, be sure to charge each job to the appropriate account. See the section on command line options below.
Connecting to rachel
Use ssh to connect to rachel.
ssh rachel.psc.edu -l username
The first time you log in, you will receive a message similar to
Host key not found from list of known hosts. Are you sure you want to continue connecting?
Answer 'yes' to make the connection. You should not receive this message on subsequent connections.
You will next be prompted for your rachel password. Your rachel password is your PSC Kerberos password. If you enter your password correctly you will be connected to rachel.
Changing your rachel password
You must change your rachel password within 30 days of the date on the initial password sheet. If you don't, logins will be disabled on your account. Contact PSC User Services if this happens.
Use the kpasswd command to change your PSC Kerberos password. Do not use the passwd command to change your PSC password. You have the same password on all PSC production systems. If you change your password on one system using kpasswd it will change on all PSC production systems.
See the general PSC password policies.
Changing your login shell
You can change your login shell with the chsh command. When doing so you must specify a shell from the /usr/psc/shells directory. In your batch jobs, however, you can use a shell in /bin.
Accounting
Charging Algorithm
For the HP Marvels (rachel and jonas), the idea of a "virtual cpu"
is defined in terms of memory. Let M represent the total memory
available to users on the system (in Gbytes), and C represent the
total number of CPUs on the system. Then P, the per-processor share
of memory is defined as:
P = M/C
The charging algorithm for rachel and jonas is given as:
SU = max (ceil(m/P), N) * w
where:
- ceil = the ceiling function, which rounds any non-integer real number up to the next larger integer.
- m = memory requested, in Gbytes
- P = per-processor share of memory, as defined above
- N = number of processors requested
- w = walltime used by job, in hours
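As a worked example of the formula, the C sketch below computes SU from m, N, w, M and C. The function name su_charge is hypothetical, and the values used in the usage note (M = 236 Gbytes, C = 64) are an assumption based on the 236 Gbyte vmem limit described under Running a Job; the actual M for rachel may differ.

```c
/* Round a non-negative value up to the next integer (like ceil),
   written out so no math library is needed. */
static long ceil_pos(double x)
{
    long q = (long)x;                  /* truncates toward zero */
    return ((double)q < x) ? q + 1 : q;
}

/* SU = max(ceil(m/P), N) * w, with P = M/C.
   m: memory requested (Gbytes)     N: processors requested
   w: walltime used (hours)         M: memory available to users (Gbytes)
   C: number of CPUs on the system */
double su_charge(double m, long N, double w, double M, long C)
{
    double P = M / (double)C;          /* per-processor share of memory */
    long vcpus = ceil_pos(m / P);      /* processors' worth of memory */
    if (N > vcpus)
        vcpus = N;                     /* charge the larger of the two */
    return (double)vcpus * w;
}
```

With those assumed values P = 236/64 = 3.6875 Gbytes, so a job that requests 32 Gbytes and 4 processors and runs for 5 hours is charged max(ceil(32/3.6875), 4) * 5 = max(9, 4) * 5 = 45 SUs: its memory request, not its processor request, determines the charge.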
Tracking charges
User accounting data is available on rachel through the xbanner command. xbanner displays account information such as the total amount of the grant, the date of the last job and the remaining amount of the grant.
Accounting information for grants is also available at the PSC Grant Management System on the Web at https://grants.psc.edu/arms.
You will need your PSC Kerberos password to access this system. This system can provide more detailed information than xbanner, although some of the information is only available to grant PIs. The system has extensive internal documentation.
Storing Files
File Systems
File systems are file storage spaces directly connected to a system. There are currently two such areas available to you on rachel.
- /usr/users/n/username
- This is your home directory. 'n' is replaced by an integer and 'username' by your userid. You can also refer to this directory as $HOME. You have a 1 Gbyte quota for your home directory, so you will probably not be able to store your data files in $HOME. Your home directory is backed up. $HOME is visible to all of rachel's compute processors, but through a relatively slow connection.
- $LOCAL
- Each of the SMP machines has 6 Tbytes of local disk space. The local space for one machine is not visible to the other SMP machine. However, the local spaces for all of the SMP machines are visible to the front end node, but through a very slow connection.
When you run a batch job you cannot determine on which SMP machine your job will run. Thus, you cannot ensure that it will run on the same SMP machine with the same local disk system as any of your prior runs. Therefore, you should consider this local space only as working disk space. In other words, you should copy your data files between golem and this local disk system at the beginning and end of your batch jobs with the far command. The file archiver golem is discussed below.
Within a job you can refer to the local space assigned to that job on its SMP machine as $LOCAL. You should refer to it with the variable name since we could change the implementation of $LOCAL for performance reasons. See the sample batch job below for an example of how to use $LOCAL and far in a batch job.
Files in $LOCAL are, however, accessible to you on rachel's front end, both while the job is running and after it ends, either with the tcscp command or with standard Unix commands. To use either type of command you need two pieces of information: your job's PBS jobid, which is shown when you submit your job and in your job's .o output, and the name of the SMP machine your job ran on, which is given in the exec_host field of qstat -f output for a running job and in the .o output of a finished job. For example, if your job is running or ran on carson64a and the PBS jobid is or was 15786, then the local disk space for that job can be referred to as
/carson64a/local/15786
in front end Unix commands and as
carson64a:/local/15786
in tcscp commands.
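If you script access to your $LOCAL files, the two path forms above can be built from the host name and the jobid. The C sketch below uses hypothetical function names, and it assumes the exec_host value has the common PBS form host/cpu+host/cpu; confirm that against your own qstat -f output.

```c
#include <stdio.h>
#include <string.h>

/* Copy the host name from a PBS exec_host value such as
   "carson64a/0+carson64a/1" (assumed format: host/cpu entries joined
   by '+'). Returns 0 on success, -1 if it is empty or does not fit. */
int exec_host_name(const char *exec_host, char *host, size_t len)
{
    size_t n = strcspn(exec_host, "/+");   /* length up to first '/' or '+' */
    if (n == 0 || n >= len)
        return -1;
    memcpy(host, exec_host, n);
    host[n] = '\0';
    return 0;
}

/* Build the front end form (/host/local/jobid) and the tcscp form
   (host:/local/jobid). Returns 0 on success, -1 if a buffer is too small. */
int local_paths(const char *host, long jobid,
                char *fe, size_t fe_len, char *tc, size_t tc_len)
{
    int a = snprintf(fe, fe_len, "/%s/local/%ld", host, jobid);
    int b = snprintf(tc, tc_len, "%s:/local/%ld", host, jobid);
    if (a < 0 || b < 0 || (size_t)a >= fe_len || (size_t)b >= tc_len)
        return -1;
    return 0;
}
```

For jobid 15786 and exec_host "carson64a/0+carson64a/1", these produce /carson64a/local/15786 and carson64a:/local/15786.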
Files on a local disk system left by a batch job will remain in place for a week after the job ends. However, they are not backed up and if disk space is low on a local disk system they can be deleted at any time to allow currently running jobs to continue. Thus, you should move any files that you want a permanent copy of from their local disk system to golem as soon as possible.
You must use tcscp to copy files from a local disk system to golem after a job ends. Once a job ends you cannot use far to perform this transfer. A tcscp copy will use a very fast path between the SMP machines and golem. See the discussion below of tcscp for an example of how to use tcscp.
You can also access your files on any of the local file systems using standard Unix commands if you use a filename similar to the one described above. For example, the command
tail /carson64b/local/21689/output.dat
can be used to examine the end of file output.dat for job 21689 on SMP machine carson64b. This command can be issued while your job is running. Other Unix commands, such as ls or rm, work similarly.
However, the connection between the rachel front end and the local file systems is very slow. Thus, you should not use the cp command to copy large files between $HOME and the local file systems. In general, you should limit your interactions with Unix commands between the front end and the local file systems on the SMP machines to essential operations.
File Repositories
File repositories are file storage spaces which are not directly connected to a front end or compute processor. You cannot, for example, open a file in a file repository from within a program. You must use explicit file copy commands to move files to and from a repository. You currently have one file repository available on rachel: golem, PSC's file archiver.
golem
Golem runs Cray's DMF file archival system. It is a combination tape-and-disk archival system.
The far program and the tcscp program can be used to transfer files between golem and rachel.
You can use kftp or scp to transfer files between golem and your remote machine. We strongly recommend kftp rather than scp for remote file transfer. We recommend against using sftp. See the golem Web page for more information.
You should store your data files on golem rather than in your home directory because your home directory space is limited. At the beginning of your batch jobs you should transfer your data files to $LOCAL and then at the end of your batch jobs copy files that you want a permanent copy of back to golem.
If you need to store a file of 2 Tbytes or larger on golem, please contact User Services first so that special arrangements can be made to store your file.
Transferring Files
Kftp
Rachel is running Kerberos 5 (K5) client/server software. Thus, if your local site has K5 client/server software installed, you can use kftp to transfer files to and from rachel whether you are logged into rachel or to your local machine. The examples below assume that you are logged into your local machine.
Before you can use kftp to transfer files, you must authenticate yourself to rachel. To do this use the kinit command.
kinit username@PSC.EDU
For 'username' substitute your PSC userid. PSC.EDU is PSC's Kerberos realm name.
After you enter this command you are prompted for your PSC Kerberos password, which is the password you use to log in to rachel.
Once you are authenticated you can use the kftp command to actually perform your file transfers.
kftp rachel.psc.edu
The kftp command functions like the ftp command.
You should not use kftp to transfer files to $LOCAL.
You should verify that the Kerberos commands operate on your local system as described here. Some installations of Kerberized ftp differ in their implementation.
Man pages for kinit and kftp are available on rachel.
A Unix kftp client is available at http://www.pdc.kth.se/heimdal. A Windows kftp client is available at http://web.mit.edu/network/kerberos-form.html.
Scp, discussed below, and sftp can also be used for rachel file transfers, but kftp will be much faster.
Scp
The scp program can be used to transfer files between your remote machine and your rachel home directory.
The format for the scp command is
scp source-filename target-filename
where the filename on the remote system, whether it is the target or the source, must be specified as
username@system:filename
For example, to copy a file to your home directory on rachel when you are logged in to your home system use a command such as
scp filename username@rachel.psc.edu:/usr/users/n/username/filename
If you are logged in to rachel and you want to copy a file from your home system to rachel, use a command such as
scp username@remote-system:filename filename
The first time you use scp to or from rachel, you will receive a message similar to
Host key not found from list of known hosts. Are you sure you want to continue connecting?
Answer 'yes' to make the connection. You should not receive this message on subsequent connections.
You will be prompted next for your password on the remote system. For rachel you should use your PSC Kerberos password.
For more information on the scp command, see the scp man page.
You may be able to improve your scp transfer rate by using the blowfish encryption method rather than the default method, if your version of scp supports it. To use this method issue your scp command as
scp -c blowfish source-filename target-filename
Scp is part of the ssh distribution.
We strongly recommend using kftp rather than scp for remote file transfer if kftp is available.
Far
You can use the far program to move files between rachel and golem.
Tcscp
The tcscp command, created by PSC, allows you to copy files from a local file system on an SMP machine to golem. Standard Unix file protections are used to determine which files you can copy with tcscp. Thus, other users will not be able to copy your files on a local file system unless you set the file permissions to allow this.
The format of the command is based on the cp command, with the addition of the ability to specify source and target machines as well as source and target filenames. For example, the command
tcscp carson64a:/local/15786/output.dat golem:output.dat
copies output.dat from your directory /local/15786 on SMP machine carson64a to your golem home directory. You can get the name of the machine your job ran on from your .o output or from qstat -f for a running job. The PBS jobid is available when you submit your job or from your .o output. You issue the tcscp command interactively while logged into rachel's front end node.
The wildcard characters '*' and '?' are permitted in source filename specifications and are treated as the shell treats them.
Just as with the cp command, you can specify multiple source filenames
tcscp carson64a:/local/15786/output1.dat carson64a:/local/15786/output2.dat golem:
This command will copy output1.dat and output2.dat from your directory /local/15786 on carson64a to your golem home directory. When you use this form of the command the last file specification is the target specification and must be a directory.
The tcscp command has several options. The -r option allows you to recursively copy directories and their contents, just like cp. The -v option runs the command in verbose mode, which shows the fully expanded filenames used in the copy as well as timing data about the transfer. The -no option specifies that existing files must not be overwritten when a target file has the same name as an existing file; tcscp then skips over those files. By default tcscp overwrites existing files. The -nk option causes tcscp to delete its source files after it successfully copies them. Finally, the -h option provides help information for tcscp.
Tar
Whether you are transferring files between rachel and golem or between rachel and your remote system, if you have many files (1000 or more) it is much more efficient to tar them into one file and transfer that single tar file, especially if the files are small (64 Kbytes or smaller).
Tru64 tar (/bin/tar) can only create tar files up to 8 Gbytes. Gnu tar (/usr/psc/gnu/bin/tar) can create tar files larger than 8 Gbytes. However, Tru64 tar cannot read a Gnu tar file that is larger than 8 Gbytes.
Contact User Services before you make a tar file that is 50 Gbytes or larger. Move your tar file to golem or to your remote system as soon as you can after you create it.
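The tar guidance above can be expressed as two checks. This hypothetical C sketch assumes 1 Gbyte = 2^30 bytes and, to stay on the safe side, sends archives of exactly 8 Gbytes to Gnu tar; the 1000-file, 8 Gbyte and 50 Gbyte thresholds come from the text.

```c
#include <stdint.h>

#define GBYTE ((uint64_t)1 << 30)   /* assumption: 1 Gbyte = 2^30 bytes */

enum tar_choice { TAR_TRU64, TAR_GNU, TAR_ASK_USER_SERVICES };

/* The text recommends tarring when you have 1000 or more files. */
int should_tar(long nfiles)
{
    return nfiles >= 1000;
}

/* Pick a tar program for an archive of the given size in bytes. */
enum tar_choice choose_tar(uint64_t archive_bytes)
{
    if (archive_bytes >= 50 * GBYTE)
        return TAR_ASK_USER_SERVICES;   /* contact User Services first */
    if (archive_bytes >= 8 * GBYTE)
        return TAR_GNU;                 /* /usr/psc/gnu/bin/tar */
    return TAR_TRU64;                   /* /bin/tar, limited to 8 Gbytes */
}
```

Remember that an archive for which this returns TAR_GNU cannot later be read with /bin/tar.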
Compilers
HP compilers for Fortran90, C, and C++ are available, as are Gnu compilers for C and C++.
Compile your program with one of the following commands.
C programs: cc program-name.c for the HP compiler or gcc program-name.c for the Gnu compiler
C++ programs: cxx program-name.C for the HP compiler or g++ program-name.C for the Gnu compiler
Fortran90 programs: f90 program-name.f90
Please note that the HP C compiler expects a .c extension (lower case) while the HP C++ compiler expects a .C (upper case) extension.
The HP compilers will only run on rachel's front end node, not on its compute processors.
The make command executes /bin/make. To use Gnu make use the gmake command.
We recommend that you use the options
-O -fast -tune ev7 -arch ev7
when you compile using the HP compilers.
Communication libraries
MPI
The MPI message passing library is available on rachel. To enable your programs to use MPI, you must include the MPI header file in your source and link to the MPI libraries when you compile.
For Fortran, use this include directive in your source:
include 'mpif.h'
and compile with a command like:
f90 program.f90 -lmpi
For C, use this include directive:
#include <mpi.h>
and compile with a command similar to:
cc program.c -lmpi
OpenMP
OpenMP, for shared memory parallel programming, is also available on rachel. MPI uses shared memory for communication on rachel, so you do not need OpenMP to get shared memory communication.
If you want to use OpenMP, compile with the -omp option. Also, before you execute your program, set the environment variable OMP_NUM_THREADS to the number of virtual processors your job will use. The system sets $PBS_VPPN to this number, so you can set OMP_NUM_THREADS to $PBS_VPPN. If you do this, each thread will run on its own processor. See the discussion of running jobs on rachel below for more information on running OpenMP jobs, virtual processors, and $PBS_VPPN.
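A minimal OpenMP sketch in C, assuming you compile with -omp as described above. The reduction clause gives each thread a private partial sum and combines them at the end; when OpenMP is not enabled the pragma is ignored and the loop runs serially with the same result. The function names, including threads_match_vppn (a helper for checking that OMP_NUM_THREADS was set to $PBS_VPPN), are hypothetical.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum 1..n, splitting the loop across OpenMP threads. */
long parallel_sum(long n)
{
    long sum = 0;
#pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Number of threads OpenMP will use; this follows OMP_NUM_THREADS. */
int thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;                        /* built without OpenMP: serial */
#endif
}

/* Returns 1 if OMP_NUM_THREADS equals PBS_VPPN in the environment,
   0 if they differ or either variable is unset. */
int threads_match_vppn(void)
{
    const char *omp = getenv("OMP_NUM_THREADS");
    const char *vppn = getenv("PBS_VPPN");
    if (omp == NULL || vppn == NULL)
        return 0;
    return atoi(omp) == atoi(vppn);
}
```

In a batch job you would set OMP_NUM_THREADS to ${PBS_VPPN} in the script before running the executable, as in the OpenMP sample script under Running a Job.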
Module software
The Module package provides for the dynamic modification of a user's environment via module files. Module can be used:
- to manage multiple versions of applications, tools and libraries
- to manage software where complex changes to the environment are necessary
- to manage software where name conflicts with other software would cause problems
Module is available automatically for interactive use, although if you are going to use modules you should not switch your shell from your login shell during your interactive session.
To use module in a batch job, add one of these commands to your script:
- For csh-type shells (csh, tcsh)
- source /usr/local/Modules/default/init/shell-name, where shell-name is either csh or tcsh.
- For sh-type shells (sh, bash, ksh)
- . /usr/local/Modules/default/init/shell-name, where shell-name is one of sh, bash, or ksh.
Some useful module commands are:
| Command | Effect |
|---|---|
| module avail | lists all the available modules |
| module help foo | displays help on module foo |
| module whatis foo | displays a brief description of module foo |
| module display foo | indicates what changes would be made to the environment by loading module foo without actually loading it |
| module load foo | loads module foo |
| module unload foo | reverses all changes to the environment made by previously loading module foo |
Some modules are defined by the system (type module avail for a list), but you can create your own. For more information on module and how to create a modulefile, see the man pages for module and modulefile.
Running a Job
Scheduling policies
The Portable Batch System (PBS) controls all access to rachel's processors, for both batch and interactive jobs. PBS on rachel currently has only one queue. Interactive and batch jobs compete in this queue for scheduling.
The scheduler actually schedules virtual processors on the machine. A virtual processor is a physical processor associated with 3.7 Gbytes of memory. Memory is the most critical resource on rachel, especially given its potential impact on swapping performance. We tie memory to physical processors in this way to reduce job swapping.
Since memory is tied to processors in this way, specifying the amount of memory necessary for a job is enough for the system to determine how many virtual processors are needed.
The scheduling policies on rachel are designed to favor jobs requesting large numbers of virtual processors, to make the best use of rachel. Keeping processors from sitting idle is also an important goal of the scheduler.
There are also policies in place to prevent a single user from dominating the machine.
Send email to remarks@psc.edu if you have computing needs that cannot be met by these scheduling policies.
Batch access
Use the qsub command to submit a job to PBS. A PBS job script consists of PBS directives, comments and executable commands. The last line of your job script must end with a newline character.
A sample job script for an MPI program is
#!/bin/csh
#PBS -l walltime=5:00:00
#PBS -l vmem=32gb
#PBS -j oe
set echo
cd $LOCAL
#copy executable and input files
cp $HOME/a.out .
far get input.dat .
#execute program
dmpirun -np ${PBS_VPPN} ./a.out
#cleanup in case dmpirun failed
mpiclean
#store output file
far store output.dat .
# clean up files if no longer needed
rm -f *
The first line in the script cannot be a PBS directive. Any PBS directive in the first line is ignored. Here, the first line identifies which shell should be used.
The next three lines are PBS directives.
- #PBS -l walltime=5:00:00
- The first directive requests a wall clock time limit of 5 hours. Specify the time in the format HH:MM:SS. Use only two digits for minutes and seconds, and do not pad the hours with leading zeros. The default walltime is 30 minutes. The maximum time you can request is 168 hours (1 week).
- #PBS -l vmem=32gb
- This directive specifies the maximum amount of memory your entire job can use at one time. The largest value you can specify for vmem is 236 Gbytes. As discussed above, from your memory specification the system determines that your job will run on ceiling(32/3.7) = 9 virtual processors and thus on 9 physical processors. Your job will reserve 9 physical processors while it runs even if it does not use them all.
If it is more convenient you can specify the number of processors you want instead of the amount of memory. For example,
#PBS -l nodes=1:ppn=9
requests 9 physical processors, which the system translates into a request for 9 virtual processors with a total memory request of 9 * 3.7 = 33.3 Gbytes. The value for nodes must always be '1' in your processor specification.
If you specify both processors and memory the system will select as your number of virtual processors the larger of your physical processor specification and the virtual processor value computed from your memory request. You can request at most 64 virtual processors in a job.
We ask that your job not use more processors while it is running than its final virtual processor count. Doing so can degrade system performance.
- #PBS -j oe
- The final PBS directive combines your stdout and stderr output into one file, in this case stdout. This will make your program easier to debug.
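PBS performs the translation of these directives itself; the C sketch below only illustrates the rules described above, with hypothetical function names. walltime_seconds accepts HH:MM:SS, MM:SS or SS (the shorter forms are an assumption based on the walltime=10:00 example under Interactive access) and enforces the 168-hour limit, but it does not check field widths. virtual_processors applies the 3.7 Gbyte-per-virtual-processor rule and the 64-processor limit.

```c
#include <stdlib.h>

#define GB_PER_VP 3.7        /* memory per virtual processor (from the text) */
#define MAX_VP 64            /* largest virtual processor request */
#define MAX_WALL_HOURS 168   /* largest walltime request */

/* Convert a walltime such as "5:00:00", "10:00" or "90" to seconds.
   Returns -1 if the string is malformed or exceeds 168 hours. */
long walltime_seconds(const char *s)
{
    long total = 0;
    int fields = 0;
    char *end;

    for (;;) {
        long v = strtol(s, &end, 10);
        if (end == s || v < 0 || ++fields > 3)
            return -1;               /* no digits, negative, or >3 fields */
        total = total * 60 + v;      /* shift earlier fields up one unit */
        if (*end == '\0')
            break;
        if (*end != ':')
            return -1;
        s = end + 1;
    }
    return total <= MAX_WALL_HOURS * 3600L ? total : -1;
}

/* Virtual processors for a request: the larger of ppn and
   ceiling(vmem_gb / 3.7). Pass 0 for whichever was not given.
   Returns -1 if the result exceeds 64. */
int virtual_processors(double vmem_gb, int ppn)
{
    double q = vmem_gb / GB_PER_VP;
    int from_mem = (int)q;
    if ((double)from_mem < q)
        from_mem++;                  /* round up without libm */
    int vp = ppn > from_mem ? ppn : from_mem;
    return vp <= MAX_VP ? vp : -1;
}
```

For the sample script, walltime_seconds("5:00:00") returns 18000 and virtual_processors(32, 0) returns 9, matching ceiling(32/3.7) = 9.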
The remaining lines in the script are comments or command lines.
- set echo
- This command causes your batch output to display each command next to its corresponding output. This will make your program easier to debug. If you are using the Bourne shell or one of its descendants use 'set -x' instead of 'set echo'.
- Comment lines
- The other lines in the sample script that begin with '#' are comment lines. The '#' for comments and PBS directives must begin in column one of your script file. The remaining lines in the sample script are executable commands.
- dmpirun
- The dmpirun command is used to launch your executable. The -np option to dmpirun indicates the number of processors your program will run on and should be set to $PBS_VPPN, which is set by the system to the number of virtual processors your job is requesting. Dmpirun cannot read directly from stdin but you can use input redirection. If your job is a serial job or an OpenMP job, you can just specify your executable name, although you must set the OMP_NUM_THREADS variable before running your OpenMP executable. You should set OMP_NUM_THREADS to $PBS_VPPN.
- mpiclean
- The mpiclean command must be executed to clean up system resources properly after a failed dmpirun execution.
A sample job script for an OpenMP program is
#!/bin/csh
#PBS -l walltime=5:00:00
#PBS -l nodes=1:ppn=8
#PBS -j oe
set echo
cd $LOCAL
# copy executable and input files
cp $HOME/a.out .
far get input.dat .
# specify number of OpenMP threads
setenv OMP_NUM_THREADS ${PBS_VPPN}
# execute program
./a.out
# store output file
far store output.dat .
# clean up files if no longer needed
rm -f *
There are several differences between the MPI and the OpenMP scripts. We recommend that you specify only the number of processors you want when using OpenMP and set OMP_NUM_THREADS to this value. This will ensure that each of your threads runs on a separate processor and that each thread has enough memory in which to run.
Also, you do not run your executable with dmpirun. You just specify the executable name on its own command line. Nor do you need to run mpiclean after your executable is finished.
After you create your script you must make it executable with the chmod command.
chmod 755 yourscript.job
Then you can submit it to PBS with the qsub command.
qsub yourscript.job
Batch output, including your job's stdout and stderr output, is returned to the directory from which you issued the qsub command after your job finishes.
Command line options
You can also specify the PBS directives as command-line options to qsub. Thus, you could omit the PBS directives in the sample script above and submit the script with
qsub -l walltime=5:00:00 -l vmem=32gb -j oe yourscript.job
Some command line options are not available as PBS directives. For example, to change the charging id for a job, use the -W group_list option:
qsub -W group_list=charge-id...
Command line options override PBS directives included in your script.
The -M and -m options can be used to have the system send you email when your job undergoes specified state transitions. See the man page for qsub for more information on these options and other options to the qsub command.
Interactive access via qsub -I
A form of interactive access is available by using the -I option to qsub. An example command using qsub -I is
qsub -I -l walltime=10:00 -l nodes=1:ppn=2
This command requests interactive access to 2 processors for 10 minutes.
The system will respond with a message similar to
qsub: waiting for job 54.rachel.psc.edu to start
Your qsub request will wait until it can be satisfied. If you want to cancel your request you should type ^C. You will have to try your interactive job at a later time. The qstat command can be used to show how many jobs are running on the system.
When your job starts you will receive the message
qsub: job 54.rachel.psc.edu ready
and then the OS command prompt. You can use the -M and -m options to qsub to have the system send you email when your job has started.
At this point any commands you enter will be run on your processors as if you had entered them in a batch script. For example, to run an MPI code you must enter a dmpirun command.
Stdin, stdout, and stderr are all connected to your terminal, although you still need to use input redirection for your MPI code to read stdin.
When you are finished with your interactive session type ^D. The system will respond
qsub: job 54.rachel.psc.edu completed
When you use qsub -I you are charged for the total time that you hold your processors and your memory whether you are computing or not. Thus, as soon as you are done running executables you should type ^D.
Monitoring and Killing Jobs
Qstat
The qstat command is used to display the status of the PBS queue. It includes running and queued jobs. The -f and -a options to qstat provide you with more extensive status listings. See the man page for qstat for more details.
Qdel
The qdel command is used to kill queued and running jobs.
Qalter
The qalter command is used to alter the attributes of queued and running jobs.
Qstatform
The qstatform command shows how many virtual processors are in use on each of the two nodes. There are 64 virtual processors available on each node.
Software Packages
A list of software packages on rachel is available online. If you would like us to install a package that is not in this list send email to remarks@psc.edu.
Rachel and the TeraGrid
Rachel is on the TeraGrid. Thus, you have additional methods of connecting to rachel, of transferring files to and from rachel and of running jobs on rachel. For information on using the TeraGrid see the general online documentation for the TeraGrid and the PSC-specific online TeraGrid documentation.
Reporting a Problem
You have several options for reporting problems on rachel.
- You can call the User Services Hotline at 1-800-221-1641 from 9:00 a.m. until 8:00 p.m., Eastern time, on weekdays, and 9:00 a.m. until 4:00 p.m., Eastern time, on Saturdays.
- You can send email to remarks@psc.edu.
Acknowledgement in Publications
PSC requests that a copy of any publication (preprint or reprint) resulting from research done on PSC systems be sent to the PSC Allocation Coordinator. We also request that you include an acknowledgement of PSC in your publication.