Bridges User Guide: Running Jobs
All production computing must be done on Bridges’ compute nodes, NOT on Bridges’ login nodes. The SLURM scheduler (Simple Linux Utility for Resource Management) manages and allocates all of Bridges’ compute nodes. Several partitions, or job queues, have been set up in SLURM to allocate resources efficiently.
To run a job on Bridges, you need to decide how you want to run: interactively, in batch, or through OnDemand; and where to run – that is, which partitions you are allowed to use.
What are the different ways to run a job?
You can run jobs in Bridges in several ways:
- interactive mode – where you type commands and receive output back to your screen as the commands complete
- batch mode – where you first create a batch (or job) script which contains the commands to be run, then submit the job to be run as soon as resources are available
- through OnDemand – a browser interface that allows you to run interactively, or create, edit and submit batch jobs, and also provides a graphical interface to tools like RStudio, Jupyter notebooks, and IJulia. More information about OnDemand is in the OnDemand section of this User Guide.
Regardless of which way you choose to run your jobs, you will always need to choose a partition to run them in.
Which partitions can I use?
Different partitions control different types of Bridges’ resources; they are configured by the type of node they control along with other job requirements like how many nodes or how much time or memory is needed. Your access to the partitions is based on the type of Bridges allocation that you have (“Bridges Regular Memory”, “Bridges Large Memory”, “Bridges GPU”, or “Bridges AI”). You may have more than one type of allocation; in that case, you will have access to more than one set of partitions.
You can see which Bridges resources you have been allocated with the projects command. See the projects command in the Account Administration section of this User Guide for more information.
In this document:
Ways to run a job
- Interactive session
- Batch jobs
- OnDemand: OnDemand is not discussed in this document. See the OnDemand section of the Bridges User Guide.
Managing multiple grants
Partitions
- Bridges partitions
- Partitions for “Bridges regular memory” allocations
- Partitions for “Bridges large memory” allocations
- Partitions for “Bridges GPU” allocations
- Partitions for “Bridges-AI” allocations
Node, partition, and job status information
- sinfo: display information about Bridges’ nodes
- squeue: display information about jobs in the SLURM partitions
- scancel: kill a job
- sacct: display detailed information about a job. This information can help to determine why a job failed.
- srun, sstat, sacct and job_info: monitor the memory usage of a job
Interactive sessions
You can do your production work interactively on Bridges, typing commands on the command line, and getting responses back in real time. But you must be allocated the use of one or more Bridges’ compute nodes by SLURM to work interactively on Bridges. You cannot use the Bridges login nodes for your work.
You can run an interactive session in any of the SLURM partitions. You will need to specify which partition you want, so that the proper resources are allocated for your use.
If all of the resources set aside for interactive use are in use, your request will wait until the resources you need are available. Using a shared partition (RM-shared, GPU-shared) will probably allow your job to start sooner.
The interact command
To start an interactive session, use the command interact. The format is:
interact -options
The simplest interact command is
interact
This command will start an interactive job using the defaults for interact, which are:
Partition: RM-small
Cores: 1
Time limit: 60 minutes
Once the interact command returns with a command prompt, you can enter your commands. The shell will be your default shell. When you are finished with your job, type CTRL-D.
[bridgesuser@br006 ~]$ interact
A command prompt will appear when your session begins
"Ctrl+d" or "exit" will end your session
[bridgesuser@r004 ~]$
Notes:
- Be sure to use the correct account id for your job if you have more than one grant. See “Managing multiple grants“.
- Service Units (SU) accrue for your resource usage from the time the prompt appears until you type CTRL-D, so be sure to type CTRL-D as soon as you are done.
- The maximum time you can request is 8 hours. Inactive interact jobs are logged out after 30 minutes of idle time.
- By default, interact uses the RM-small partition. Use the -p option for interact to use a different partition.
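As a quick illustration (a sketch only; the partition and time shown here are arbitrary), an interactive session in the RM-shared partition with a two-hour limit could be requested with:
interact -p RM-shared -t 2:00:00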
Options for interact
If you want to run in a different partition, use more than one core or set a different time limit, you will need to use options to the interact
command. Available options are given below.
Options to the interact command

| Option | Description | Default value |
|---|---|---|
| -p partition | Partition requested | RM-small |
| -t HH:MM:SS | Walltime requested. The maximum time you can request is 8 hours. | 60:00 (1 hour) |
| -N n | Number of nodes requested | 1 |
| --ntasks-per-node=n | Number of cores to allocate per node. Note the “--” for this option. | 1 |
| -n NTasks | Number of tasks spread over all nodes | N/A |
| --egress | Allows your compute nodes to communicate with sites external to Bridges. | N/A |
| --gres=gpu:type:n | Specifies the type and number of GPUs requested. ‘type’ is one of volta32, volta16, p100 or k80. For the GPU, GPU-shared and GPU-small partitions, type is either k80 or p100; the default is k80. For the GPU-AI partition, type is either volta16 or volta32. ‘n’ is the number of GPUs. Note the “--” for this option. | No default |
| --gpu | Runs your job on 1 P100 GPU in the GPU-small partition. Note the “--” for this option. | N/A |
| -A account-id | SLURM account id for the job. Note: Files created during a job will be owned by the Unix group in effect when the job is submitted, which may be different from the account id for the job. See the discussion of the newgrp command in the Account Administration section of this User Guide to see how to change the Unix group currently in effect. | Your default account id |
| --mem=nGB | Amount of memory requested in GB. This option should only be used for the LM partition. Note the “--” for this option. | No default |
| -R reservation-name | Reservation name, if you have one. Use of -R does not automatically set any other interact options; you still need to specify the other options (partition, walltime, number of nodes) to override the defaults for the interact command. If your reservation is not assigned to your default account, you will also need to use the -A option when you issue your interact command. | No default |
| -h | Help; lists all the available command options | N/A |
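As an illustration only (the account id shown is a placeholder), several of these options can be combined in one interact command:
interact -p GPU-shared --gres=gpu:p100:1 -t 4:00:00 -A accountid
This requests one P100 GPU in the GPU-shared partition for four hours, charged to the specified account.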
See also
- Bridges partitions
- How to determine your valid SLURM account ids and Unix groups and change your default, in the Account administration section of the Bridges User Guide
- Managing multiple grants
- The srun command, for more complex control over your interactive job
Batch jobs
Instead of working interactively on Bridges, you can run in batch. This means you will:
- create a file called a batch or job script
- submit that script to a partition (queue) using the sbatch command
- wait for the job to come to the top of the queue and run
- if you like, check on the job’s progress as it waits in the partition and as it is running
- check the output file for results or any errors when it finishes
A simple example
This section outlines an example which submits a simple batch job. More detail on batch scripts, the sbatch command and its options follows.
Create a batch script
Use any editor you like to create your batch scripts. A simple batch script named hello.job which runs a “hello world” command is given here. Comments, which begin with ‘#’, explain what each line does.
The first line of any batch script must indicate the shell to use for your batch job.
#!/bin/bash
# use the bash shell
set -x
# echo each command to standard out before running it
date
# run the Unix 'date' command
echo "Hello world, from Bridges!"
# run the Unix 'echo' command
Submit the batch script to a partition
Use the sbatch command to submit the hello.job script.
[joeuser@login005 ~]$ sbatch hello.job
Submitted batch job 7408623
Note the jobid that is echoed back to you when the job is submitted. Here it is 7408623.
Check on the job progress
You can check on the job’s progress in the partition by using the squeue command. By default you will get a list of all running and queued jobs. Use the -u option with your username to see only your jobs. See the squeue command for details.
[joeuser@login005 ~]$ squeue -u joeuser
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7408623 RM hello.jo joeuser PD 0:08 1 r7320:00
The status “PD” (pending) in the output here shows that job 7408623 is waiting in the queue. See more about the squeue command below.
When the job is done, squeue will no longer show it:
[joeuser@login005 ~]$ squeue -u joeuser
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
Check the output file when the job is done
By default, the standard output and error from a job are saved in a file with the name slurm-jobid.out, in the directory that the job was submitted from.
[joeuser@login005 ~]$ more slurm-7408623.out
+ date
Sun Jan 19 10:27:06 EST 2020
+ echo 'Hello world, from Bridges!'
Hello world, from Bridges!
[joeuser@login005 ~]$
The sbatch command
To submit a batch job, use the sbatch command. The format is:
sbatch -options batch-script
The options to sbatch can either be in your batch script or on the sbatch command line. Options on the command line override those in the batch script.
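For example (a sketch only; the program and script names are placeholders), the same options can appear as #SBATCH directives inside the script and then be overridden at submission time:
#!/bin/bash
#SBATCH -p RM-shared
#SBATCH --ntasks-per-node=2
#SBATCH -t 1:00:00
set -x
./myprogram
Submitting this script with
sbatch -t 2:00:00 myscript.job
runs it with a two-hour walltime, because the -t given on the command line overrides the #SBATCH -t line in the script.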
Notes:
- Be sure to use the correct account id if you have more than one grant. Please see the -A option for sbatch to change the SLURM account id for a job. Information on how to determine your valid account ids and change your default account id is in the Account administration section of the Bridges User Guide.
- In some cases, the options for sbatch differ from the options for interact or srun.
- By default, sbatch submits jobs to the RM partition. Use the -p option for sbatch to direct your job to a different partition.
Options to the sbatch command
For more information about these options and other useful sbatch options see the sbatch man page.
Options to the sbatch command

| Option | Description | Default |
|---|---|---|
| -p partition | Partition requested | RM |
| -t HH:MM:SS | Walltime requested in HH:MM:SS | 30 minutes |
| -N n | Number of nodes requested | 1 |
| -o filename | Save standard out and error in filename. This file will be written to the directory that the job was submitted from. | slurm-jobid.out |
| -A account-id | SLURM account id for the job. If not specified, your default account id is used. Note: Files created during a job will be owned by the Unix group in effect when the job is submitted, which may be different from the account id used by the job. See the discussion of the newgrp command in the Account Administration section of this User Guide to see how to change the Unix group currently in effect. | Your default account id |
| --res reservation-name | Use the reservation that has been set up for you. Use of --res does not automatically set any other options; you still need to specify the other options (partition, walltime, number of nodes) that you would in any sbatch command. If your reservation is not assigned to your default account, you will also need the -A option to sbatch to specify the account. | NA |
| --mem=nGB | Memory in GB. This option is only valid for the LM partition. | None |
| -C constraints | Specifies constraints which the nodes allocated to this job must satisfy. An sbatch command can have only one -C option; multiple constraints can be specified with “&”. See the discussion of the -C option in the sbatch man page for more information. | None |
| --gres=gpu:type:n | Specifies the type and number of GPUs requested. ‘type’ is one of volta32, volta16, p100 or k80. For the GPU, GPU-shared and GPU-small partitions, type is either k80 or p100; the default is k80. For the GPU-AI partition, type is either volta16 or volta32. ‘n’ is the number of GPUs. | None |
| --ntasks-per-node=n | Request n cores be allocated per node. | 1 |
| --mail-type=type | Send email when job events occur, where type can be BEGIN, END, FAIL or ALL. | None |
| --mail-user=user | User to send email to, as specified by --mail-type. Default is the user who submits the job. | None |
| -d=dependency-list | Set up dependencies between jobs; see the sbatch man page for the dependency-list syntax. | None |
| --no-requeue | Specifies that your job will not be requeued under any circumstances. If your job is running on a node that fails, it will not be restarted. Note the “--” for this option. | NA |
| --time-min=HH:MM:SS | Specifies a minimum walltime for your job in HH:MM:SS format. SLURM considers the walltime requested when deciding which job to start next. Free slots on the machine are defined by the number of nodes and how long those nodes are free until they will be needed by another job. By specifying a minimum walltime, you allow the scheduler to reduce your walltime request to your specified minimum when deciding whether to schedule your job; this could allow your job to start sooner. If you use this option your actual walltime assignment can vary between your minimum time and the time you specified with the -t option. If your job hits its actual walltime limit, it will be killed, so when you use this option you should checkpoint your job frequently to save the results obtained to that point. | None |
| --switches=1 | Requests that the nodes your job runs on all be on one switch, which is a hardware grouping of 42 nodes. If you are asking for more than 1 and fewer than 42 nodes, your job will run more efficiently if it runs on one switch. Normally switches are shared across jobs, so using the switches option means your job may wait longer in the queue before it starts. An optional time parameter gives a maximum time that your job will wait for a switch to be available; if it has waited this maximum time, the request for your job to be run on a switch will be cancelled. | NA |
| -h | Help; lists all the available command options | N/A |
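To illustrate several of these options together (the email address, filenames and values are placeholders), a job that writes its output to a named file and sends mail when it ends could be submitted with:
sbatch -p RM -N 1 -t 10:00:00 -o myjob.out --mail-type=END --mail-user=joeuser@example.edu myscript.job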
See also
- Bridges partitions
- How to determine your valid account ids and change your defaults, in the Account administration section of the Bridges User Guide
- Managing multiple grants
Managing multiple grants
If you have more than one grant, be sure to use the correct SLURM account id and Unix group when running jobs.
See “Managing multiple grants” in the Account Administration section of the Bridges User Guide to see how to find your account ids and Unix groups and determine or change your defaults.
Permanently change your default SLURM account id and Unix group
See the change_primary_group command under “Managing multiple grants” in the Account Administration section of the Bridges User Guide to permanently change your default SLURM account id and Unix group.
Temporarily change your SLURM account id or Unix group
See the -A option to the sbatch or interact commands to set the SLURM account id for a specific job.
The newgrp command will change your Unix group for that login session only. Note that any files created by a job are owned by the Unix group in effect when the job is submitted, which is not necessarily the same as the account id used for the job. See the newgrp command in the Account Administration section of the Bridges User Guide to see how to change the Unix group currently in effect.
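For example (the group and account names here are placeholders), to charge a single job to a secondary grant and have its files owned by the matching Unix group, you could do something like:
newgrp othergroup
sbatch -A otheraccount myscript.job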
Bridges partitions
Each SLURM partition manages a subset of Bridges’ resources. Each partition allocates resources to interactive sessions, batch jobs, and OnDemand sessions that request resources from it.
Know which partitions are open to you: Your Bridges allocations determine which partitions you can submit jobs to.
- A “Bridges regular memory” allocation allows you to use Bridges’ RSM (128GB) nodes. Partitions available to “Bridges regular memory” allocations are
- RM, for jobs that will run on Bridges’ RSM (128GB) nodes, and use one or more full nodes
- RM-shared, for jobs that will run on Bridges’ RSM (128GB) nodes, but share a node with other jobs
- RM-small, for short jobs needing 2 full nodes or less, that will run on Bridges RSM (128GB) nodes
- A “Bridges large memory” allocation allows you to use Bridges LSM and ESM (3TB and 12TB) nodes. There is one partition available to “Bridges large memory” allocations:
- LM, for jobs that will run on Bridges’ LSM and ESM (3TB and 12TB) nodes
- A “Bridges GPU” allocation allows you to use Bridges’ GPU nodes. Partitions available to “Bridges GPU” allocations are:
- GPU, for jobs that will run on Bridges’ GPU nodes, and use one or more full nodes
- GPU-shared, for jobs that will run on Bridges’ GPU nodes, but share a node with other jobs
- GPU-small, for jobs that will use only one of Bridges’ GPU nodes and 8 hours or less of wall time.
- A “Bridges-AI” allocation allows you to use Bridges’ Volta GPU nodes. There is one partition available to “Bridges-AI” allocations:
- GPU-AI, for jobs that will run on Bridges’ Volta 16 nodes or the DGX-2.
All the partitions use FIFO scheduling. If the top job in the partition will not fit, SLURM will try to schedule the next job in the partition. The scheduler follows policies to ensure that one user does not dominate the machine. There are also limits to the number of nodes and cores a user can simultaneously use. Scheduling policies are always under review to ensure best turnaround for users.
Partitions for “Bridges regular memory” allocations
There are three partitions available for “Bridges regular memory” allocations: RM, RM-shared and RM-small.
Use your allocation wisely: To make the most of your allocation, use the shared partitions whenever possible. Jobs in the RM partition use all of the cores on a node, and incur Service Units (SUs) for all 28 cores. Jobs in the RM-shared partition share nodes, and SUs accrue only for the number of cores they are allocated. The RM partition is the default for the sbatch command, while RM-small is the default for the interact command. See the discussion of the interact and sbatch commands in this document for more information.
Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job. See “Managing multiple grants”.
For information on requesting resources and submitting jobs see the discussion of the interact or sbatch commands.
RM partition
Jobs in the RM partition run on Bridges’ RSM (128GB) nodes. Jobs do not share nodes, and are allocated all 28 of the cores on each of the nodes assigned to them. A job in the RM partition incurs SUs for all 28 cores per node on its assigned nodes.
RM jobs can use more than one node. However, the memory space of all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.
The internode communication performance for jobs in the RM partition is best when using 42 or fewer nodes.
When submitting a job to the RM partition, you can request:
- the number of nodes
- the walltime limit
If you do not specify the number of nodes or time limit, you will get the defaults. See the summary table for Bridges’ regular memory nodes below for the defaults.
You cannot specify:
- a specific memory allocation
Asking explicitly for memory for a job in the RM partition will cause the job to fail.
Sample interact command for the RM partition
An example of an interact command for the RM partition, requesting the use of 2 nodes for 30 minutes, is
interact -p RM -N 2 -t 30:00
where:
-p indicates the intended partition
-N is the number of nodes requested
-t is the walltime requested in the format HH:MM:SS
Sample sbatch command for RM partition
An example of an sbatch command to submit a job to the RM partition, requesting one node for 5 hours, is
sbatch -p RM -t 5:00:00 -N 1 myscript.job
where:
-p indicates the intended partition
-t is the walltime requested in the format HH:MM:SS
-N is the number of nodes requested
myscript.job is the name of your batch script
RM-shared partition
Jobs in the RM-shared partition run on (part of) a Bridges RSM (128GB) node. Jobs will share a node with other jobs, but will not share cores. A job in the RM-shared partition will accrue SUs only for the cores allocated to it, so it will use fewer SUs than an RM job. It could also start running sooner.
RM-shared jobs are assigned memory in proportion to the number of requested cores. They get the fraction of the node’s total memory in proportion to the number of cores requested. If the job exceeds this amount of memory it will be killed.
When submitting a job to the RM-shared partition, you should specify:
- the number of cores
- the walltime limit
If you do not specify the number of nodes or time limit, you will get the defaults. See the summary table for Bridges’ regular memory nodes below for the defaults.
You cannot specify:
- a specific memory allocation
Asking explicitly for memory for a job in the RM-shared partition will cause the job to fail.
Sample interact command for the RM-shared partition
Run in the RM-shared partition using 4 cores and 1 hour of walltime.
interact -p RM-shared --ntasks-per-node=4 -t 1:00:00
where:
-p indicates the intended partition
--ntasks-per-node requests the use of 4 cores
-t is the walltime requested in the format HH:MM:SS
Sample sbatch command for the RM-shared partition
Submit a job to RM-shared asking for 2 cores and 5 hours of walltime.
sbatch -p RM-shared --ntasks-per-node=2 -t 5:00:00 myscript.job
where:
-p indicates the intended partition
--ntasks-per-node requests the use of 2 cores
-t is the walltime requested in the format HH:MM:SS
myscript.job is the name of your batch script
Sample batch script for RM-shared partition
#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH -t 5:00:00
#SBATCH --ntasks-per-node=2
#echo commands to stdout
set -x
# move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory
#run OpenMP program
export OMP_NUM_THREADS=2
./myopenmp
Notes: For groupname, username, and path-to-directory substitute your Unix group, username, and directory path.
RM-small partition
Jobs in the RM-small partition run on Bridges’ RSM (128GB) nodes, but are limited to at most 2 full nodes and 8 hours. Jobs can share nodes. Note that the memory space of all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not. When submitting a job to the RM-small partition, you should specify:
- the number of nodes
- the number of cores
- the walltime limit
If you do not specify the number of nodes or time limit, you will get the defaults. See the summary table for Bridges’ regular memory nodes below for the defaults.
You cannot:
- ask for a specific memory allocation
Asking explicitly for memory for a job in the RM-small partition will cause the job to fail.
Sample interact command for the RM-small partition
Run in the RM-small partition using one node, 8 cores and 45 minutes of walltime.
interact -p RM-small -N 1 --ntasks-per-node=8 -t 45:00
where:
-p indicates the intended partition
-N requests one node
--ntasks-per-node requests the use of 8 cores
-t is the walltime requested in the format HH:MM:SS
Sample sbatch command for the RM-small partition
Submit a job to RM-small asking for 2 nodes and 6 hours of walltime.
sbatch -p RM-small -N 2 -t 6:00:00 myscript.job
where:
-p indicates the intended partition
-N requests the use of 2 nodes
-t is the walltime requested in the format HH:MM:SS
myscript.job is the name of your batch script
Summary of partitions for Bridges regular memory nodes
| Partition name | RM | RM-shared | RM-small |
|---|---|---|---|
| Node type | 128GB, 28 cores, 8TB on-node storage | 128GB, 28 cores, 8TB on-node storage | 128GB, 28 cores, 8TB on-node storage |
| Nodes shared? | No | Yes | Yes |
| Node default | 1 | 1 | 1 |
| Node max | 168 (if you need more than 168, contact bridges@psc.edu to make special arrangements) | 1 | 2 |
| Core default | 28/node | 1 | 1 |
| Core max | 28/node | 28 | 28/node |
| Walltime default | 30 mins | 30 mins | 30 mins |
| Walltime max | 72 hrs | 72 hrs | 8 hrs |
| Memory | 128GB/node | 4.5GB/core | 4.5GB/core |
Partitions for “Bridges large memory” allocations
There is one partition available for “Bridges large memory” allocations: LM.
Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job. See “Managing multiple grants”.
For information on requesting resources and submitting jobs see the interact or sbatch commands.
LM partition
Jobs in the LM partition share nodes. Your memory space for an LM job is an integrated, shared memory space.
When submitting a job to the LM partition, you must:
- use the --mem option to request the amount of memory you need, in GB. Any value up to 12000GB can be requested. There is no default memory value. Each core on the 3TB and 12TB nodes is associated with a fixed amount of memory, so the amount of memory you request determines the number of cores assigned to your job.
- specify the walltime limit
You cannot:
- specifically request a number of cores
SLURM will place jobs on either a 3TB or a 12TB node based on the memory request. Jobs asking for 3000GB or less will run on a 3TB node. If no 3TB nodes are available but a 12TB node is available, those jobs will run on a 12TB node.
Once your job is running, the environment variable SLURM_TASKS_PER_NODE tells you the number of cores assigned to your job.
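For instance (a minimal sketch; the OpenMP line is only relevant if your program uses OpenMP), an LM batch script could record and use that value:
echo "Cores allocated to this job: $SLURM_TASKS_PER_NODE"
export OMP_NUM_THREADS=$SLURM_TASKS_PER_NODE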
Sample interact command for LM
Run in the LM partition and request 2TB of memory. Use the wall time default of 30 minutes.
interact -p LM --mem=2000GB
where:
-p indicates the intended partition (LM)
--mem is the amount of memory requested
Sample sbatch command for the LM partition
A sample sbatch command for the LM partition requesting 10 hours of wall time and 6TB of memory is:
sbatch -p LM -t 10:00:00 --mem=6000GB myscript.job
where:
-p indicates the intended partition (LM)
-t is the walltime requested in the format HH:MM:SS
--mem is the amount of memory requested
myscript.job is the name of your batch script
Summary of partition for Bridges large memory nodes
| Partition name | LM (LSM nodes) | LM (ESM nodes) |
|---|---|---|
| Node type | 3TB RAM, 16TB on-node storage | 12TB RAM, 64TB on-node storage |
| Nodes shared? | Yes | Yes |
| Node default | 1 | 1 |
| Node max | 8 | 4 |
| Cores | Jobs are allocated 1 core per 48GB of memory requested | Jobs are allocated 1 core per 48GB of memory requested |
| Walltime default | 30 mins | 30 mins |
| Walltime max | 14 days | 14 days |
| Memory | Up to 3000GB | Up to 12,000GB |
Partitions for “Bridges GPU” allocations
There are three partitions available for “Bridges GPU” allocations: GPU, GPU-shared and GPU-small.
Use your allocation wisely: To make the most of your allocation, use the shared partitions whenever possible. Jobs in the GPU partition use all of the cores on a node, and accrue SU costs for every core. Jobs in the GPU-shared partition share nodes, and only incur SU cost for the number of cores they are allocated.
Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job. See “Managing multiple grants”.
For information on requesting resources and submitting jobs see the interact or sbatch commands.
GPU partition
Jobs in the GPU partition use Bridges’ GPU nodes. Note that Bridges has 2 types of GPU nodes: K80s and P100s. See the System Configuration section of this User Guide for the details of each type.
Jobs in the GPU partition do not share nodes, so jobs are allocated all the cores and all of the GPUs associated with the nodes assigned to them. Your job will incur SU costs for all of the cores on your assigned nodes.
The memory space across nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.
When submitting a job to the GPU partition, you must use the --gres=gpu:type:n option to specify:
- the type of node you want, K80 or P100. K80 is the default if no type is specified.
- the number of GPUs you want
See the sbatch command options for more details on the --gres=gpu:type:n option.
You should also specify:
- the number of nodes
- the walltime limit
Sample interact command for GPU
An interact command to start a GPU job on 4 P100 nodes for 30 minutes is
interact -p GPU --gres=gpu:p100:2 -N 4 -t 30:00
where:
-p indicates the intended partition
--gres=gpu:p100:2 requests the use of 2 P100 GPUs
-N requests 4 nodes
-t is the walltime requested in the format HH:MM:SS
Sample sbatch command for GPU
This command requests the use of one K80 GPU node for 45 minutes:
sbatch -p GPU --gres=gpu:k80:4 -N 1 -t 45:00 myscript.job
where:
-p indicates the intended partition
--gres=gpu:k80:4 requests the use of 4 K80 GPUs
-N requests one node
-t is the walltime requested in the format HH:MM:SS
myscript.job is the name of your batch script
Sample batch script for GPU partition
#!/bin/bash
#SBATCH -N 2
#SBATCH -p GPU
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
#SBATCH --gres=gpu:p100:2
#echo commands to stdout
set -x
#move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory
#run GPU program
./mygpu
Notes: The value of the --gres=gpu option indicates the type and number of GPUs you want. For groupname, username and path-to-directory, substitute your Unix group, username and appropriate directory path.
GPU-shared partition
Jobs in the GPU-shared partition run on Bridges’ GPU nodes. Note that Bridges has 2 types of GPU nodes: K80s and P100s. See the System Configuration section of this User Guide for the details of each type.
Jobs in the GPU-shared partition share nodes, but not cores. By sharing nodes your job will use fewer Service Units. It could also start running sooner.
You will always run on (part of) one node in the GPU-shared partition. Your jobs will be allocated memory in proportion to the number of requested GPUs. You get the fraction of the node’s total memory in proportion to the fraction of GPUs you requested. If your job exceeds this amount of memory it will be killed.
When submitting a job to the GPU-shared partition, you must specify the number of GPUs. You should also specify:
- the type of GPU node you want, K80 or P100, with the --gres=gpu:type:n option to the interact or sbatch commands. K80 is the default if no type is specified. See the sbatch command options for more details.
- the walltime limit
Sample interact command for GPU-shared
Run in the GPU-shared partition and ask for 4 K80 GPUs and 8 hours of wall time.
interact -p GPU-shared --gres=gpu:k80:4 -t 8:00:00
where:
-p indicates the intended partition
--gres=gpu:k80:4 requests the use of 4 K80 GPUs
-t is the walltime requested in the format HH:MM:SS
Sample sbatch command for GPU-shared
Submit a job to the GPU-shared partition requesting 2 P100 GPUs and 1 hour of wall time.
sbatch -p GPU-shared --gres=gpu:p100:2 -t 1:00:00 myscript.job
where:
-p indicates the intended partition
--gres=gpu:p100:2 requests the use of 2 P100 GPUs
-t is the walltime requested in the format HH:MM:SS
myscript.job is the name of your batch script
Sample batch script for GPU-shared partition
#!/bin/bash
#SBATCH -N 1
#SBATCH -p GPU-shared
#SBATCH --ntasks-per-node 7
#SBATCH --gres=gpu:p100:1
#SBATCH -t 5:00:00
#echo commands to stdout
set -x
#move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory
#run GPU program
./mygpu
Notes: The --gres=gpu option indicates the number and type of GPUs you want. For groupname, username and path-to-directory, substitute your Unix group, username, and appropriate directory path.
GPU-small partition
Jobs in the GPU-small partition run on one of Bridges’ P100 GPU nodes. Your jobs will be allocated memory in proportion to the number of requested GPUs. You get the fraction of the node’s total memory in proportion to the fraction of GPUs you requested. If your job exceeds this amount of memory it will be killed.
When submitting a job to the GPU-small partition, you must specify the number of GPUs with the --gres=gpu:p100:n option to the interact or sbatch command. In this partition, n can be 1 or 2. You should also specify the walltime limit.
Sample interact command for GPU-small
Run in the GPU-small partition and ask for 2 P100 GPUs and 2 hours of wall time.
interact -p GPU-small --gres=gpu:p100:2 -t 2:00:00
where:
-p indicates the intended partition
--gres=gpu:p100:2 requests the use of 2 P100 GPUs
-t is the walltime requested in the format HH:MM:SS
Sample sbatch command for GPU-small
Submit a job to the GPU-small partition using 2 P100 GPUs and 1 hour of wall time.
sbatch -p GPU-small --gres=gpu:p100:2 -t 1:00:00 myscript.job
where:
-p indicates the intended partition
--gres=gpu:p100:2 requests the use of 2 P100 GPUs
-t is the walltime requested in the format HH:MM:SS
myscript.job is the name of your batch script
Summary of partitions for Bridges’ GPU nodes
| Partition name | GPU (P100 nodes) | GPU (K80 nodes) | GPU-shared (P100 nodes) | GPU-shared (K80 nodes) | GPU-small (P100 nodes) |
|---|---|---|---|---|---|
| Node type | 2 GPUs, 2 16-core CPUs, 8TB on-node storage | 4 GPUs, 2 14-core CPUs, 8TB on-node storage | 2 GPUs, 2 16-core CPUs, 8TB on-node storage | 4 GPUs, 2 14-core CPUs, 8TB on-node storage | 2 GPUs, 2 16-core CPUs, 8TB on-node storage |
| Nodes shared? | No | No | Yes | Yes | No |
| Node default | 1 | 1 | 1 | 1 | 1 |
| Node max | 4 (limit of 8 GPUs/job; because there are 2 GPUs on each P100 node, you can request at most 4 nodes) | 2 (limit of 8 GPUs/job; because there are 4 GPUs on each K80 node, you can request at most 2 nodes) | 1 | 1 | 1 |
| Core default | 32/node | 28/node | 16/GPU | 7/GPU | 32/node |
| Core max | 32/node | 28/node | 16/GPU | 7/GPU | 32/node |
| GPU default | 2/node | 4/node | No default | No default | No default |
| GPU max | 2/node | 4/node | 2 | 4 | 2 |
| Walltime default | 30 mins | 30 mins | 30 mins | 30 mins | 30 mins |
| Walltime max | 48 hrs | 48 hrs | 48 hrs | 48 hrs | 8 hrs |
| Memory | 128GB/node | 128GB/node | 7GB/GPU | 7GB/GPU | 128GB/node |
Partition for “Bridges-AI” allocations
There is one partition available for “Bridges-AI” allocations: GPU-AI. There are two node types available:
- “Volta 16” – nine HPE Apollo 6500 servers, each with 8 NVIDIA Tesla V100 GPUs with 16 GB of GPU memory each, connected by NVLink 2.0
- “Volta 32” – NVIDIA DGX-2 enterprise research AI system tightly coupling 16 NVIDIA Tesla V100 (Volta) GPUs with 32 GB of GPU memory each, connected by NVLink and NVSwitch
We strongly recommend the use of Singularity containers on the AI nodes, especially on the DGX-2. We have installed containers for many popular AI packages on Bridges for you to use, but you can create your own if you like. See the documentation on Singularity and the containers available on Bridges for more information.
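As a sketch only (the image path and script name are placeholders, and the images actually installed may differ), a containerized program is typically launched on a GPU node with a command along these lines, where --nv makes the NVIDIA GPUs visible inside the container:
singularity exec --nv /path/to/image.simg python my_training_script.py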
Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job. See “Managing multiple grants”.
For information on requesting resources and submitting jobs see the interact or sbatch commands.
Using module files on Bridges-AI
The Module package provides for the dynamic modification of a user’s environment via module files. Module files manage necessary changes to the environment, such as adding to the default path or defining environment variables, so that you do not have to manage those definitions and paths manually. Before you can use module files in a batch job on Bridges-AI, you must issue the following command:
If you are using the bash or ksh:
source /etc/profile.d/modules.sh
If you are using csh or tcsh:
source /etc/profile.d/modules.csh
See the Module documentation for information on the module command.
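For example, a Bridges-AI batch script using bash might begin like this (the module name shown is only a placeholder; run module avail on Bridges to see what is actually installed):
#!/bin/bash
#SBATCH -p GPU-AI
#SBATCH --gres=gpu:volta16:1
#SBATCH -t 1:00:00
# make the module command available in this batch job
source /etc/profile.d/modules.sh
# load an installed package; the name here is illustrative
module load cuda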
GPU-AI partition
When submitting a job to the GPU-AI partition, you must use the --gres=gpu:type:n option to specify the type and number of Volta GPUs you will use. Valid options are:
- For the Volta 16 nodes, with 16GB of GPU memory, type is volta16; n can be 1-8.
- For the DGX-2, with 32GB of GPU memory, type is volta32; n can be 1-16.
See the sbatch command for an explanation of the --gres option.
Sample interact command for GPU-AI
To run in an interactive session on Bridges-AI, use the interact command and specify the GPU-AI partition. An example interact command to request 1 GPU on a Volta 16 node is:
interact -p GPU-AI --gres=gpu:volta16:1
Where:
-p indicates the intended partition
--gres=gpu:volta16:1 requests the use of 1 V100 GPU on an Apollo node
Sample sbatch command for the GPU-AI partition
A sample sbatch command to submit a job to run on one of the Volta 16 nodes and use all eight GPUs would be
sbatch -p GPU-AI -N 1 --gres=gpu:volta16:8 -t 1:00:00 myscript.job
where
-p GPU-AI requests the GPU-AI partition
-N 1 requests one node
--gres=gpu:volta16:8 requests an Apollo server with V100 (Volta) GPUs, and specifies that you will use all 8 GPUs on that node
-t 1:00:00 requests one hour of running time
myscript.job is your batch script.
Summary of the partition for Bridges’ “AI” GPU nodes
| Partition name | GPU-AI (Volta 16) | GPU-AI (DGX-2) |
|---|---|---|
| Node type | 8 Tesla V100 (Volta) GPUs with 16 GB of GPU memory each; 2 20-core CPUs | 16 Tesla V100 GPUs with 32 GB of GPU memory each; 2 24-core CPUs |
| Node default | 1 | 1 |
| Node max | 4 | 1 |
| Min GPUs per job | 1 | 1 |
| Max GPUs per job | 32 | 16 |
| Max GPUs in use per user | 32 | 16 |
| Walltime default | 1 hour | 1 hour |
| Walltime max | 48 hours | 48 hours |
Node, partition, and job status information
sinfo
The sinfo command displays information about the state of Bridges’ nodes. The nodes can have several states:
alloc | Allocated to a job |
down | Down |
drain | Not available for scheduling |
idle | Free |
resv | Reserved |
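For example, to restrict the display to a single partition, you can use the -p option (any partition name from this guide can be substituted):
sinfo -p RM-shared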
squeue
The squeue command displays information about the jobs in the partitions. Some useful options are:
-j jobid | Displays the information for the specified jobid |
-u username | Restricts information to jobs belonging to the specified username |
-p partition | Restricts information to the specified partition |
-l | (long) Displays information including: time requested, time used, number of requested nodes, the nodes on which a job is running, job state and the reason why a job is waiting to run. |
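These options can be combined. For example (the username and partition are placeholders), to see a long listing of your own jobs in one partition:
squeue -u joeuser -p RM -l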
See also
- squeue man page for a discussion of the codes for job state, for why a job is waiting to run, and more options.
scancel
The scancel command is used to kill a job in a partition, whether it is running or still waiting to run. Specify the jobid for the job you want to kill. For example,
scancel 12345
kills job # 12345.
sacct
The sacct command can be used to display detailed information about jobs. It is especially useful in investigating why one of your jobs failed. The general format of the command is
sacct -X -j nnnnnn -S MMDDYY --format parameter1,parameter2,...
- For nnnnnn, substitute the jobid of the job you are investigating.
- The date given for the -S option is the date at which sacct begins searching for information about your job.
- The commas between the parameters in the --format option cannot be followed by spaces.
The --format option determines what information to display about a job. Useful parameters are:
- JobID
- Partition
- Account – the account id
- ExitCode – useful in determining why a job failed
- State – useful in determining why a job failed
- Start, End, Elapsed – start, end and elapsed time of the job
- NodeList – list of nodes used in the job
- NNodes – how many nodes the job was allocated
- MaxRSS – how much memory the job used
- AllocCPUs – how many cores the job was allocated
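Putting these together, a command to investigate a job that failed (the jobid and date shown are placeholders) might look like:
sacct -X -j 7408623 -S 011920 --format JobID,Partition,Account,State,ExitCode,Elapsed,NNodes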
Monitoring memory usage
It can be useful to find the memory usage of your jobs. For example, you may want to find out if memory usage was a reason a job failed.
You can determine a job’s memory usage whether it is still running or has finished. To determine if your job is still running, use the squeue command:
squeue -j nnnnnn -O state
where nnnnnn is the jobid.
For running jobs: srun and top or sstat
You can use the srun and top commands to determine the amount of memory being used:
srun --jobid=nnnnnn top -b -n 1 | grep userid
For nnnnnn, substitute the jobid of your job. For userid, substitute your userid. The RES field in the output from top shows the actual amount of memory used by a process. The top man page can be used to identify the fields in the output of the top command.
You can also use the sstat command to determine the amount of memory being used in a running job:
sstat -j nnnnnn.batch --format=JobID,MaxRss
where nnnnnn is your jobid.
- See the man page for sstat for more information.
For jobs that are finished: sacct or job_info
If you are checking within a day or two after your job has finished you can issue the command
sacct -j nnnnnn --format=JobID,MaxRss
If this command no longer shows a value for MaxRss, use the job_info command:
job_info nnnnnn | grep max_rss
Substitute your jobid for nnnnnn in both of these commands.
- See the man page for sacct for more information.
See also
- Online documentation for SLURM, including man pages for all the SLURM commands