Running Jobs

All production computing must be done on Bridges' compute nodes, NOT on Bridges' login nodes. The SLURM scheduler (Simple Linux Utility for Resource Management) manages and allocates all of Bridges' compute nodes. Several partitions, or job queues, have been set up in SLURM to allocate resources efficiently.

To run a job on Bridges, you need to decide how you want to run: interactively, in batch, or through OnDemand;  and where to run - that is, which partitions you are allowed to use.

What are the different ways to run a job?

 You can run jobs on Bridges in several ways:

  • interactive mode - where you type commands and receive output back to your screen as the commands complete
  • batch mode - where you first create a batch (or job) script which contains the commands to be run, then submit the job to be run as soon as resources are available
  • through OnDemand - a browser interface that allows you to run interactively, or to create, edit and submit batch jobs. It also provides a graphical interface to tools like RStudio, Jupyter notebooks, and IJulia. More information about OnDemand is in the OnDemand section of the Bridges User Guide.

Regardless of which way you choose to run your jobs, you will always need to choose a partition to run them in.

Which partitions can I use?

Different partitions control different types of Bridges' resources; each is configured for a particular type of node, along with other job requirements such as how many nodes, how much time, or how much memory a job can use.  Your access to the partitions is based on the type of Bridges allocation that you have ("Bridges regular memory", "Bridges GPU", "Bridges large memory", or "Bridges-AI"). You may have more than one type of allocation; in that case, you will have access to more than one set of partitions.

In this document

Ways to run a job

Managing multiple grants

Partitions

Node, partition, and job status information

  • sinfo: display information about Bridges' nodes
  • squeue: display information about the jobs in the SLURM partitions
  • scancel: kill a job
  • sacct: display detailed information about a job. This information can help to determine why a job failed.
  • srun, sstat, sacct and job_info: monitor the memory usage of a job

 

Interactive sessions

You can do your production work interactively on Bridges, typing commands on the command line and getting responses back in real time.  But SLURM must allocate one or more of Bridges' compute nodes to you before you can work interactively; you cannot use the Bridges login nodes for your work.

You can run an interactive session in any of the SLURM partitions.  You will need to specify which partition you want, so that the proper resources are allocated for your use.

If all of the resources set aside for interactive use are in use, your request will wait until the resources you need are available. Using a shared partition (RM-shared, GPU-shared) will probably allow your job to start sooner.

The interact command

To start an interactive session, use the command interact.  The format is

interact -options

The simplest interact command is

 interact

This command will start an interactive job using the defaults for interact, which are:

Partition: RM-small
Cores: 1
Time limit: 60 minutes

  

Once the interact command returns with a command prompt you can enter your commands. The shell will be your default shell. When you are finished with your job, type CTRL-D.

[bridgesuser@br006 ~]$ interact

A command prompt will appear when your session begins
"Ctrl+d" or "exit" will end your session

[bridgesuser@r004 ~]$ 

Notes:

  • Be sure to use the correct account id for your job if you have more than one grant. See "Managing multiple grants".
  • Service Units (SU) accrue for your resource usage from the time the prompt appears until you type CTRL-D, so be sure to type CTRL-D as soon as you are done.   
  • The maximum time you can request is 8 hours. Inactive interact jobs are logged out after 30 minutes of idle time.
  • By default, interact uses the RM-small partition.  Use the -p option for interact to use a different partition.

Options for interact 

If you want to run in a different partition, use more than one core or set a different time limit, you will need to use options to the interact command.   Available options are given below.

Options to the interact command
-p partition
    Partition requested.
    Default: RM-small

-t HH:MM:SS
    Walltime requested. The maximum time you can request is 8 hours.
    Default: 60:00 (1 hour)

-N n
    Number of nodes requested.
    Default: 1

--egress (note the "--" for this option)
    Allows your compute nodes to communicate with sites external to Bridges.
    Default: N/A

-A account-id
    SLURM account id for the job. See "Managing multiple grants" in the Account Administration section of the Bridges User Guide to find or change your default account id.
    Note: Files created during a job will be owned by the Unix group in effect when the job is submitted. This may be different than the account id for the job. See the discussion of the newgrp command in the Account Administration section of this User Guide to see how to change the Unix group currently in effect.
    Default: your default account id

-R reservation-name
    Reservation name, if you have one. Use of -R does not automatically set any other interact options; you still need to specify the other options (partition, walltime, number of nodes) to override the defaults for the interact command. If your reservation is not assigned to your default account, you will also need to use the -A option when you issue your interact command.
    Default: none

--mem=nGB (note the "--" for this option)
    Amount of memory requested, in GB. This option should only be used for the LM partition.
    Default: none

--gres=gpu:type:n (note the "--" for this option)
    Specifies the type and number of GPUs requested.
    'type' is one of: volta32, volta16, p100 or k80.
        For the GPU, GPU-shared and GPU-small partitions, type is either k80 or p100. The default is k80.
        For the GPU-AI partition, type is either volta16 or volta32.
    'n' is the number of GPUs. Valid choices are:
      • 1-4, when type=k80
      • 1-2, when type=p100
      • 1-8, when type=volta16
      • 1-16, when type=volta32
    Default: none

-gpu
    Runs your job on 1 P100 GPU in the GPU-small partition.
    Default: N/A

--ntasks-per-node=n (note the "--" for this option)
    Number of cores to allocate per node.
    Default: 1

-h
    Help; lists all the available command options.

 


 

Batch jobs

To run a batch job, you must first create a batch (or job) script, and then submit the script  using the sbatch command.  

A batch script is a file that consists of SBATCH directives, executable commands and comments.

SBATCH directives specify your resource requests and other job options in your batch script.  You can also specify resource requests and options  on the sbatch command line.  Any options on the command line take precedence over those given in the batch script. The SBATCH directives must start with '#SBATCH' as the first text on a line, with no leading spaces.

Comments begin with a '#' character.

The first line of any batch script must indicate the shell to use for your batch job.  
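For example, a minimal batch script might look like the following sketch; the partition, core count, walltime and program name here are illustrative placeholders to replace with your own values.

#!/bin/bash
#SBATCH -p RM-shared
#SBATCH -N 1
#SBATCH --ntasks-per-node 1
#SBATCH -t 00:30:00

# the commands to run follow the SBATCH directives
./myprogram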

Check the Sample batch scripts section of the Bridges User Guide to see examples of batch scripts.

The sbatch command

To submit a batch job, use the sbatch command.  The format is

sbatch -options batch-script

The options to sbatch can either be in your batch script or on the sbatch command line.  Options in the command line override those in the batch script.

Note:

  • Be sure to use the correct account id for your job if you have more than one grant. See the -A option for sbatch to change the SLURM account id for a job. Information on how to determine your valid account ids and change your default account id is in the Account Administration section of the Bridges User Guide.
  • In some cases, the options for sbatch differ from the options for interact or srun.
  • By default, sbatch submits jobs to the RM partition.  Use the -p option for sbatch to direct your job to a different partition.

 

Options to the sbatch command

For more information about these options and other useful sbatch options see the sbatch man page

-p partition
    Partition requested.
    Default: RM

-t HH:MM:SS
    Walltime requested, in HH:MM:SS format.
    Default: 30 minutes

-N n
    Number of nodes requested.
    Default: 1

-A account-id
    SLURM account id for the job. If not specified, your default account id is used. See "Managing multiple grants" in the Account Administration section of the Bridges User Guide to find your default SLURM account id.
    Note: Files created during a job will be owned by the Unix group in effect when the job is submitted. This may be different than the account id used by the job. See the discussion of the newgrp command in the Account Administration section of this User Guide to see how to change the Unix group currently in effect.
    Default: your default account id

--res reservation-name (note the "--" for this option)
    Use the reservation that has been set up for you. Use of --res does not automatically set any other options; you still need to specify the other options (partition, walltime, number of nodes) that you would in any sbatch command. If your reservation is not assigned to your default account, you will also need to use the -A option to sbatch to specify the account.
    Default: N/A

--mem=nGB (note the "--" for this option)
    Memory requested, in GB. This option is only valid for the LM partition.
    Default: none

-C constraints
    Specifies constraints which the nodes allocated to this job must satisfy.
    An sbatch command can have only one -C option. Multiple constraints can be specified with "&". For example, -C "LM&PH2" constrains the nodes to 3TB nodes with 20 cores and 38.5GB/core. If multiple -C options are given (e.g., sbatch ... -C LM -C EGRESS), only the last one applies; the -C LM option would be ignored in this example.
    Some valid constraints are:
        EGRESS    Allows your compute nodes to communicate with sites external to Bridges.
        LM        Ensures that a job in the LM partition uses only the 3TB nodes. This option is required for any jobs in the LM partition which use /pylon5.
        PH1       Ensures that the job will run on LM nodes which have 16 cores and 48GB/core.
        PH2       Ensures that the job will run on LM nodes which have 20 cores and 38.5GB/core.
        PERF      Turns on performance profiling, for use with performance profiling software like VTune or TAU.
    See the discussion of the -C option in the sbatch man page for more information.
    Default: none

--gres=gpu:type:n (note the "--" for this option)
    Specifies the type and number of GPUs requested.
    'type' is one of: volta32, volta16, p100 or k80.
        For the GPU, GPU-shared and GPU-small partitions, type is either k80 or p100. The default is k80.
        For the GPU-AI partition, type is either volta16 or volta32.
    'n' is the number of GPUs. Valid choices are:
      • 1-4, when type=k80
      • 1-2, when type=p100
      • 1-8, when type=volta16
      • 1-16, when type=volta32
    Default: none

--ntasks-per-node=n (note the "--" for this option)
    Request that n cores be allocated per node.
    Default: 1

--mail-type=type (note the "--" for this option)
    Send email when job events occur, where type can be BEGIN, END, FAIL or ALL.
    Default: none

--mail-user=user (note the "--" for this option)
    User to send email to, as specified by --mail-type. The default is the user who submits the job.

-d dependency-list
    Set up dependencies between jobs, where dependency-list can be:
        after:job_id[:jobid...]
            This job can begin execution after the specified jobs have begun execution.
        afterany:job_id[:jobid...]
            This job can begin execution after the specified jobs have terminated.
        aftercorr:job_id[:jobid...]
            A task of this job array can begin execution after the corresponding task ID in the specified job has completed successfully (run to completion with an exit code of zero).
        afternotok:job_id[:jobid...]
            This job can begin execution after the specified jobs have terminated in some failed state (non-zero exit code, node failure, timed out, etc.).
        afterok:job_id[:jobid...]
            This job can begin execution after the specified jobs have successfully executed (run to completion with an exit code of zero).
        singleton
            This job can begin execution after any previously launched jobs sharing the same job name and user have terminated.
    Default: none

--no-requeue (note the "--" for this option)
    Specifies that your job will not be requeued under any circumstances. If your job is running on a node that fails, it will not be restarted.
    Default: N/A

--time-min=HH:MM:SS (note the "--" for this option)
    Specifies a minimum walltime for your job in HH:MM:SS format.
    SLURM considers the walltime requested when deciding which job to start next. Free slots on the machine are defined by the number of nodes and how long those nodes are free until they will be needed by another job. By specifying a minimum walltime you allow the scheduler to reduce your walltime request to your specified minimum time when deciding whether to schedule your job. This could allow your job to start sooner.
    If you use this option, your actual walltime assignment can vary between your minimum time and the time you specified with the -t option. If your job hits its actual walltime limit, it will be killed. When you use this option you should checkpoint your job frequently to save the results obtained to that point.
    Default: none

--switches=1 or --switches=1@HH:MM:SS (note the "--" for this option)
    Requests that the nodes your job runs on all be on one switch, which is a hardware grouping of 42 nodes.
    If you are asking for more than 1 and fewer than 42 nodes, your job will run more efficiently if it runs on one switch. Normally switches are shared across jobs, so using the switches option means your job may wait longer in the queue before it starts.
    The optional time parameter gives a maximum time that your job will wait for a switch to be available. If it has waited this maximum time, the request for your job to be run on a switch will be cancelled.
    Default: N/A

-h
    Help; lists all the available command options.

 


 

Managing multiple grants

If you have more than one grant, be sure to use the correct SLURM account id and Unix group when running jobs. 

See "Managing multiple grants" in the Account Administration section of the Bridges User Guide to see how to find your account ids and Unix groups and determine or change your defaults.

Permanently change your default SLURM account id and Unix group

See the change_primary_group command in the "Managing multiple grants" in the Account Administration section of the Bridges User Guide to permanently change your default SLURM account id and Unix group.

Temporarily change your SLURM account id or Unix group

See the -A option to the sbatch or interact commands to set the SLURM account id for a specific job.

The newgrp command will change your Unix group for that login session only. Note that any files created by a job are owned by the Unix group in effect when the job is submitted, which is not necessarily the same as the account id used for the job.  See the newgrp command in the Account Administration section of the Bridges User Guide to see how to change the Unix group currently in effect.
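For example, assuming a hypothetical second account id and Unix group both named abc123 (substitute your own values), you could charge one job to that grant and switch your Unix group for the current login session as follows:

# charge this job to the (hypothetical) account id abc123
sbatch -A abc123 -p RM -N 1 -t 1:00:00 myscript.job

# switch the Unix group for this login session so files created afterwards are owned by abc123
newgrp abc123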

 

Bridges partitions

Each SLURM partition manages a subset of Bridges' resources.  Each partition allocates resources to interactive sessions, batch jobs, and OnDemand sessions that request resources from it.  

Know which partitions are open to you: Your Bridges allocations determine which partitions you can submit jobs to.

  • A "Bridges regular memory" allocation allows you to use Bridges' RSM (128GB) nodes. Partitions available to "Bridges regular memory" allocations are
    • RM, for jobs that will run on Bridges' RSM (128GB) nodes, and use one or more full nodes
    • RM-shared, for jobs that will run on Bridges' RSM (128GB) nodes, but share a node with other jobs
    • RM-small, for short jobs needing 2 full nodes or less, that will run on Bridges RSM (128GB) nodes
  • A "Bridges GPU" allocation allows you to use Bridges' GPU nodes. Partitions available to "Bridges GPU" allocations are:
    • GPU, for jobs that will run on Bridges' GPU nodes, and use one or more full nodes
    • GPU-shared, for jobs that will run on Bridges' GPU nodes, but share a node with other jobs
    • GPU-small, for jobs that will use only one of Bridges' GPU nodes and 8 hours or less of wall time.
  • A "Bridges large memory" allocation allows you to use  Bridges LSM and ESM (3TB and 12TB) nodes. There is one partition available to "Bridges large memory" allocations:
    • LM, for jobs that will run on Bridges' LSM and ESM (3TB and 12TB) nodes
  • A "Bridges-AI" allocation allows you to you Bridges' Volta GPU nodes. There is one partition available to "Bridges-AI" allocations:
    • GPU-AI, for jobs that will run on Bridges' Apollo 6500 servers or the DGX-2.

All the partitions use FIFO scheduling. If the top job in the partition will not fit, SLURM will try to schedule the next job in the partition. The scheduler follows policies to ensure that one user does not dominate the machine. There are also limits to the number of nodes and cores a user can simultaneously use. Scheduling policies are always under review to ensure best turnaround for users.

Partitions for "Bridges regular memory" allocations

There are three partitions available for "Bridges regular memory" allocations: RM, RM-shared and RM-small.

Use your allocation wisely:  To make the most of your allocation, use the shared partitions whenever possible.  Jobs in the RM partition use all of the cores on a node, and incur Service Units (SUs) for all 28 cores.  Jobs in the RM-shared partition share nodes, and SUs accrue only for the number of cores they are allocated. The RM partition is the default for the sbatch command, while RM-small is the default for the interact command. See the discussion of the interact and sbatch commands in this document for more information.

Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job.  See "Managing multiple grants".

For information on requesting resources and submitting  jobs see the discussion of the interact or sbatch commands.

 

RM partition

Jobs in the RM partition run on Bridges' RSM (128GB) nodes.  Jobs do not share nodes, and are allocated all 28 of the cores on each of the nodes assigned to them.  A job in the RM partition incurs SUs for all 28 cores per node on its assigned nodes. 

RM jobs can use more than one node. However, the memory space of  all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.

The internode communication performance for jobs in the RM partition is best when using 42 or fewer nodes. 

When submitting a job to the RM partition, you should specify:

  • the number of  nodes
  • the walltime limit 

Sample interact command for the RM partition

An example of an interact command for the RM partition, requesting the use of 2 nodes for 30 minutes is

interact -p RM -N 2 -t 30:00

where:

-p indicates the intended partition

-N is the number of nodes requested

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for RM partition

An example of a sbatch command to submit a job to the RM partition, requesting one node for 5 hours is

sbatch -p RM -t 5:00:00 -N 1 myscript.job

where:

-p indicates the intended partition

-t is the walltime requested in the format HH:MM:SS

-N is the number of nodes requested

myscript.job is the name of your batch script

 

RM-shared partition

Jobs in the RM-shared partition run on (part of) one of Bridges' RSM (128GB) nodes.  Jobs share a node with other jobs, but do not share cores.  A job in the RM-shared partition accrues SUs only for the cores allocated to it, so it uses fewer SUs than an RM job.  It could also start running sooner.

RM-shared jobs are allocated memory in proportion to the number of cores they request: a job receives the same fraction of the node's total memory as the fraction of the node's cores it requests. If the job exceeds this amount of memory it will be killed. For example, a job requesting 4 of a node's 28 cores is allocated 4/28 of the node's 128GB of memory, about 18GB.

When submitting a job to the RM-shared partition, you should specify:

  • the number of cores
  • the walltime limit

Sample interact command for the RM-shared partition

Run in the RM-shared partition using 4 cores and 1 hour of walltime. 

interact -p RM-shared --ntasks-per-node=4 -t 1:00:00

where:

-p indicates the intended partition

--ntasks-per-node requests the use of 4 cores

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for the RM-shared partition

Submit a job to RM-shared asking for 2 cores and 5 hours of walltime.

sbatch -p RM-shared --ntasks-per-node 2 -t 5:00:00 myscript.job

where:

-p indicates the intended partition

--ntasks-per-node requests the use of 2 cores

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Sample batch script for RM-shared partition

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH -t 5:00:00
#SBATCH --ntasks-per-node 2

#echo commands to stdout
set -x

# move to working directory
# this job assumes:
# - all input data is stored in this directory 
# - all output should be stored in this directory

cd /pylon5/groupname/username/path-to-directory

#run OpenMP program
export OMP_NUM_THREADS=2
./myopenmp

Notes: For groupname, username, and path-to-directory substitute your Unix group, username, and directory path.

RM-small

 

Jobs in the RM-small partition run on Bridges' RSM (128GB) nodes, but are limited to at most 2 full nodes and 8 hours of walltime.  Jobs can share nodes.  Note that the memory space of the nodes is not integrated: the cores within a node access a shared memory space, but cores in different nodes do not. When submitting a job to the RM-small partition, you should specify:

  • the number of nodes
  • the number of cores
  • the walltime limit

Sample interact command for the RM-small partition

Run in the RM-small partition using one node,  8 cores and 45 minutes of walltime. 

interact -p RM-small -N 1 --ntasks-per-node=8 -t 45:00

where:

-p indicates the intended partition

-N requests one node

--ntasks-per-node requests the use of 8 cores

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for the RM-small partition

Submit a job to RM-small asking for 2 nodes and 6 hours of walltime.

sbatch -p RM-small -N 2 -t 6:00:00 myscript.job

where:

-p indicates the intended partition

-N requests the use of 2 nodes

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Summary of partitions for Bridges regular memory nodes

                     RM             RM-shared      RM-small
RAM per node         128GB          128GB          128GB
Cores per node       28             28             28
On-node storage      8TB            8TB            8TB
Nodes shared?        No             Yes            Yes
Node default         1              1              1
Node max             168*           1              2
Core default         28/node        1              1
Core max             28/node        28             28/node
Walltime default     30 mins        30 mins        30 mins
Walltime max         48 hrs         48 hrs         8 hrs
Memory               128GB/node     4.5GB/core     4.5GB/core

* If you need more than 168 nodes, contact bridges@psc.edu to make special arrangements.

 


Partitions for "Bridges GPU" allocations

There are three partitions available for "Bridges GPU" allocations: GPU, GPU-shared and GPU-small.

Use your allocation wisely:  To make the most of your allocation, use the shared partitions whenever possible.  Jobs in the GPU partition use all of the cores on a node, and accrue SU costs for every core. Jobs in the GPU-shared partition share nodes, and only incur SU costs for the cores they are allocated.

Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job.  See "Managing multiple grants".

For information on requesting resources and submitting  jobs see the interact or sbatch commands.

 

GPU partition

Jobs in the GPU partition use Bridges' GPU nodes.  Note that Bridges has 2 types of GPU nodes: K80s and P100s.  See the System Configuration section of this User Guide for the details of each type.

Jobs in the GPU partition do not share nodes, so jobs are allocated all the cores and all of the GPUs on the nodes assigned to them. Your job will incur SU costs for all of the cores on your assigned nodes.

The memory space across nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.

When submitting a job to the GPU partition, you must use the --gres option to specify

  • the type of node you want, K80 or P100. K80 is the default if no type is specified.  
  • the number of GPUs you want

See the sbatch command options for more details on the --gres option.

You should also specify:

  • the number of nodes
  • the walltime limit 

Sample interact command for GPU

An interact command to start a GPU job on 4 P100 nodes for 30 minutes is

 interact -p GPU --gres=gpu:p100:2 -N 4 -t 30:00

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs on each node

-N requests 4 nodes

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for GPU

This command requests the use of one K80 GPU node for 45 minutes:

sbatch -p GPU --gres=gpu:k80:4 -N 1 -t 45:00 myscript.job

where:

-p indicates the intended partition

--gres=gpu:k80:4  requests the use of 4 K80 GPUs

-N requests one node

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Sample batch script for GPU partition

#!/bin/bash
#SBATCH -N 2
#SBATCH -p GPU
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
#SBATCH --gres=gpu:p100:2

#echo commands to stdout
set -x

#move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory

cd /pylon5/groupname/username/path-to-directory

#run GPU program
./mygpu

Notes: The value of the --gres=gpu option indicates the type and number of GPUs you want. For groupname, username and path-to-directory you must substitute your Unix group, username and appropriate directory path.

 

GPU-shared partition

Jobs in the GPU-shared partition run on Bridges' GPU nodes.  Note that Bridges has 2 types of GPU nodes: K80s and P100s.  See the System Configuration section of this User Guide for the details of each type.

Jobs in the GPU-shared partition share nodes, but not cores. By sharing nodes your job will use fewer Service Units.  It could also start running sooner.

You will always run on (part of) one node in the GPU-shared partition. Your job is allocated memory in proportion to the number of GPUs it requests: it receives the same fraction of the node's total memory as the fraction of the node's GPUs it requests. If your job exceeds this amount of memory it will be killed.

When submitting a job to the GPU-shared partition, you must specify the number of GPUs.  You should also specify:

  • the type of GPU node you want, K80 or P100, with the --gres option to the interact or sbatch commands.  K80 is the default if no type is specified.  See the sbatch command options for more details.
  • the walltime limit

Sample interact command for GPU-shared

Run in the GPU-shared partition and ask for 4 K80 GPUs and 8 hours of wall time.

interact -p GPU-shared --gres=gpu:k80:4 -t 8:00:00

where:

-p indicates the intended partition

--gres=gpu:k80:4  requests the use of 4 K80 GPUs

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for GPU-shared

Submit a job to the GPU-shared partition requesting 2 P100 GPUs and 1 hour of wall time.

sbatch -p GPU-shared --gres=gpu:p100:2 -t 1:00:00 myscript.job

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Sample batch script for GPU-shared partition

#!/bin/bash
#SBATCH -N 1
#SBATCH -p GPU-shared
#SBATCH --ntasks-per-node 7
#SBATCH --gres=gpu:p100:1
#SBATCH -t 5:00:00

#echo commands to stdout
set -x

#move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory

#run GPU program
./mygpu

Notes: The --gres=gpu option indicates the number and type of GPUs you want. For groupname, username and path-to-directory you must substitute your Unix group, username, and appropriate directory path.

 

GPU-small

Jobs in the GPU-small partition run on one of Bridges' P100 GPU nodes. Your job is allocated memory in proportion to the number of GPUs it requests: it receives the same fraction of the node's total memory as the fraction of the node's GPUs it requests. If your job exceeds this amount of memory it will be killed.

When submitting a job to the GPU-small partition, you must specify the number of GPUs with the --gres=gpu:p100:n  option to the interact or sbatch command.  In this partition, n can be 1 or 2.  You should also specify the walltime limit.

 

Sample interact command for GPU-small

Run in the GPU-small partition and ask for 2 P100 GPUs and 2  hours of wall time.

interact -p GPU-small --gres=gpu:p100:2 -t 2:00:00

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for GPU-small

Submit a job to the GPU-small partition using 2 P100 GPUs and 1 hour of wall time.

sbatch -p GPU-small --gres=gpu:p100:2 -t 1:00:00 myscript.job

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Summary of partitions for Bridges GPU nodes

                     GPU            GPU            GPU-shared     GPU-shared     GPU-small
                     P100 nodes     K80 nodes      P100 nodes     K80 nodes      P100 nodes
GPUs per node        2              4              2              4              2
CPUs per node        2 16-core      2 14-core      2 16-core      2 14-core      2 16-core
On-node storage      8TB            8TB            8TB            8TB            8TB
Nodes shared?        No             No             Yes            Yes            No
Node default         1              1              1              1              1
Node max             4*             2*             1              1              1
Core default         32/node        28/node        16/GPU         7/GPU          32/node
Core max             32/node        28/node        16/GPU         7/GPU          32/node
GPU default          2/node         4/node         no default     no default     no default
GPU max              2/node         4/node         2              4              2
Walltime default     30 mins        30 mins        30 mins        30 mins        30 mins
Walltime max         48 hrs         48 hrs         48 hrs         48 hrs         8 hrs
Memory               128GB/node     128GB/node     7GB/GPU        7GB/GPU        128GB/node

* Jobs in the GPU partition are limited to 8 GPUs per job. Because there are 2 GPUs on each P100 node and 4 GPUs on each K80 node, you can request at most 4 P100 nodes or 2 K80 nodes.

 


Partitions for "Bridges large memory" allocations

There is one partition available for "Bridges large memory" allocations: LM.

Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job.  See "Managing multiple grants".

For information on requesting resources and submitting  jobs see the interact or sbatch commands.

 

LM partition

Jobs in the LM partition share nodes. Your memory space for an LM job is an integrated, shared memory space.

When submitting a job to the LM partition, you must 

  • use the --mem option to request the amount of memory you need, in GB. Any value up to 12000GB can be requested; there is no default memory value.  Each core on the 3TB and 12TB nodes is associated with a fixed amount of memory, so the amount of memory you request determines the number of cores assigned to your job. 
  • specify the walltime limit

You cannot:

  • specifically request a number of cores 

SLURM will place jobs on either a 3TB or a 12TB node based on the memory request.  Jobs asking for 3000GB or less will run on a 3TB node.  If no 3TB nodes are available but a 12TB node is available, those jobs will run on a 12TB node.

Once your job is running, the environment variable SLURM_NTASKS tells you the number of cores assigned to your job.
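For example, a batch script for an OpenMP program could use that variable so the program runs one thread per allocated core (myopenmp is a placeholder for your own executable):

export OMP_NUM_THREADS=$SLURM_NTASKS
./myopenmp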

Sample interact command for LM

Run in the LM partition and request 2TB of memory. Use the wall time default of 30 minutes.

interact -p LM --mem=2000GB

where:

-p indicates the intended partition (LM)

--mem is the amount of memory requested

Sample sbatch command for the LM partition

A sample sbatch command for the LM partition requesting 10 hours of wall time and 6TB of memory is: 

sbatch -p LM -t 10:00:00 --mem 6000GB myscript.job

where:

-p indicates the intended partition (LM)

-t is the walltime requested in the format HH:MM:SS

--mem is the amount of memory requested

myscript.job is the name of your batch script
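Sample batch script for the LM partition

A minimal sketch, following the pattern of the other sample scripts in this guide; substitute your own Unix group, username, directory path and program name.

#!/bin/bash
#SBATCH -p LM
#SBATCH -t 10:00:00
#SBATCH --mem 6000GB

#echo commands to stdout
set -x

# move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory

# use one thread per core allocated with your memory request
export OMP_NUM_THREADS=$SLURM_NTASKS

#run program
./myprogram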

 

Summary of partition for Bridges large memory nodes

Partition name       LM
                     LSM nodes                        ESM nodes
Node type            3TB RAM, 16TB on-node storage    12TB RAM, 64TB on-node storage
Nodes shared?        Yes                              Yes
Node default         1                                1
Node max             8                                4
Cores                1 core allocated per 48GB of memory requested, on both node types
Walltime default     30 mins                          30 mins
Walltime max         14 days                          14 days
Memory               Up to 3000GB                     Up to 12,000GB

 


 

Partition for "Bridges-AI" allocations

There is one partition available for "Bridges-AI" allocations: GPU-AI.

Use the appropriate account id for your jobs: If you have more than one Bridges grant, be sure to use the correct SLURM account id for each job.  See "Managing multiple grants".

For information on requesting resources and submitting  jobs see the interact or sbatch commands.

 

GPU-AI partition

When submitting a job to the GPU-AI partition, you must use the --gres option to specify the type and number of Volta GPUs you will use.  The valid options for type are

  • volta16, for the Voltas in the Apollo servers.  Those have 16GB of GPU memory.  You can ask for 1 to 8 GPUs. 
  • volta32, for the DGX-2.  The GPUs in the DGX-2 have 32GB of GPU memory.  You can ask for 1 to 16 GPUs.

See the sbatch command for an explanation of the --gres option.

Sample interact command for GPU-AI

To run in an interactive session on Bridges-AI, use the interact command and specify the GPU-AI partition. An example interact command to request 1 GPU on an Apollo 6500 node is:

interact -p GPU-AI --gres=gpu:volta16:1

Where:

-p indicates the intended partition

--gres=gpu:volta16:1 requests the use of 1 V100 GPU on an Apollo node

Once your interactive session has started, you can run the Singularity image.

 Sample sbatch command for the GPU-AI partition

A sample sbatch command to submit a job to run on one of the Apollo servers and use all eight GPUs would be

sbatch -p GPU-AI -N 1 --gres=gpu:volta16:8 -t 1:00:00 myscript.job

where 

-p GPU-AI requests the GPU-AI partition

-N 1 requests one node

--gres=gpu:volta16:8 requests an Apollo server with V100 (Volta) GPUs, and specifies that you will use all 8 GPUs on that node

-t 1:00:00 requests one hour of running time

myscript.job is your batch script.
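A corresponding batch script is a minimal variation on the sample scripts shown earlier in this guide; in this sketch the directory path and program name are placeholders for your own.

#!/bin/bash
#SBATCH -p GPU-AI
#SBATCH -N 1
#SBATCH --gres=gpu:volta16:8
#SBATCH -t 1:00:00

#echo commands to stdout
set -x

# move to working directory
cd /pylon5/groupname/username/path-to-directory

#run program (for example, a container or script that uses the GPUs)
./myprogram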

 

 

Summary of the partition for Bridges' Volta GPUs

Partition name            GPU-AI
                          Apollo 6500 nodes                           DGX-2
Node type                 Tesla V100 (Volta) GPUs, 16GB GPU memory    Tesla V100 GPUs, 32GB GPU memory
Node default              1                                           1
Node max                  1                                           1
Min GPUs per job          1                                           1
Max GPUs per job          8                                           16
Max GPUs in use per user  8                                           16
Walltime default          1 hour                                      1 hour
Walltime max              12 hours                                    8 hours


 

Node, partition, and job status information

sinfo

The sinfo command displays information about the state of Bridges' nodes. The nodes can have several states:

alloc Allocated to a job
down Down
drain Not available for scheduling
idle Free
resv Reserved
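For example, to display only the nodes in a particular partition (RM-shared is used here purely as an illustration):

sinfo -p RM-shared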


squeue

The squeue command displays information about the jobs in the partitions. Some useful options are:

-j jobid Displays the information for the specified jobid
-u username Restricts information to jobs belonging to the specified username
-p partition Restricts information to the specified partition
-l (long) Displays information including:  time requested, time used, number of requested nodes, the nodes on which a job is running, job state and the reason why a job is waiting to run.
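For example, combining these options, the following command shows a long-format listing of your own jobs in the RM partition (substitute your username):

squeue -l -u username -p RM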

See also

      • squeue man page for a discussion of the codes for job state, for why a job is waiting to run, and more options.

 

scancel

The scancel command is used to kill a job in a partition, whether it is running or still waiting to run.  Specify the jobid for the job you want to kill.  For example,

scancel 12345

kills job # 12345.


sacct

The sacct command can be used to display detailed information about jobs. It is especially useful in investigating why one of your jobs failed. The general format of the command is

    sacct -X -j nnnnnn -S MMDDYY --format parameter1,parameter2, ...

      • For 'nnnnnn' substitute the jobid of the job you are investigating.
      • The date given for the -S option is the date at which sacct begins searching for information about your job. 
      • The commas between the parameters in the --format option cannot be followed by spaces.

The --format option determines what information to display about a job. Useful parameters are

      • JobID
      • Partition
      • Account - the account id
      • ExitCode - useful in determining why a job failed
      • State - useful in determining why a job failed
      • Start, End, Elapsed - start, end and elapsed time of the job
      • NodeList - list of nodes used in the job
      • NNodes - how many nodes the job was allocated
      • MaxRSS - how much memory the job used
      • AllocCPUs - how many cores the job was allocated
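Combining these, a command to investigate a failed job (keeping the jobid and date placeholders used above) could be:

    sacct -X -j nnnnnn -S MMDDYY --format=JobID,Partition,State,ExitCode,Start,End,Elapsed,NNodes,MaxRSS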


Monitoring memory usage

It can be useful to find the memory usage of your jobs. For example, you may want to find out if memory usage was a reason a job failed.

You can determine a job's memory usage whether it is still running or has finished. To determine if your job is still running, use the squeue command.

squeue -j nnnnnn -O state

where nnnnnn is the jobid.

For running jobs: srun and top or sstat

You can use the srun and top commands to determine the amount of memory being used.

srun --jobid=nnnnnn top -b -n 1 | grep userid

For nnnnnn substitute the jobid of your job. For 'userid' substitute your userid. The RES field in the output from top shows the actual amount of memory used by a process. The top man page can be used to identify the fields in the output of the top command.

      • See the man pages for srun and top for more information.

You can also use the sstat command to determine the amount of memory being used in a running job 

sstat -j nnnnnn.batch --format=JobID,MaxRss

where nnnnnn is your jobid.

For jobs that are finished: sacct or job_info

If you are checking within a day or two after your job has finished you can issue the command

sacct -j nnnnnn --format=JobID,MaxRss

If this command no longer shows a value for MaxRss, use the job_info command

job_info nnnnnn | grep max_rss

Substitute your jobid for nnnnnn in both of these commands.

  


 

 
