Running Jobs

All production computing must be done on Bridges' compute nodes, NOT on Bridges' login nodes. The SLURM scheduler (Simple Linux Utility for Resource Management) manages and allocates all of Bridges' compute nodes. Several partitions, or job queues, have been set up in SLURM to allocate resources efficiently.

To run a job on Bridges, you need to decide how you want to run: interactively, in batch, or through OnDemand;  and where to run - that is, which partitions you are allowed to use.

What are the different ways to run a job?

You can run jobs on Bridges in several ways:

  • interactive mode - where you type commands and receive output back to your screen as the commands complete
  • batch mode - where you first create a batch (or job) script which contains the commands to be run, then submit the job to be run as soon as resources are available
  • through OnDemand - a browser interface that allows you to run jobs interactively, or to create, edit and submit batch jobs. It also provides a graphical interface to tools like RStudio, Jupyter notebooks, and IJulia.  More information about OnDemand is in the OnDemand section of the Bridges User Guide.

Regardless of which way you choose to run your jobs, you will always need to choose a partition to run them in.

Which partitions can I use?

Different partitions control different types of Bridges' resources; they are configured by the type of node they control along with other job requirements like how many nodes or how much time or memory is needed.  Your access to the partitions is based on the type of Bridges allocation that you have ("Bridges regular memory", "Bridges GPU", or "Bridges large memory"). You may have more than one type of allocation; in that case, you will have access to more than one set of partitions.

In this document

Ways to run a job

Managing charging with multiple grants

Partitions

Node, partition, and job status information

  • sinfo: display information about Bridges' nodes
  • squeue: display information about jobs in the SLURM partitions
  • scancel: kill a job
  • sacct: display detailed information about a job. This information can help to determine why a job failed.
  • srun, sstat, sacct and job_info: monitor the memory usage of a job

 

Interactive sessions

You can do your production work interactively on Bridges, typing commands on the command line and getting responses back in real time.  But you must be allocated the use of one or more of Bridges' compute nodes by SLURM to work interactively on Bridges.  You cannot use the Bridges login nodes for your work.

You can run an interactive session in any of the SLURM partitions.  You will need to specify which partition you want, so that the proper resources are allocated for your use.

If all of the resources set aside for interactive use are in use, your request will wait until the resources you need are available. Using a shared partition (RM-shared, GPU-shared) will probably allow your job to start sooner.

The interact command

To start an interactive session, use the command interact.  The format is

interact -options

The simplest interact command is

 interact

This command will start an interactive job using the defaults for interact, which are:

Partition: RM-small
Cores: 1
Time limit: 60 minutes

  

Once the interact command returns with a command prompt you can enter your commands. The shell will be your default shell. When you are finished with your job, type CTRL-D.

[bridgesuser@br006 ~]$ interact

A command prompt will appear when your session begins
"Ctrl+d" or "exit" will end your session

[bridgesuser@r004 ~]$ 

Notes:

  • Be sure to charge your job to the correct group if you have more than one grant. See "Managing charging with multiple grants".
  • You are charged for your resource usage from the time the prompt appears until you type CTRL-D, so be sure to type CTRL-D as soon as you are done.
  • The maximum time you can request is 8 hours. Inactive interact jobs are logged out after 30 minutes of idle time.
  • By default, interact uses the RM-small partition.  Use the -p option for interact to use a different partition.

Options for interact 

If you want to run in a different partition, use more than one core or set a different time limit, you will need to use options to the interact command.   Available options are given below.

Options to the interact command
-p partition
    Partition requested.
    Default: RM-small

-t HH:MM:SS
    Walltime requested. The maximum time you can request is 8 hours.
    Default: 60:00 (1 hour)

-N n
    Number of nodes requested.
    Default: 1

--egress
    Note the "--" for this option.
    Allows your compute nodes to communicate with sites external to Bridges.
    Default: N/A

-A groupname
    Group to charge the job to. Find or change your default group.
    Note: Files created during a job will be owned by the group in effect when the job is
    submitted. This may be different than the group the job is charged to. See the discussion
    of the newgrp command in the Account Administration section of this User Guide to see how
    to change the group currently in effect.
    Default: Your default group

-R reservation-name
    Reservation name, if you have one. Use of -R does not automatically set any other interact
    options. You still need to specify the other options (partition, walltime, number of nodes)
    to override the defaults for the interact command. If your reservation is not assigned to
    your default account, then you will need to use the -A option when you issue your interact
    command.
    Default: No default

--mem=nGB
    Note the "--" for this option.
    Amount of memory requested in GB. This option should only be used for the LM partition.
    Default: No default

--gres=gpu:type:n
    Note the "--" for this option.
    'type' is either p100 or k80. The default is k80. 'n' is the number of GPUs. Valid choices
    are 1-4 when type=k80, and 1-2 when type=p100.
    Default: No default

-gpu
    Runs your job on 1 P100 GPU in the GPU-small partition.
    Default: N/A

--ntasks-per-node=n
    Note the "--" for this option.
    Number of cores to allocate per node.
    Default: 1

-h
    Help; lists all the available command options.
    Default: N/A
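
For example, an interact command that combines several of these options might look like the following (the group name "mygroup" is a placeholder; substitute one of your own groups):

interact -p RM-shared --ntasks-per-node=4 -t 2:00:00 -A mygroup

This requests 4 cores on a shared RM node for 2 hours, charged to the group mygroup.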

 


 

Batch jobs

To run a batch job, you must first create a batch (or job) script, and then submit the script  using the sbatch command.  

A batch script is a file that consists of SBATCH directives, executable commands and comments.

SBATCH directives specify your resource requests and other job options in your batch script.  You can also specify resource requests and options  on the sbatch command line.  Any options on the command line take precedence over those given in the batch script. The SBATCH directives must start with '#SBATCH' as the first text on a line, with no leading spaces.
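
For example, if your batch script contains the directive '#SBATCH -t 00:30:00', submitting it with

sbatch -t 2:00:00 myscript.job

requests a 2-hour walltime: the value on the command line overrides the 30-minute value in the script. (The script name myscript.job is a placeholder.)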

Comments begin with a '#' character.

The first line of any batch script must indicate the shell to use for your batch job.  

 

Sample batch scripts

Some sample scripts are given here.  Note that:

Each script uses the bash shell, indicated by the first line '#!/bin/bash'.  If you use a different shell, some Unix commands will be different.

For username and groupname you must substitute your username and your appropriate group.

 

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00

# echo commands to stdout
set -x

# move to your appropriate pylon5 directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory

# run OpenMP program
export OMP_NUM_THREADS=28
./myopenmp

Notes:

        The --ntasks-per-node option indicates that you will use all 28 cores.

For groupname, username, and path-to-directory you must substitute your group, username, and appropriate directory path.

 

#!/bin/bash
#SBATCH -p RM
#SBATCH -t 5:00:00
#SBATCH -N 2
#SBATCH --ntasks-per-node 28
# echo commands to stdout
set -x

# move to your appropriate pylon5 directory
cd /pylon5/groupname/username/path-to-directory

# set variable so that task placement works as expected
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0

# copy input files to LOCAL file storage
srun -N $SLURM_JOB_NUM_NODES --ntasks-per-node=1 \
  sh -c 'cp path-to-directory/input.${SLURM_PROCID} $LOCAL'

# run MPI program
mpirun -np $SLURM_NTASKS ./mympi

# copy output files to pylon5
srun -N $SLURM_JOB_NUM_NODES --ntasks-per-node=1 \
  sh -c 'cp $LOCAL/output.* /pylon5/groupname/username/path-to-directory'

Notes:

The variable $SLURM_NTASKS gives the total number of cores requested in a job. In this example $SLURM_NTASKS will be 56 because  the -N option requested 2 nodes and the --ntasks-per-node option requested all 28 cores on each node.

The export command sets I_MPI_JOB_RESPECT_PROCESS_PLACEMENT so that your task placement settings are effective. Otherwise, the SLURM defaults are in effect.

The srun commands are used to copy files between pylon5 and the $LOCAL file systems on each of your nodes.

The first srun command assumes you have two files named input.0 and input.1 in your pylon5 file space. It will copy input.0 and input.1 to, respectively, the $LOCAL file systems on the first and second nodes allocated to your job.

The second srun command will copy files named output.* back from your $LOCAL file systems to your pylon5 file space before your job ends. In this command '*' functions as the usual Unix wildcard.

For groupname, username, and path-to-directory you must substitute your group, username, and appropriate directory path.

 

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=14
#SBATCH --time=00:10:00
#SBATCH --job-name=hybrid
cd $SLURM_SUBMIT_DIR

mpiifort -xHOST -O3 -qopenmp -mt_mpi hello_hybrid.f90 -o hello_hybrid.exe
# set variable so task placement works as expected
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
mpirun -print-rank-map -n $SLURM_NTASKS -genv \
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK -genv I_MPI_PIN_DOMAIN=omp \
./hello_hybrid.exe

Notes:

This example asks for 2 nodes, 4 MPI tasks and 14 OpenMP threads per MPI task.

The export command sets I_MPI_JOB_RESPECT_PROCESS_PLACEMENT so that your task placement settings are effective. Otherwise, the SLURM defaults are in effect.

 

#!/bin/bash
#SBATCH -t 05:00:00
#SBATCH -p RM-shared
#SBATCH -N 1
#SBATCH --ntasks-per-node 5
#SBATCH --array=1-5

set -x

./myexecutable $SLURM_ARRAY_TASK_ID

Notes:

The above job will generate five jobs that will each run on a separate core on the same node. The value of the variable SLURM_ARRAY_TASK_ID is the array task index, which in this example will range from 1 to 5. Good candidates for job arrays are jobs that can use this index alone to determine the different processing path for each task. For more information about job arrays see the sbatch man page and the online SLURM documentation.
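
For example, each array task could use its index to select a different input file to process (the input file names here are only placeholders):

./myexecutable input.${SLURM_ARRAY_TASK_ID}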

 

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH -t 05:00:00
#SBATCH --ntasks-per-node 14
 
echo SLURM NTASKS: $SLURM_NTASKS
i=0
while [ $i -lt $SLURM_NTASKS ]
do
numactl -C +$i ./run.sh &
let i=i+1
done
wait # IMPORTANT: wait for all to finish or get killed

Notes:

Bundling or packing multiple jobs in a single job can improve your turnaround and improve the performance of the SLURM scheduler.

 

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH -t 05:00:00
#SBATCH --ntasks-per-node 14
#SBATCH --cpus-per-task 2

echo SLURM NTASKS: $SLURM_NTASKS
i=0
while [ $i -lt $SLURM_NTASKS ]
do
numactl -C +$i ./run.sh &
let i=i+1
done
wait # IMPORTANT: wait for all to finish or get killed

Notes:

Bundling or packing multiple jobs in a single job can improve your turnaround and improve the performance of the SLURM scheduler.

 

 

The sbatch command

To submit a batch job, use the sbatch command.  The format is

sbatch -options batch-script

The options to sbatch can either be in your batch script or on the sbatch command line.  Options in the command line override those in the batch script.

Note:

  • Be sure to charge your job to the correct group if you have more than one grant. See the -A option for sbatch to change the charging group for a job. Information on how to determine your valid groups and change your default group is in the Account administration section of the Bridges User Guide.
  • In some cases, the options for sbatch differ from the options for interact or srun.
  • By default, sbatch submits jobs to the RM partition.  Use the -p option for sbatch to direct your job to a different partition.

 

Options to the sbatch command

For more information about these options and other useful sbatch options see the sbatch man page

-p partition
    Partition requested.
    Default: RM

-t HH:MM:SS
    Walltime requested in HH:MM:SS.
    Default: 30 minutes

-N n
    Number of nodes requested.
    Default: 1

-A groupname
    Group to charge the job to. If not specified, your default group is charged. Find your
    default group.
    Note: Files created during a job will be owned by the group in effect when the job is
    submitted. This may be different than the group the job is charged to. See the discussion
    of the newgrp command in the Account Administration section of this User Guide to see how
    to change the group currently in effect.
    Default: Your default group

--res reservation-name
    Note the "--" for this option.
    Use the reservation that has been set up for you. Use of --res does not automatically set
    any other options. You still need to specify the other options (partition, walltime, number
    of nodes) that you would in any sbatch command. If your reservation is not assigned to your
    default account, then you will need to use the -A option to sbatch to specify the account.
    Default: N/A

--mem=nGB
    Note the "--" for this option.
    Memory in GB. This option is only valid for the LM partition.
    Default: None

-C constraints
    Specifies constraints which the nodes allocated to this job must satisfy.

    An sbatch command can have only one -C option. Multiple constraints can be specified with
    "&". For example, -C LM&PH2 constrains the nodes to 3TB nodes with 20 cores and 38.5GB/core.
    If multiple -C options are given (e.g., sbatch ... -C LM -C EGRESS), only the last applies;
    the -C LM option will be ignored in this example.

    Some valid constraints are:
      EGRESS  Allows your compute nodes to communicate with sites external to Bridges
      LM      Ensures that a job in the LM partition uses only the 3TB nodes. This option is
              required for any jobs in the LM partition which use /pylon5.
      PH1     Ensures that the job will run on LM nodes which have 16 cores and 48GB/core
      PH2     Ensures that the job will run on LM nodes which have 20 cores and 38.5GB/core
      PERF    Turns on performance profiling. For use with performance profiling software like
              VTune and TAU.

    See the discussion of the -C option in the sbatch man page for more information.
    Default: None

--gres=gpu:type:n
    Note the "--" for this option.
    Specifies the type and number of GPUs requested. 'type' is either p100 or k80. The default
    is k80. 'n' is the number of requested GPUs. Valid choices are 1-4 when type is k80, and
    1-2 when type is p100.
    Default: None

--ntasks-per-node=n
    Note the "--" for this option.
    Request n cores be allocated per node.
    Default: 1

--mail-type=type
    Note the "--" for this option.
    Send email when job events occur, where type can be BEGIN, END, FAIL or ALL.
    Default: None

--mail-user=user
    Note the "--" for this option.
    User to send email to, as specified by --mail-type. The default is the user who submits
    the job.
    Default: None

-d=dependency-list
    Set up dependencies between jobs, where dependency-list can be:
      after:job_id[:jobid...]
          This job can begin execution after the specified jobs have begun execution.
      afterany:job_id[:jobid...]
          This job can begin execution after the specified jobs have terminated.
      aftercorr:job_id[:jobid...]
          A task of this job array can begin execution after the corresponding task ID in the
          specified job has completed successfully (ran to completion with an exit code of zero).
      afternotok:job_id[:jobid...]
          This job can begin execution after the specified jobs have terminated in some failed
          state (non-zero exit code, node failure, timed out, etc.).
      afterok:job_id[:jobid...]
          This job can begin execution after the specified jobs have successfully executed (ran
          to completion with an exit code of zero).
      singleton
          This job can begin execution after any previously launched jobs sharing the same job
          name and user have terminated.
    Default: None

--no-requeue
    Note the "--" for this option.
    Specifies that your job will not be requeued under any circumstances. If your job is running
    on a node that fails, it will not be restarted.
    Default: N/A

--time-min=HH:MM:SS
    Note the "--" for this option.
    Specifies a minimum walltime for your job in HH:MM:SS format.

    SLURM considers the walltime requested when deciding which job to start next. Free slots on
    the machine are defined by the number of nodes and how long those nodes are free until they
    will be needed by another job. By specifying a minimum walltime you allow the scheduler to
    reduce your walltime request to your specified minimum time when deciding whether to
    schedule your job. This could allow your job to start sooner.

    If you use this option your actual walltime assignment can vary between your minimum time
    and the time you specified with the -t option. If your job hits its actual walltime limit,
    it will be killed. When you use this option you should checkpoint your job frequently to
    save the results obtained to that point.
    Default: None

--switches=1 or --switches=1@HH:MM:SS
    Note the "--" for this option.
    Requests that the nodes your job runs on all be on one switch, which is a hardware grouping
    of 42 nodes.

    If you are asking for more than 1 and fewer than 42 nodes, your job will run more efficiently
    if it runs on one switch. Normally switches are shared across jobs, so using the switches
    option means your job may wait longer in the queue before it starts.

    The optional time parameter gives a maximum time that your job will wait for a switch to be
    available. If it has waited this maximum time, the request for your job to be run on a
    switch will be cancelled.
    Default: N/A

-h
    Help; lists all the available command options.
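
As an illustration of combining these options, the following submission would hold a job until job 1234567 finishes successfully and send email when the new job ends (the job ID and the script name postprocess.job are placeholders):

sbatch -p RM -N 1 -t 1:00:00 -d afterok:1234567 --mail-type=END postprocess.job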

 


 

Managing charging with multiple grants

If you have more than one grant, be sure to use the correct one when running jobs. 

See "Managing multiple grants" in the Account adminstration section of the Bridges User Guide to see how to find your groups and determine or change which is your default.

See the -A option to the sbatch or interact commands to set the grant that is charged for a job.

Note that any files created by a job are owned by the group in effect when the job is submitted, which is not necessarily the group whose grant is charged for the job.  See the newgrp command in the Account Administration section of the Bridges User Guide to see how to change the group currently in effect.
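
For example, one possible sequence for charging a job to a group other than your default, and for having the files it creates owned by that group as well, is (the group name "othergroup" and the script name are placeholders):

newgrp othergroup
sbatch -A othergroup myscript.job

The -A option controls which grant is charged for the job; the newgrp command changes the group that will own the files the job creates.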

 

Bridges partitions

Each SLURM partition manages a subset of Bridges' resources.  Each partition allocates resources to interactive sessions, batch jobs, and OnDemand sessions that request resources from it.  

Know which partitions are open to you: Your Bridges allocations determine which partitions you can submit jobs to.

  • A "Bridges regular memory" allocation allows you to use Bridges' RSM (128GB) nodes. Partitions available to "Bridges regular memory" allocations are
    • RM, for jobs that will run on Bridges' RSM (128GB) nodes, and use one or more full nodes
    • RM-shared, for jobs that will run on Bridges' RSM (128GB) nodes, but share a node with other jobs
    • RM-small, for short jobs needing 2 full nodes or less, that will run on Bridges' RSM (128GB) nodes
  • A "Bridges GPU" allocation allows you to use Bridges' GPU nodes. Partitions available to "Bridges GPU" allocations are:
    • GPU, for jobs that will run on Bridges' GPU nodes, and use one or more full nodes
    • GPU-shared, for jobs that will run on Bridges' GPU nodes, but share a node with other jobs
    • GPU-small, for jobs that will use only one of Bridges' GPU nodes and 8 hours or less of wall time.
  • A "Bridges large memory" allocation allows you to use  Bridges LSM and ESM (3TB and 12TB) nodes. There is one partition available to "Bridges large memory" allocations:
    • LM, for jobs that will run on Bridges' LSM and ESM (3TB and 12TB) nodes

All the partitions use FIFO scheduling. If the top job in the partition will not fit, SLURM will try to schedule the next job in the partition. The scheduler follows policies to ensure that one user does not dominate the machine. There are also limits to the number of nodes and cores a user can simultaneously use. Scheduling policies are always under review to ensure best turnaround for users.

Partitions for "Bridges regular memory" allocations

There are three partitions available for "Bridges regular memory" allocations: RM, RM-shared and RM-small.

Use your allocation wisely:  To make the most of your allocation, use the shared partitions whenever possible.  Jobs in the RM partition are charged for the use of all cores on a node.  Jobs in the RM-shared partition share nodes, and are only charged for the cores they are allocated. The RM partition is the default for the sbatch command, while RM-small is the default for the interact command. See the discussion of the interact and sbatch commands in this document for more information.

Charge your jobs to the correct group: If you have more than one Bridges grant, be sure to charge your usage to the correct one.  See "Managing charging for multiple grants".

For information on requesting resources and submitting  jobs see the discussion of the interact or sbatch commands.

 

RM partition

Jobs in the RM partition run on Bridges' RSM (128GB) nodes.  Jobs do not share nodes, and are allocated all 28 of the cores on each of the nodes assigned to them.  A job in the RM partition is charged for all 28 cores per node on its assigned nodes. 

RM jobs can use more than one node. However, the memory space of  all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.

The internode communication performance for jobs in the RM partition is best when using 42 or fewer nodes. 

When submitting a job to the RM partition, you should specify:

  • the number of  nodes
  • the walltime limit 

Sample interact command for the RM partition

An example of an interact command for the RM partition, requesting the use of 2 nodes for 30 minutes is

interact -p RM -N 2 -t 30:00

where:

-p indicates the intended partition

-N is the number of nodes requested

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for RM partition

An example of a sbatch command to submit a job to the RM partition, requesting one node for 5 hours is

sbatch -p RM -t 5:00:00 -N 1 myscript.job

where:

-p indicates the intended partition

-t is the walltime requested in the format HH:MM:SS

-N is the number of nodes requested

myscript.job is the name of your batch script

 

RM-shared partition

Jobs in the RM-shared partition run on (part of) one of Bridges' RSM (128GB) nodes.  Jobs will share a node with other jobs, but will not share cores.   A job in the RM-shared partition will be charged only for the cores allocated to it, so it will use fewer SUs than an RM job.  It could also start running sooner.

RM-shared jobs are assigned memory in proportion to the number of cores requested: a job receives the fraction of the node's total memory that corresponds to the fraction of the node's cores it requested. If the job exceeds this amount of memory it will be killed.
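
For example, a job that requests 14 of a node's 28 cores is assigned roughly half of the node's memory, about 63GB at 4.5GB per core.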

When submitting a job to the RM-shared partition, you should specify:

  • the number of cores
  • the walltime limit

Sample interact command for the RM-shared partition

Run in the RM-shared partition using 4 cores and 1 hour of walltime. 

interact -p RM-shared --ntasks-per-node=4 -t 1:00:00

where:

-p indicates the intended partition

--ntasks-per-node requests the use of 4 cores

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for the RM-shared partition

Submit a job to RM-shared asking for 2 cores and 5 hours of walltime.

sbatch -p RM-shared --ntasks-per-node 2 -t 5:00:00 myscript.job

where:

-p indicates the intended partition

--ntasks-per-node requests the use of 2 cores

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Sample batch script for RM-shared partition

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH -t 5:00:00
#SBATCH --ntasks-per-node 2

#echo commands to stdout
set -x

# move to working directory
# this job assumes:
# - all input data is stored in this directory 
# - all output should be stored in this directory

cd /pylon5/groupname/username/path-to-directory

#run OpenMP program
export OMP_NUM_THREADS=2
./myopenmp

Notes: For groupname, username, and path-to-directory substitute your group, username, and directory path.

RM-small

Jobs in the RM-small partition run on Bridges' RSM (128GB) nodes, but are limited to at most 2 full nodes and 8 hours.  Jobs can share nodes.  Note that the memory space of all the nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.

When submitting a job to the RM-small partition, you should specify:

  • the number of nodes
  • the number of cores
  • the walltime limit

Sample interact command for the RM-small partition

Run in the RM-small partition using one node,  8 cores and 45 minutes of walltime. 

interact -p RM-small -N 1 --ntasks-per-node=8 -t 45:00

where:

-p indicates the intended partition

-N requests one node

--ntasks-per-node requests the use of 8 cores

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for the RM-small partition

Submit a job to RM-small asking for 2 nodes and 6 hours of walltime.

sbatch -p RM-small -N 2 -t 6:00:00 myscript.job

where:

-p indicates the intended partition

-N requests the use of 2 nodes

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Summary of partitions for Bridges regular memory nodes

Partition name     RM                     RM-shared              RM-small
Node type          128GB, 28 cores,       128GB, 28 cores,       128GB, 28 cores,
                   8TB on-node storage    8TB on-node storage    8TB on-node storage
Nodes shared?      No                     Yes                    Yes
Node default       1                      1                      1
Node max           168*                   1                      2
Core default       28/node                1                      1
Core max           28/node                28                     28/node
Walltime default   30 mins                30 mins                30 mins
Walltime max       48 hrs                 48 hrs                 8 hrs
Memory             128GB/node             4.5GB/core             4.5GB/core

* If you need more than 168 nodes, contact bridges@psc.edu to make special arrangements.

 


Partitions for "Bridges GPU" allocations

There are three partitions available for "Bridges GPU" allocations: GPU, GPU-shared and GPU-small.

Use your allocation wisely:  To make the most of your allocation, use the shared partitions whenever possible.  Jobs in the GPU partition are charged for the use of all cores on a node. Jobs in the GPU-shared partition share nodes, and are only charged for the cores they are allocated.

Charge your jobs to the correct group: If you have more than one Bridges grant, be sure to charge your usage to the correct one.  See "Managing charging for multiple grants".

For information on requesting resources and submitting  jobs see the interact or sbatch commands.

 

GPU partition

Jobs in the GPU partition use Bridges' GPU nodes.  Note that Bridges has 2 types of GPU nodes: K80s and P100s.  See the System Configuration section of this User Guide for the details of each type.

Jobs in the GPU partition do not share nodes, so jobs are allocated all the cores and all of the GPUs associated with the nodes assigned to them. Your job will be charged for all the cores associated with your assigned nodes.

However, the memory space across nodes is not integrated. The cores within a node access a shared memory space, but cores in different nodes do not.

When submitting a job to the GPU partition, you must specify the type and number of GPUs with the --gres option.

You should also specify:

  • the type of node you want, K80 or P100, with the --gres option to the interact or sbatch commands.  K80 is the default if no type is specified.  See the sbatch command options for more details.
  • the number of nodes
  • the walltime limit 

Sample interact command for GPU

An interact command to start a GPU job on 4 P100 nodes for 30 minutes is

 interact -p GPU --gres=gpu:p100:2 -N 4 -t 30:00

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-N requests 4 nodes

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for GPU

This command requests the use of one K80 GPU node for 45 minutes:

sbatch -p GPU --gres=gpu:k80:4 -N 1 -t 45:00 myscript.job

where:

-p indicates the intended partition

--gres=gpu:k80:4  requests the use of 4 K80 GPUs

-N requests one node

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Sample batch script for GPU partition

#!/bin/bash
#SBATCH -N 2
#SBATCH -p GPU
#SBATCH --ntasks-per-node 28
#SBATCH -t 5:00:00
#SBATCH --gres=gpu:p100:2

#echo commands to stdout
set -x

# move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory

# run GPU program
./mygpu

Notes: The value of the --gres=gpu option indicates the type and number of GPUs you want. For groupname, username and path-to-directory you must substitute your group, username and appropriate directory path.

 

GPU-shared partition

Jobs in the GPU-shared partition run on Bridges' GPU nodes.  Note that Bridges has 2 types of GPU nodes: K80s and P100s.  See the System Configuration section of this User Guide for the details of each type.

Jobs in the GPU-shared partition share nodes, but not cores. By sharing nodes your job will be charged less.  It could also start running sooner.

You will always run on (part of) one node in the GPU-shared partition. Your job will be allocated memory in proportion to the number of GPUs requested: it receives the fraction of the node's total memory that corresponds to the fraction of the GPUs it requested. If your job exceeds this amount of memory it will be killed.
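
For example, a job that requests 1 of the 2 GPUs on a P100 node is allocated half of the memory available to GPU-shared jobs on that node.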

When submitting a job to the GPU-shared partition, you must specify the number of GPUs.  You should also specify:

  • the type of GPU node you want, K80 or P100, with the --gres option to the interact or sbatch commands.  K80 is the default if no type is specified.  See the sbatch command options for more details.
  • the walltime limit

Sample interact command for GPU-shared

Run in the GPU-shared partition and ask for 4 K80 GPUs and 8 hours of wall time.

interact -p GPU-shared --gres=gpu:k80:4 -t 8:00:00

where:

-p indicates the intended partition

--gres=gpu:k80:4  requests the use of 4 K80 GPUs

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for GPU-shared

Submit a job to the GPU-shared partition requesting 2 P100 GPUs and 1 hour of wall time.

sbatch -p GPU-shared --gres=gpu:p100:2 -t 1:00:00 myscript.job

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Sample batch script for GPU-shared partition

#!/bin/bash
#SBATCH -N 1
#SBATCH -p GPU-shared
#SBATCH --ntasks-per-node 7
#SBATCH --gres=gpu:p100:1
#SBATCH -t 5:00:00

# echo commands to stdout
set -x

# move to working directory
# this job assumes:
# - all input data is stored in this directory
# - all output should be stored in this directory
cd /pylon5/groupname/username/path-to-directory

# run GPU program
./mygpu

Notes: The --gres=gpu option indicates the number and type of GPUs you want. For groupname, username and path-to-directory you must substitute your group, username, and appropriate directory path.

 

GPU-small

Jobs in the GPU-small partition run on one of Bridges' P100 GPU nodes.  Your jobs will be allocated memory in proportion to the number of requested GPUs. You get the fraction of the node's total memory in proportion to the fraction of GPUs you requested. If your job exceeds this amount of memory it will be killed.

When submitting a job to the GPU-small partition, you must specify the number of GPUs with the --gres=gpu:p100:n  option to the interact or sbatch command.  In this partition, n can be 1 or 2.  You should also specify the walltime limit.

 

Sample interact command for GPU-small

Run in the GPU-small partition and ask for 2 P100 GPUs and 2  hours of wall time.

interact -p GPU-small --gres=gpu:p100:2 -t 2:00:00

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-t is the walltime requested in the format HH:MM:SS

Sample sbatch command for GPU-small

Submit a job to the GPU-small partition using 2 P100 GPUs and 1 hour of wall time.

sbatch -p GPU-small --gres=gpu:p100:2 -t 1:00:00 myscript.job

where:

-p indicates the intended partition

--gres=gpu:p100:2  requests the use of 2 P100 GPUs

-t is the walltime requested in the format HH:MM:SS

myscript.job is the name of your batch script

Summary of partitions for Bridges GPU nodes

Partition name     GPU                    GPU                    GPU-shared             GPU-shared             GPU-small
                   (P100 nodes)           (K80 nodes)            (P100 nodes)           (K80 nodes)            (P100 nodes)
Node type          2 GPUs,                4 GPUs,                2 GPUs,                4 GPUs,                2 GPUs,
                   2 16-core CPUs,        2 14-core CPUs,        2 16-core CPUs,        2 14-core CPUs,        2 16-core CPUs,
                   8TB on-node storage    8TB on-node storage    8TB on-node storage    8TB on-node storage    8TB on-node storage
Nodes shared?      No                     No                     Yes                    Yes                    No
Node default       1                      1                      1                      1                      1
Node max           8*                     4*                     1                      1                      1
Core default       32/node                28/node                16/GPU                 7/GPU                  32/node
Core max           32/node                28/node                16/GPU                 7/GPU                  32/node
GPU default        2/node                 4/node                 No default             No default             No default
GPU max            2/node                 4/node                 2                      4                      2
Walltime default   30 mins                30 mins                30 mins                30 mins                30 mins
Walltime max       48 hrs                 48 hrs                 48 hrs                 48 hrs                 8 hrs
Memory             128GB/node             128GB/node             7GB/GPU                7GB/GPU                128GB/node

* There is a limit of 16 GPUs per job. Because there are 2 GPUs on each P100 node and 4 GPUs on each K80 node,
  you can request at most 8 P100 nodes or 4 K80 nodes.

 


Partitions for "Bridges large memory" allocations

There is one partition available for "Bridges large memory" allocations: LM.

Charge your jobs to the correct group: If you have more than one Bridges grant, be sure to charge your usage to the correct one.  See "Managing charging for multiple grants".

For information on requesting resources and submitting  jobs see the interact or sbatch commands.

 

LM partition

Jobs submitted to the LM partition must request the amount of memory they need. There is no default memory value.  Each core on the 3TB and 12TB nodes is associated with a fixed amount of memory, so the amount of memory you request determines the number of cores assigned to your job. You cannot specifically request a number of cores in the LM partition.
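
For example, at the rate of 1 core per 48GB of memory requested (see the partition summary below), a job that requests 480GB of memory is assigned 10 cores.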

SLURM will place jobs on either a 3TB or a 12TB node based on the memory request.  Jobs asking for 3000GB or less will run on a 3TB node.  If no 3TB nodes are available but a 12TB node is available, those jobs will run on a 12TB node.

Jobs in the LM partition share nodes. Your memory space for an LM job is an integrated, shared memory space.

Once your job is running, the environment variable SLURM_NTASKS tells you the number of cores assigned to your job.

When submitting a job to the LM partition, you must specify

  • the amount of memory in GB  - any value up to 12000GB can be requested
  • the walltime limit  

Sample interact command for LM

Run in the LM partition and request 2TB of memory. Use the wall time default of 30 minutes.

interact -p LM --mem=2000GB

where:

-p indicates the intended partition (LM)

--mem is the amount of memory requested

Sample sbatch command for the LM partition

A sample sbatch command for the LM partition requesting 10 hours of wall time and 6TB of memory is: 

sbatch -p LM -t 10:00:00 --mem=6000GB myscript.job

where:

-p indicates the intended partition (LM)

-t is the walltime requested in the format HH:MM:SS

--mem is the amount of memory requested

myscript.job is the name of your batch script

 

Summary of partitions for Bridges large memory nodes

Partition name     LM
                   LSM nodes                ESM nodes
Node type          3TB RAM,                 12TB RAM,
                   16TB on-node storage     64TB on-node storage
Nodes shared?      Yes                      Yes
Node default       1                        1
Node max           8                        4
Cores              Jobs are allocated 1 core per 48GB of memory requested (both node types).
Walltime default   30 mins                  30 mins
Walltime max       14 days                  14 days
Memory             Up to 3000GB             Up to 12,000GB

 


 

Node, partition, and job status information

sinfo

The sinfo command displays information about the state of Bridges' nodes. The nodes can have several states:

alloc   Allocated to a job
down    Down
drain   Not available for scheduling
idle    Free
resv    Reserved


squeue

The squeue command displays information about the jobs in the partitions. Some useful options are:

-j jobid       Displays the information for the specified jobid
-u username    Restricts information to jobs belonging to the specified username
-p partition   Restricts information to the specified partition
-l             (long) Displays information including: time requested, time used, number of
               requested nodes, the nodes on which a job is running, job state and the reason
               why a job is waiting to run.
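
For example, to see a long listing of your own jobs in the RM partition (substitute your username):

squeue -u username -p RM -l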

See also

  • squeue man page for a discussion of the codes for job state, for why a job is waiting to run, and more options.

 

scancel

The scancel command is used to kill a job in a partition, whether it is running or still waiting to run.  Specify the jobid for the job you want to kill.  For example,

scancel 12345

kills job # 12345.


sacct

The sacct command can be used to display detailed information about jobs. It is especially useful in investigating why one of your jobs failed. The general format of the command is

    sacct -X -j nnnnnn -S MMDDYY --format parameter1,parameter2, ...

  • For 'nnnnnn' substitute the jobid of the job you are investigating.
  • The date given for the -S option is the date at which sacct begins searching for information about your job. 
  • The commas between the parameters in the --format option cannot be followed by spaces.

The --format option determines what information to display about a job. Useful parameters are

  • JobID
  • Partition
  • Account - the account charged
  • ExitCode - useful in determining why a job failed
  • State - useful in determining why a job failed
  • Start, End, Elapsed - start, end and elapsed time of the job
  • NodeList - list of nodes used in the job
  • NNodes - how many nodes the job was allocated
  • MaxRSS - how much memory the job used
  • AllocCPUs - how many cores the job was allocated
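
Putting these together, a command like the following reports where a job ran, how it ended, and how much memory it used (nnnnnn and MMDDYY are placeholders for the jobid and the search start date):

    sacct -X -j nnnnnn -S MMDDYY --format=JobID,Partition,State,ExitCode,Start,End,Elapsed,NNodes,MaxRSS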


Monitoring memory usage

It can be useful to find the memory usage of your jobs. For example, you may want to find out if memory usage was a reason a job failed.

You can determine a job's memory usage whether it is still running or has finished. To determine if your job is still running, use the squeue command.

squeue -j nnnnnn -O state

where nnnnnn is the jobid.

For running jobs: srun and top or sstat

You can use the srun and top commands to determine the amount of memory being used.

srun --jobid=nnnnnn top -b -n 1 | grep userid

For nnnnnn substitute the jobid of your job. For 'userid' substitute your userid. The RES field in the output from top shows the actual amount of memory used by a process. The top man page can be used to identify the fields in the output of the top command.

  • See the man pages for srun and top for more information.

You can also use the sstat command to determine the amount of memory being used in a running job 

sstat -j nnnnnn.batch --format=JobID,MaxRss

where nnnnnn is your jobid.

For jobs that are finished: sacct or job_info

If you are checking within a day or two after your job has finished you can issue the command

sacct -j nnnnnn --format=JobID,MaxRss

If this command no longer shows a value for MaxRss, use the job_info command

job_info nnnnnn | grep max_rss

Substitute your jobid for nnnnnn in both of these commands.

  
