Using Bridges' GPU nodes

The NVIDIA Tesla K80 and P100 GPUs on Bridges provide substantial, complementary computational power for deep learning, simulations and other applications.

A standard NVIDIA accelerator environment is installed on Bridges' GPU nodes. If you have programmed using GPUs before, you should find this familiar. Please contact us for more help.

GPU Nodes

There are two types of GPU nodes on Bridges: 16 nodes with NVIDIA K80 GPUs and 32 nodes with NVIDIA P100 GPUs.  

K80 nodes:  The 16 K80 GPU nodes each contain 2 NVIDIA K80 GPU cards, and each card contains two GPUs that can be individually scheduled. Ideally, the GPUs are shared in a single application to ensure that the expected amount of on-board memory is available and that the GPUs are used to their maximum capacity. This makes the K80 GPU nodes optimal for applications that scale effectively to 2, 4 or more GPUs.  Some examples are GROMACS, NAMD and VASP.  Applications using a multiple of 4 K80 GPUs will maximize system throughput.

The nodes are HPE Apollo 2000s, each with 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU) and 128GB RAM.

P100 nodes: The 32 P100 GPU nodes each contain 2 NVIDIA P100 GPU cards, and each card holds one very powerful GPU, optimally suited for single-GPU applications that require maximum acceleration. The most prominent example of this is deep learning training using frameworks that do not use multiple GPUs.

The nodes are HPE Apollo 2000s, each with 2 Intel Xeon E5-2683 v4 CPUs (16 cores per CPU) and 128GB RAM.

File Systems

The /home and pylon5 file systems are available on all of these nodes.  See the File Spaces section of the User Guide for more information on these file systems.

Compiling and Running Jobs

Use the GPU partition, either in batch or interactively, to compile your code and run your jobs.  See the Running Jobs section of the User Guide for more information on Bridges' partitions and how to run jobs. 



To use CUDA, first you must load the CUDA module.  To see all versions of CUDA that are available, type:

module avail cuda

Then choose the version that you need and load the module for it. The command

module load cuda

loads the default version of CUDA. To load a different version, use the full module name, for example:

module load cuda/8.0

CUDA 8 codes should run on both types of Bridges' GPU nodes with no issues. CUDA 7 should only be used on the K80 GPUs (Phase 1). Performance may suffer with CUDA 7 on the P100 nodes (Phase 2).



Our primary GPU programming environment is OpenACC.

The PGI compilers are available on all GPU nodes. To set up the appropriate environment for the PGI compilers, use the module command:

module load pgi

Read more about the module command at PSC.  

If you will be using these compilers often, it will be useful to add this command to your shell initialization script.
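One way to do this in bash is sketched below. The helper name add_line_once is made up for this example and is not a PSC or PGI tool. It appends a line to a file only if that exact line is not already there, so running it repeatedly does not pile up duplicate entries.

```shell
#!/bin/bash
# add_line_once is a hypothetical helper written for this example.
# Usage: add_line_once LINE FILE
# Appends LINE to FILE unless FILE already contains that exact line.
add_line_once() {
    local line="$1"
    local file="$2"
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string.
    # If FILE does not exist yet, grep fails and the line is written,
    # which creates the file.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

To apply it to a bash initialization script, you could call add_line_once 'module load pgi' "$HOME/.bashrc". The choice of .bashrc is an assumption; use whichever initialization file your shell reads.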

There are many options available with these compilers. See the online man pages ("man pgf90", "man pgcc", "man pgCC") for detailed information. You may find these basic OpenACC options a good place to start:

pgcc -acc yourcode.c
pgf90 -acc yourcode.f90


P100 node users should add the "-ta=tesla,cuda8.0" option to the compile command, for example:

pgcc -acc -ta=tesla,cuda8.0 yourcode.c

Adding the "-Minfo=accel" flag to the compile command (whether pgf90, pgcc or pgCC) makes the compiler report which OpenACC directives it accepted and how it handled them, along with any errors it found in them.

pgf90 -acc -Minfo=accel yourcode.f90
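As a rough illustration of how these options combine, the bash sketch below assembles the flags for a given GPU type. The helper name acc_flags and the labels k80 and p100 are assumptions made for this example; only the flags themselves come from this section. The script prints the compile command instead of running it, so it can be tried without the PGI module loaded.

```shell
#!/bin/bash
# acc_flags is a hypothetical helper, not part of the PGI or PSC tooling.
# It prints the OpenACC compile flags described in this section for a GPU type.
acc_flags() {
    local gpu_type="$1"
    # Flags used on both node types: enable OpenACC and compiler feedback.
    local flags="-acc -Minfo=accel"
    case "$gpu_type" in
        p100)
            # P100 node users add the CUDA 8 target option.
            flags="$flags -ta=tesla,cuda8.0"
            ;;
        k80)
            # K80 nodes use the basic flags unchanged.
            ;;
        *)
            echo "unknown GPU type: $gpu_type" >&2
            return 1
            ;;
    esac
    echo "$flags"
}

# Show the resulting compile command for a P100 node.
echo "pgcc $(acc_flags p100) yourcode.c"
```

The same flags apply to pgf90 and pgCC, so the helper's output can be placed after any of the three compiler names.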

Hybrid MPI/GPU Jobs

To run a hybrid MPI/GPU job, use the following commands to compile your program:

module load cuda
module load mpi/pgi_openmpi
mpicc -acc yourcode.c

Before you execute your program, you must first issue the two module load commands shown above, so that the run uses the same environment as the compile.

Profiling and Debugging

The environment variables PGI_ACC_TIME, PGI_ACC_NOTIFY and PGI_ACC_DEBUG can provide profiling and debugging information for your job. Specific commands depend on the shell you are using.
Send email to request that additional CUDA-oriented debugging tools be installed.

Unix shells

                                            Bash shell               C shell
Performance profiling
  Enable runtime GPU performance profiling  export PGI_ACC_TIME=1    setenv PGI_ACC_TIME 1
Basic debugging
  For data transfer information, set PGI_ACC_NOTIFY to 3.
More detailed debugging                     export PGI_ACC_DEBUG=1   setenv PGI_ACC_DEBUG 1
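The bash sketch below shows one way these variables might be set for a single run. The executable name ./yourprogram is a placeholder for your own OpenACC program, and the script skips the run if no such file exists, so it is safe to try as written.

```shell
#!/bin/bash
# ./yourprogram is a placeholder; replace it with your own OpenACC executable.
prog=./yourprogram

# Enable runtime GPU performance profiling.
export PGI_ACC_TIME=1
# Request data transfer information, as described above.
export PGI_ACC_NOTIFY=3

# Run the program only if it exists and is executable.
if [ -x "$prog" ]; then
    "$prog"
else
    echo "skipping run: $prog not found"
fi
```

C shell users would use the setenv forms instead of export. Because the settings are exported, they stay in effect for later commands in the same shell until they are unset.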