Using Bridges' GPU nodes
The NVIDIA Tesla K80 and P100 GPUs on Bridges provide substantial, complementary computational power for deep learning, simulations and other applications.
A standard NVIDIA accelerator environment is installed on Bridges' GPU nodes. If you have programmed using GPUs before, you should find this familiar. Please contact firstname.lastname@example.org for more help.
There are two types of GPU nodes on Bridges: 16 nodes with NVIDIA K80 GPUs and 32 nodes with NVIDIA P100 GPUs.
K80 nodes: The 16 K80 GPU nodes each contain 2 NVIDIA K80 GPU cards, and each card contains two GPUs that can be scheduled individually. Ideally, a single application uses all of the GPUs allocated to it, so that the expected amount of on-board memory is available and the GPUs run at full capacity. This makes the K80 GPU nodes optimal for applications that scale effectively to 2, 4 or more GPUs; GROMACS, NAMD and VASP are some examples. Applications using a multiple of 4 K80 GPUs will maximize system throughput.
The nodes are HPE Apollo 2000s, each with 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU) and 128GB RAM.
P100 nodes: The 32 P100 GPU nodes contain 2 NVIDIA P100 GPU cards, and each card holds one very powerful GPU, optimally suited for single-GPU applications that require maximum acceleration. The most prominent example of this is deep learning training using frameworks that do not use multiple GPUs.
The nodes are HPE Apollo 2000s, each with 2 Intel Xeon E5-2683 v4 CPUs (16 cores per CPU) and 128GB RAM.
The /home, pylon2 and pylon5 file systems are available on all of these nodes. See the File Spaces section of the User Guide for more information on these file systems.
Compiling and Running Jobs
Use the GPU partition, either in batch or interactively, to compile your code and run your jobs. See the Running Jobs section of the User Guide for more information on Bridges' partitions and how to run jobs.
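As a rough sketch, a batch script for the GPU partition might look like the following. The partition name GPU comes from this guide; the --gres resource string, walltime and program name are illustrative placeholders, not verbatim Bridges policy — check the Running Jobs section for the exact syntax.

```shell
#!/bin/bash
# Hypothetical GPU batch script sketch; adapt resources to your allocation.
#SBATCH -p GPU             # the GPU partition described in this guide
#SBATCH -N 1               # one node
#SBATCH --gres=gpu:2       # request 2 GPUs (assumed syntax; verify locally)
#SBATCH -t 00:30:00        # 30-minute walltime (placeholder)

module load cuda           # set up the CUDA environment
./my_gpu_program           # placeholder for your own executable
```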
To use CUDA, first you must load the CUDA module. To see all versions of CUDA that are available, type:
module avail cuda
Then choose the version that you need and load the module for it.
module load cuda
loads the default CUDA. To load a different version, use the full module name.
module load cuda/8.0
CUDA 8 codes should run on both types of Bridges' GPU nodes with no issues. CUDA 7 should only be used on the K80 GPUs (Phase 1). Performance may suffer with CUDA 7 on the P100 nodes (Phase 2).
Our primary GPU programming environment is OpenACC.
The PGI compilers are available on all GPU nodes. To set up the appropriate environment for the PGI compilers, use the command:
module load pgi
If you will be using these compilers often, it will be useful to add this command to your shell initialization script.
There are many options available with these compilers. See the online man pages (man pgf90, man pgcc, man pgCC) for detailed information. You may find these basic OpenACC options a good place to start:
pgcc -acc yourcode.c
pgf90 -acc yourcode.f90
P100 node users should add the "-ta=tesla,cuda8.0" option to the compile command, for example:
pgcc -acc -ta=tesla,cuda8.0 yourcode.c
Adding the "-Minfo=accel" flag to the compile command (whether pgf90, pgcc or pgCC) will provide useful feedback regarding compiler errors or success with your OpenACC commands.
pgf90 -acc -Minfo=accel yourcode.f90
Hybrid MPI/GPU Jobs
To run a hybrid MPI/GPU job use the following commands for compiling your program:
module load cuda
module load mpi/pgi_openmpi
mpicc -acc yourcode.c
Before you execute your program, you must first issue the same two module load commands.
Profiling and Debugging
The environment variables PGI_ACC_TIME, PGI_ACC_NOTIFY and PGI_ACC_DEBUG can provide profiling and debugging information for your job. Specific commands depend on the shell you are using.
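For example, under bash one might enable basic accelerator reporting as shown below (csh users would use setenv instead); the variable names come from this guide, the value 1 is a typical on/off setting, and ./yourprog is a placeholder for your own executable.

```shell
# bash/sh syntax; under csh/tcsh write, e.g., "setenv PGI_ACC_TIME 1"
export PGI_ACC_TIME=1      # print per-region accelerator timing at exit
export PGI_ACC_NOTIFY=1    # report kernel launches (assumed typical value)
# ./yourprog               # placeholder: run your own OpenACC executable
```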
Send email to email@example.com to request that additional CUDA-oriented debugging tools be installed.