Bridges User Guide: GPU Use
Two GPU resources are part of Bridges: “Bridges AI” and “Bridges GPU”. When you receive a Bridges allocation, the resource you are allocated determines which set of GPU nodes you have access to.
Bridges’ GPU nodes have either NVIDIA Tesla K80, P100 or V100 GPUs, providing substantial, complementary computational power for deep learning, simulations and other applications.
A standard NVIDIA accelerator environment is installed on Bridges’ GPU nodes. If you have programmed using GPUs before, you should find this familiar. Please contact firstname.lastname@example.org for more help.
The Bridges AI resource consists of ten nodes.
An NVIDIA DGX-2 enterprise research AI system, which tightly couples 16 NVIDIA Tesla V100 (Volta) GPUs with 32GB of GPU memory each (512GB/node). The DGX-2 also holds two Intel Xeon Platinum 8168 CPUs with 24 cores/CPU (48 cores total) and 1.5TB RAM. The GPUs are connected by NVLink and NVSwitch, to provide maximum capability for the most demanding of AI challenges.
9 NVIDIA Tesla V100 GPU nodes, each with 8 GPUs with 16GB of GPU memory each (128GB/node), on HPE Apollo 6500 servers with 2 Intel Xeon Gold 6148 CPUs with 20 cores/CPU (40 cores total) and 192GB RAM. The GPUs are connected by NVLink 2.0, to balance great AI capability and capacity.
The Bridges GPU resource consists of 48 nodes.
16 NVIDIA Tesla K80 GPU nodes on HPE Apollo 2000 servers. Each holds two NVIDIA K80 GPU cards with 24GB of GPU memory (48GB/node), and each card contains two GPUs that can be individually scheduled. Each node also has 2 Intel Xeon E5-2695 v3 CPUs (14 cores per CPU) and 128GB RAM.
Ideally, the GPUs are shared in a single application to ensure that the expected amount of on-board memory is available and that the GPUs are used to their maximum capacity. This makes the K80 GPU nodes optimal for applications that scale effectively to 2, 4 or more GPUs. Some examples are GROMACS, NAMD and VASP. Applications using a multiple of 4 K80 GPUs will maximize system throughput.
32 NVIDIA Tesla P100 GPU nodes on HPE Apollo 2000 servers. Each node contains 2 NVIDIA P100 GPU cards, and each card holds one very powerful GPU and 16GB of GPU memory (32GB/node). Each node also has 2 Intel Xeon E5-2683 v4 CPUs (16 cores per CPU) and 128GB RAM.
These nodes are optimally suited for single-GPU applications that require maximum acceleration. The most prominent example of this is deep learning training using frameworks that do not use multiple GPUs.
See the System configuration section of the Bridges User Guide for hardware details for all GPU node types.
The /home and pylon5 file systems are available on all of these nodes. See the File Spaces section of the User Guide for more information on these file systems.
Compiling and Running jobs
Use the GPU partition, either in batch or interactively, to compile your code and run your jobs. See the Running Jobs section of the User Guide for more information on Bridges’ partitions and how to run jobs.
More information on using CUDA on Bridges can be found in the CUDA document.
To use CUDA, first you must load the CUDA module. To see all versions of CUDA that are available, type:
module avail cuda
Then choose the version that you need and load the module for it.
module load cuda
loads the default CUDA. To load a different version, use the full module name.
module load cuda/8.0
CUDA 8 codes should run on both types of Bridges’ GPU nodes with no issues. CUDA 7 should only be used on the K80 GPUs (Phase 1). Performance may suffer with CUDA 7 on the P100 nodes (Phase 2).
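The module steps above could be wrapped in a small Bash helper. This is a minimal sketch, not a site-provided tool: cuda_module_name and load_cuda are hypothetical names, and the helper assumes that the module command may be missing outside Bridges, in which case it only prints the command it would have run.

```shell
#!/bin/bash
# Hypothetical helper: build the module name for a CUDA version.
# With no argument it returns "cuda", which loads the site default.
cuda_module_name() {
    if [ -n "${1:-}" ]; then
        echo "cuda/$1"
    else
        echo "cuda"
    fi
}

# Hypothetical helper: load the CUDA module if the `module` command exists;
# otherwise print what would be run, so the sketch is safe to try anywhere.
load_cuda() {
    local name
    name=$(cuda_module_name "${1:-}")
    if type module >/dev/null 2>&1; then
        module load "$name"
    else
        echo "module load $name"
    fi
}

load_cuda        # default CUDA
load_cuda 8.0    # a specific version, using the full module name
```

Running "module avail cuda" first remains the way to confirm which version strings are actually installed.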
Our primary GPU programming environment is OpenACC.
The PGI compilers are available on all GPU nodes. To set up the appropriate environment for the PGI compilers, use the command:
module load pgi
If you will be using these compilers often, it will be useful to add this command to your shell initialization script.
There are many options available with these compilers. See the online man pages (“man pgf90”, “man pgcc”, “man pgCC”) for detailed information. You may find these basic OpenACC options a good place to start:
pgcc -acc yourcode.c
pgf90 -acc yourcode.f90
P100 node users should add the “-ta=tesla,cuda8.0” option to the compile command, for example:
pgcc -acc -ta=tesla,cuda8.0 yourcode.c
Adding the “-Minfo=accel” flag to the compile command (whether pgf90, pgcc or pgCC) will provide useful feedback on whether the compiler succeeded or failed in processing your OpenACC directives.
pgf90 -acc -Minfo=accel yourcode.f90
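The flag choices above can be collected in one place. The following Bash sketch only assembles and prints a compile command; acc_compile_cmd and its "k80"/"p100" arguments are hypothetical names chosen for illustration, and the flags themselves are the ones listed in this section.

```shell
#!/bin/bash
# Hypothetical helper: print an OpenACC compile command for a GPU node type.
# Usage: acc_compile_cmd <compiler> <k80|p100> <source file>
acc_compile_cmd() {
    local compiler="$1" gpu="$2" src="$3"
    local flags="-acc"
    # The guide asks P100 node users to add the CUDA 8.0 target option.
    if [ "$gpu" = "p100" ]; then
        flags="$flags -ta=tesla,cuda8.0"
    fi
    # -Minfo=accel asks the compiler for feedback on the OpenACC code.
    echo "$compiler $flags -Minfo=accel $src"
}

acc_compile_cmd pgcc p100 yourcode.c
# prints: pgcc -acc -ta=tesla,cuda8.0 -Minfo=accel yourcode.c
acc_compile_cmd pgf90 k80 yourcode.f90
# prints: pgf90 -acc -Minfo=accel yourcode.f90
```

Printing the command rather than running it keeps the sketch usable on machines without the PGI compilers; on Bridges, load the pgi module and run the printed command.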
Hybrid MPI/GPU Jobs
To run a hybrid MPI/GPU job use the following commands for compiling your program:
module load cuda
module load mpi/pgi_openmpi
mpicc -acc yourcode.c
When you execute your program you must first issue the above two module load commands.
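Because the same two modules are needed both to compile and to execute, it may help to keep them in one function. The sketch below is illustrative only: run, load_hybrid_modules, build_hybrid and the DRY_RUN variable are hypothetical names, and with DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/bash
# Hypothetical wrapper: with DRY_RUN=1 it prints a command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The two modules the guide requires for both compiling and executing.
load_hybrid_modules() {
    run module load cuda
    run module load mpi/pgi_openmpi
}

# Compile step from the guide; the source file name is a placeholder.
build_hybrid() {
    load_hybrid_modules
    run mpicc -acc "${1:-yourcode.c}"
}

# Preview the commands without running them (safe on any machine).
DRY_RUN=1
build_hybrid yourcode.c
```

On Bridges, setting DRY_RUN=0 would run the commands for real. A job script can call load_hybrid_modules before launching the program; how to launch it is covered in the Running Jobs section of the User Guide.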
Profiling and Debugging
For CUDA codes, use the command line profiler nvprof. See the CUDA document for more information.
For OpenACC codes, the environment variables PGI_ACC_TIME, PGI_ACC_NOTIFY and PGI_ACC_DEBUG can provide profiling and debugging information for your job. Specific commands depend on the shell you are using.
| | Bash shell | C shell |
| Enable runtime GPU performance profiling | export PGI_ACC_TIME=1 | setenv PGI_ACC_TIME 1 |
| Basic debugging. For data transfer information, set PGI_ACC_NOTIFY to 3. | export PGI_ACC_NOTIFY=1 | setenv PGI_ACC_NOTIFY 1 |
| More detailed debugging | export PGI_ACC_DEBUG=1 | setenv PGI_ACC_DEBUG 1 |
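Exporting these variables affects every later command in the session. As an alternative for Bash users, the sketch below sets them for a single run only; with_acc_profiling is a hypothetical helper name, and the stand-in command merely echoes the settings where your OpenACC executable would normally go.

```shell
#!/bin/bash
# Hypothetical helper: run one command with OpenACC runtime profiling enabled.
# PGI_ACC_TIME=1 turns on runtime GPU performance profiling and
# PGI_ACC_NOTIFY=3 requests data transfer information, as described above.
with_acc_profiling() {
    env PGI_ACC_TIME=1 PGI_ACC_NOTIFY=3 "$@"
}

# Demonstration with a stand-in command that only echoes the settings;
# on a GPU node this would be your OpenACC executable instead.
with_acc_profiling sh -c 'echo "PGI_ACC_TIME=$PGI_ACC_TIME PGI_ACC_NOTIFY=$PGI_ACC_NOTIFY"'
# prints: PGI_ACC_TIME=1 PGI_ACC_NOTIFY=3
```

Because env applies the settings only to the child process, the calling shell's environment is left unchanged.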