Bridges-AI User Guide

This User Guide is intended for users with a “Bridges-AI” allocation. 

This document is not a complete guide to Bridges and does not include all of the information in the Bridges User Guide.  If you are not familiar with Bridges, please refer to the Bridges User Guide. In particular, if you are not comfortable with the interact and sbatch commands and their options, refer to the Running Jobs section in the Bridges User Guide for important information.

Bridges-AI is ideally suited for deep learning, other kinds of machine learning, graph analytics, and data science. AI and machine learning frameworks for Bridges-AI are provided through NVIDIA-optimized Singularity containers.

Technical support can be reached at:

Hardware description

Bridges-AI introduces 88 NVIDIA Volta GPUs in the following new nodes:

  • An NVIDIA DGX-2 enterprise research AI system tightly couples 16 NVIDIA Tesla V100 (Volta) GPUs with 32 GB of GPU memory each, connected by NVLink and NVSwitch, to provide maximum capability for the most demanding of AI challenges
  • 9 HPE Apollo 6500 servers, each with 8 NVIDIA Tesla V100 GPUs with 16 GB of GPU memory each, connected by NVLink 2.0, to balance great AI capability and capacity


Using containers on Bridges-AI

To use the software on Bridges-AI, we strongly recommend that you use a Singularity container image. You can use Singularity images already available locally on Bridges, or download an image, either from the NVIDIA GPU Cloud (NGC) Registry or another registry, or create your own.  To get the best performance from the nodes with Volta GPUs, use software from an NGC container.

For security reasons, Bridges-AI supports Singularity as a container technology but not Docker.


NVIDIA GPU Cloud (NGC) Containers

NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing. NVIDIA optimizes the containers for Volta, including rigorous quality assurance.

The containers on the NGC Registry are Docker images, but we have converted many of them to Singularity for you to use on Bridges-AI. These containers may be run on Bridges-AI nodes or on Bridges’ NVIDIA Tesla P100 GPUs, but they are not compatible with Bridges’ Tesla K80 GPUs.

NVIDIA requests that you create an NGC account if you will use any of these containers.


Containers installed on Bridges

These containers are installed on Bridges as Singularity images for you to use. Multiple versions of each are available, which vary in the versions of the software in them.   For details on which containers are installed and the software each contains, see Singularity images on Bridges.

Package                     Path on Bridges
Caffe                       /pylon5/containers/ngc/caffe
Caffe2                      /pylon5/containers/ngc/caffe2
CNTK                        /pylon5/containers/ngc/cntk
DIGITS                      /pylon5/containers/ngc/digits
Inference Server            /pylon5/containers/ngc/inferenceserver
MXNet                       /pylon5/containers/ngc/mxnet
PyTorch                     /pylon5/containers/ngc/pytorch
TensorFlow                  /pylon5/containers/ngc/tensorflow
TensorRT                    /pylon5/containers/ngc/tensorrt
TensorRT Inference Server   /pylon5/containers/ngc/tensorrtserver
Theano                      /pylon5/containers/ngc/theano
Torch                       /pylon5/containers/ngc/torch


Containers available on the NGC Registry

The table below lists the packages in the NGC Registry that are not installed on Bridges. Multiple versions of each are available; visit the NGC Registry for more information.

If you want to use an NGC container that is not installed on Bridges, you can access it directly from the NGC Registry, but you must create an account on the NGC Registry to do this.

Package NVIDIA Documentation


Accessing Container Images


Containers already on Bridges

A subset of the container images provided by NGC is already available on Bridges under the directory /pylon5/containers/ngc/. For details on which containers are installed and the software each contains, see Singularity images on Bridges.

Each software package (tensorflow, caffe, etc.) has its own directory containing multiple images. These images correspond to different versions of software. For example, to see all of the tensorflow images available type:

ls /pylon5/containers/ngc/tensorflow/

In this case, the output shows that four images are available. Note that two containers are built with python2 and two with python3.

18.09-py2.simg  18.09-py3.simg  18.10-py2.simg  18.10-py3.simg

These are all Singularity images which are ready to use on Bridges. You can use these containers directly from the /pylon5/containers/ngc directory; there is no need to copy them to another directory.


Using your own containers

If you need a container that is not already available on Bridges, you may be able to find a suitable one on the NGC Registry. This requires creating your own NGC Registry account.

If you have a Singularity container of your own, you can download it to Bridges and use it on Bridges-AI. If you have a Docker container you wish to use, download it to Bridges and then convert it to Singularity before using it.

Converting Docker containers to Singularity 

Once you have downloaded a Docker image, you must convert it to Singularity before using it on Bridges.  You can do this in an interactive or batch session on one of Bridges-AI nodes.  See the Running jobs section of the Bridges User Guide for information on starting an interactive session or submitting a batch job.

Whether you are using an interactive session or a batch job, you must use the GPU-AI partition.  Once your interactive session has started, or inside your batch script, load the Singularity module and use the singularity build command to convert your Docker container.

Use these commands in an interactive session or in a batch script to convert a Docker container from the NGC to Singularity:

source /etc/profile.d/
module load singularity
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD=your-key-string
export SINGULARITY_CACHEDIR=$SCRATCH/.singularity
singularity build $SCRATCH/new-container.simg docker://

where the username and password credentials are those issued by NVIDIA when you register for the NGC Registry.
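As a concrete sketch, the same conversion can be run as a batch job. Everything below the #SBATCH directives follows the commands above; the nvcr.io repository path, tag, output filename, and the modules.sh setup file are assumptions for illustration only, so substitute the container you actually want from the NGC Registry and your own NGC API key.

```shell
#!/bin/bash
#SBATCH -p GPU-AI                 # conversions must run in the GPU-AI partition
#SBATCH --gres=gpu:volta16:1      # 1 GPU on an Apollo 6500 node
#SBATCH -t 1:00:00

# Assumed bash setup file for the module command
source /etc/profile.d/modules.sh
module load singularity

# NGC credentials: the literal username $oauthtoken plus your NGC API key
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD=your-key-string
export SINGULARITY_CACHEDIR=$SCRATCH/.singularity

# Hypothetical example: pull the NGC PyTorch 18.10-py3 tag and convert it
singularity build $SCRATCH/pytorch-18.10-py3.simg \
    docker://nvcr.io/nvidia/pytorch:18.10-py3
```

Submit this with sbatch; the resulting .simg file is written to your $SCRATCH directory.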



Datasets

Some important datasets have been installed on Bridges and are available to all Bridges users. In addition, many others are available for you to download for your use on Bridges.



Public datasets installed on Bridges

Many useful datasets are available on Bridges for all users to access. You can also install a dataset for your own use, or request that a dataset be installed publicly for all users.  

See the Data collections section of the Bridges User Guide for a list of publicly available datasets and a form to request that another public dataset  be installed.


Other useful datasets

A list of datasets that may be useful follows. These datasets are not currently installed on Bridges, but they can be copied to your pylon5 space. If you think a dataset would be useful to many Bridges users, you can request that it be installed in a public space.

Deep Learning

Keras Datasets for Import

These datasets are available from

  • CIFAR10 small image classification
  • CIFAR100 small image classification
  • IMDB Movie reviews sentiment classification
  • Reuters newswire topics classification
  • MNIST database of handwritten digits
  • Fashion-MNIST database of clothing
  • Boston housing price regression dataset (from CMU)

Image Databases

Natural Language Processing

Audio and Audio-Visual


Machine Learning

Scikit-Learn Datasets for Import

Multi-class classification and clustering


Binary Classification

Univariate Time Series

Multivariate Time Series

Running Jobs

Once you have a Singularity image available locally you are ready to start using it for your application. You can run jobs either interactively or as a batch job, as with any other job on Bridges. For more details on this please see the Running job section of the Bridges User Guide.  You must use the GPU-AI partition.


GPU-AI partition summary

Partition name: GPU-AI

                          volta16 (Apollo 6500)      volta32 (DGX-2)
Node type                 Tesla V100 (Volta) GPUs    Tesla V100 (Volta) GPUs
                          with 16 GB of GPU memory   with 32 GB of GPU memory
GPUs/node                 8                          16
Default # of nodes        1                          1
Max # of nodes            4                          1
Min GPUs per job          1                          1
Max GPUs per job          32                         16
Max GPUs in use per user  32                         32
Walltime default          1 hour                     1 hour
Walltime max              12 hours                   12 hours




Running interactively in the GPU-AI partition

To run in an interactive session on Bridges-AI, use the interact command and specify the GPU-AI partition. An example interact command to request 1 GPU on an Apollo 6500 node is:

interact -p GPU-AI --gres=gpu:volta16:1


-p indicates the intended partition

--gres=gpu:volta16:1 requests the use of 1 V100 GPU

Once your interactive session has started, you can run the Singularity image. 


To start the Singularity image and then fire up a shell, type

singularity shell --nv singularity-container-name.simg


--nv gives your container GPU support

singularity-container-name is the container you wish to use

Type any commands you like at the prompt.


Alternatively, you can create a bash shell script and run it inside the container. To do so, once your interactive session has started, type

singularity exec --nv singularity-container-name.simg bash-script-name


--nv gives your container GPU support

singularity-container-name.simg is the container you wish to use

bash-script-name is your bash script


Running a batch job in the GPU-AI partition


Using module files on Bridges-AI

The Module package provides for the dynamic modification of a user's environment via module files. Module files manage necessary changes to the environment, such as adding to the default path or defining environment variables, so that you do not have to manage those definitions and paths manually. Before you can use module files in a batch job on Bridges-AI, you must issue one of the following commands:

If you are using bash or ksh:

source /etc/profile.d/      

If you are using csh or tcsh:

source /etc/profile.d/modules.csh

See the Module documentation for information on the module command.


The sbatch command

To run a batch job, you must create a batch script and submit it to the GPU-AI partition using the sbatch command.  Please see the Running jobs section of the Bridges User Guide for information on batch scripts, the sbatch command and its options, and more.


A sample sbatch command to submit a job to run on one of the Apollo servers and use all eight GPUs would be

sbatch -p GPU-AI -N 1 --gres=gpu:volta16:8 -t 1:00:00 myscript.job


-p GPU-AI requests the GPU-AI partition

-N 1 requests one node

--gres=gpu:volta16:8 requests an Apollo server with V100 GPUs, and specifies that you will use all 8 GPUs on that node

-t 1:00:00 requests one hour of running time

myscript.job is your batch script.


Here is an example batch script intended to run on one Apollo server, using all eight V100 GPUs. This script specifies the same sbatch directives as the sbatch command above. You can specify directives either way, but directives on the command line take precedence over those in a batch script.

Note we are using the bash shell and have included the command to load the module command.

#!/bin/bash
#SBATCH --partition=GPU-AI
#SBATCH --nodes=1
#SBATCH --gres=gpu:volta16:8
#SBATCH --time=1:00:00

source /etc/profile.d/
cd $SCRATCH/ngc
module load singularity
singularity exec --nv $SCRATCH/tensorflow.simg $SCRATCH/ngc/matrix.s


Environment variables

Using environment variables can make your life easier. Defining one variable to be the file path for the image you want to use and another to run that Singularity image can make it easier to access those strings later. For example, if you wish to use the tensorflow 18.10-py3 image, define a variable SIMG with the command:

SIMG=/pylon5/containers/ngc/tensorflow/18.10-py3.simg

Then define another environment variable that will run the Singularity image using NVIDIA optimizations:

S_EXEC="singularity exec --nv ${SIMG}"

Assuming that you have defined SIMG and S_EXEC as shown above, a sample sbatch command to request the use of 1 GPU from the DGX-2 node would be:

sbatch -p GPU-AI --gres=gpu:volta32:1 ${S_EXEC} myscript.job


-p indicates the intended partition

--gres=gpu:volta32:1 requests the use of 1 V100 GPU on the DGX-2

myscript.job is the name of your batch script
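Putting the two definitions together, this short sketch (using the tensorflow 18.10-py3 image installed on Bridges as the example) shows exactly what the shorthand expands to:

```shell
# Path to the image, following the /pylon5/containers/ngc layout
SIMG=/pylon5/containers/ngc/tensorflow/18.10-py3.simg

# Shorthand that runs a command inside the image with GPU support
S_EXEC="singularity exec --nv ${SIMG}"

# The variable expands to the full command prefix:
echo "${S_EXEC}"
# prints: singularity exec --nv /pylon5/containers/ngc/tensorflow/18.10-py3.simg
```

Note that the expansion happens in the shell where the variables are defined, so define (or export) them in the shell from which you submit the job.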


Example scripts

Example scripts using TensorFlow are available on Bridges in /opt/packages/examples/tensorflow/AI.


See also

Machine Learning Tutorial

A tutorial on machine learning using a Jupyter notebook is available here.


