System Configuration

Bridges-2 is designed for converged HPC + AI + Data. Its custom topology is optimized for data-centric HPC, AI, and HPDA (High Performance Data Analytics). An extremely flexible software environment, together with community data collections and BDaaS (Big Data as a Service), provides the tools necessary for modern pioneering research. The data management system, Ocean, consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.

Compute nodes

Bridges-2 will hold three types of nodes: GPU, “Regular Memory” and “Extreme Memory”.

 

GPU nodes

Bridges-2’s GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing, with 40,960 CUDA cores and 5,120 tensor cores per node.

GPU nodes
Number               24
GPUs per node        8 NVIDIA Tesla V100-32GB SXM2
GPU performance      1 Pf/s tensor
CPUs                 2 Intel Xeon Gold 6248 “Cascade Lake” CPUs
                     20 cores, 40 threads per CPU
                     2.50-3.90 GHz
RAM                  512GB, DDR4-2933
Cache                27.5MB LLC, 6 memory channels
Node-local storage   7.68TB NVMe SSD
Network              2 Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapters
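
The core counts and tensor throughput quoted for the GPU nodes can be checked with simple arithmetic. The sketch below assumes the commonly published per-GPU figures for the Tesla V100 SXM2 (5,120 CUDA cores, 640 tensor cores, 125 Tf/s peak tensor throughput); those per-GPU values are not stated in this document, and the function name is purely illustrative.

```python
# Assumed per-GPU figures for the NVIDIA Tesla V100 SXM2 (not from this document).
CUDA_CORES_PER_V100 = 5120
TENSOR_CORES_PER_V100 = 640
TENSOR_TFLOPS_PER_V100 = 125  # peak tensor throughput, Tf/s

# Figures taken from the GPU node table above.
GPUS_PER_NODE = 8
GPU_NODES = 24


def gpu_node_totals(gpus_per_node=GPUS_PER_NODE):
    """Return aggregate GPU resources for a single GPU node."""
    return {
        "cuda_cores": gpus_per_node * CUDA_CORES_PER_V100,
        "tensor_cores": gpus_per_node * TENSOR_CORES_PER_V100,
        "tensor_pflops": gpus_per_node * TENSOR_TFLOPS_PER_V100 / 1000,
    }


per_node = gpu_node_totals()
total_gpus = GPUS_PER_NODE * GPU_NODES

print(per_node)
print("GPUs across all GPU nodes:", total_gpus)
```

Under these assumptions, 8 GPUs account for the 40,960 CUDA cores, 5,120 tensor cores, and 1 Pf/s tensor figure, which suggests those numbers describe one node rather than all 24.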

 

Regular Memory nodes

Regular Memory (RM) nodes provide extremely powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing.

RM nodes
Number               488                                                 16
CPU                  2 AMD EPYC 7742 CPUs                                2 AMD EPYC 7742 CPUs
                     64 cores, 128 threads each                          64 cores, 128 threads each
                     2.25-3.40 GHz                                       2.25-3.40 GHz
RAM                  256GB                                               512GB
Cache                256MB L3, 8 memory channels                         256MB L3, 8 memory channels
Node-local storage   3.84TB NVMe SSD                                     3.84TB NVMe SSD
Network              Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter  Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter
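
To get a feel for the aggregate size of the RM partition, the per-node figures above can be summed. This is a minimal sketch using only the node counts, core counts, and RAM sizes listed in the table; the variable and function names are illustrative, not part of any Bridges-2 tooling.

```python
# Node groups taken from the RM node table: (node count, RAM per node in GB).
RM_NODE_GROUPS = [
    {"nodes": 488, "ram_gb": 256},
    {"nodes": 16, "ram_gb": 512},
]
CPUS_PER_NODE = 2
CORES_PER_CPU = 64


def rm_totals(groups=RM_NODE_GROUPS):
    """Return (total nodes, total physical cores, total RAM in GB) for RM nodes."""
    nodes = sum(g["nodes"] for g in groups)
    cores = nodes * CPUS_PER_NODE * CORES_PER_CPU
    ram_gb = sum(g["nodes"] * g["ram_gb"] for g in groups)
    return nodes, cores, ram_gb


nodes, cores, ram_gb = rm_totals()
print(f"{nodes} RM nodes, {cores} cores, {ram_gb} GB RAM in total")
```

Every RM node has 128 physical cores, so the two groups differ only in memory per core: 2GB per core on the 256GB nodes and 4GB per core on the 512GB nodes.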

 

Extreme Memory nodes

Extreme Memory (EM) nodes provide 4TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.

EM nodes
CPU                  4 Intel Xeon Platinum 8260M “Cascade Lake” CPUs
                     24 cores, 48 threads each
                     2.40-3.90 GHz
RAM                  4TB, DDR4-2933
Cache                37.75MB LLC, 6 memory channels
Node-local storage   7.68TB NVMe SSD
Network              Mellanox ConnectX-6 HDR InfiniBand 200Gb/s adapter

 

Data Management

Data management on Bridges-2 is handled by Ocean, a unified, high-performance filesystem for active project data, archive, and resilience.

Ocean consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.

Ocean’s disk subsystem, for active project data, is a high performance, internally resilient Lustre parallel filesystem with 15PB of usable capacity, configured to deliver up to 129GB/s and 142GB/s of read and write bandwidth, respectively.

Ocean’s tape subsystem, for archive and additional resilience, is a high performance tape library with 7.2PB of uncompressed capacity, configured to deliver 50TB/hour. Data compression occurs in hardware, transparently, with no performance overhead.
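
The quoted bandwidths translate into rough lower bounds on how long a full pass over each tier would take. The sketch below assumes decimal units (1PB = 1,000TB, 1TB = 1,000GB) and sustained peak rates, neither of which is stated above; real transfers will be slower, and the helper name is illustrative.

```python
# Assumption: decimal storage units, not binary.
TB_PER_PB = 1000
GB_PER_TB = 1000
SECONDS_PER_HOUR = 3600


def hours_to_move(capacity_tb, rate_tb_per_hour):
    """Idealized time in hours to move capacity_tb at a sustained rate."""
    return capacity_tb / rate_tb_per_hour


def gb_per_s_to_tb_per_hour(rate_gb_per_s):
    """Convert a GB/s bandwidth into TB/hour."""
    return rate_gb_per_s * SECONDS_PER_HOUR / GB_PER_TB


# Tape tier: 7.2PB uncompressed capacity at 50TB/hour.
tape_hours = hours_to_move(7.2 * TB_PER_PB, 50)

# Disk tier: 15PB usable capacity at 129GB/s read and 142GB/s write.
disk_read_hours = hours_to_move(15 * TB_PER_PB, gb_per_s_to_tb_per_hour(129))
disk_write_hours = hours_to_move(15 * TB_PER_PB, gb_per_s_to_tb_per_hour(142))

print(f"Tape, full capacity: {tape_hours:.1f} hours")
print(f"Disk, full read:     {disk_read_hours:.1f} hours")
print(f"Disk, full write:    {disk_write_hours:.1f} hours")
```

These are best-case estimates only. They ignore contention, metadata overhead, and any gain from hardware compression on tape.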