System Configuration
Bridges-2 is designed for converged HPC + AI + Data. Its custom topology is optimized for data-centric HPC, AI, and HPDA (High-Performance Data Analytics). An extremely flexible software environment, along with community data collections and BDaaS (Big Data as a Service), provides the tools necessary for modern pioneering research. The data management system, Ocean, consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.
Compute nodes
Bridges-2 has three types of compute nodes: GPU, Regular Memory (RM), and Extreme Memory (EM).
GPU nodes
Bridges-2’s GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing, with 40,960 CUDA cores and 5,120 Tensor Cores per node.
| GPU nodes | |
|---|---|
| Number | 24 |
| GPUs per node | 8 NVIDIA Tesla V100-32GB SXM2 |
| GPU performance | 1 Pf/s tensor |
| CPUs | 2 Intel Xeon Gold 6248 “Cascade Lake” CPUs; 20 cores, 40 threads each; 2.50–3.90 GHz |
| RAM | 512 GB, DDR4-2933 |
| Cache | 27.5 MB LLC, 6 memory channels |
| Node-local storage | 7.68 TB NVMe SSD |
| Network | 2 Mellanox ConnectX-6 HDR InfiniBand 200 Gb/s adapters |
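The headline core counts follow directly from the published per-GPU specifications of the Tesla V100 (5,120 CUDA cores and 640 Tensor Cores each). A quick sanity check, using the node and GPU counts from the table above:

```python
# Per-GPU specifications of the NVIDIA Tesla V100 (public datasheet figures).
CUDA_CORES_PER_GPU = 5_120
TENSOR_CORES_PER_GPU = 640

GPUS_PER_NODE = 8   # from the table above
GPU_NODES = 24      # from the table above

cuda_per_node = GPUS_PER_NODE * CUDA_CORES_PER_GPU      # 40,960
tensor_per_node = GPUS_PER_NODE * TENSOR_CORES_PER_GPU  # 5,120

print(f"CUDA cores per node:    {cuda_per_node:,}")
print(f"Tensor Cores per node:  {tensor_per_node:,}")
print(f"CUDA cores system-wide: {GPU_NODES * cuda_per_node:,}")
```

The per-node figures match those quoted in the text; the system-wide CUDA core count (983,040) is a derived number, not an official specification.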
Regular Memory nodes
Regular Memory (RM) nodes provide extremely powerful general-purpose computing and support machine learning, data analytics, AI inferencing, and pre- and post-processing.
| RM nodes | | |
|---|---|---|
| Number | 488 | 16 |
| CPU | 2 AMD EPYC 7742 CPUs; 64 cores, 128 threads each; 2.25–3.40 GHz | 2 AMD EPYC 7742 CPUs; 64 cores, 128 threads each; 2.25–3.40 GHz |
| RAM | 256 GB | 512 GB |
| Cache | 256 MB L3, 8 memory channels | 256 MB L3, 8 memory channels |
| Node-local storage | 3.84 TB NVMe SSD | 3.84 TB NVMe SSD |
| Network | Mellanox ConnectX-6 HDR InfiniBand 200 Gb/s adapter | Mellanox ConnectX-6 HDR InfiniBand 200 Gb/s adapter |
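When sizing RM jobs, it can help to think in terms of memory per core. A small illustrative calculation using the figures from the table above (the per-core numbers are derived for illustration, not official scheduling limits):

```python
# RM node configurations, taken from the table above.
rm_configs = {
    "256 GB nodes": {"count": 488, "ram_gb": 256, "cores": 128},
    "512 GB nodes": {"count": 16,  "ram_gb": 512, "cores": 128},
}

for name, cfg in rm_configs.items():
    per_core = cfg["ram_gb"] / cfg["cores"]
    print(f"{name}: {per_core:.0f} GB of RAM per core")
```

So a fully packed job on a standard RM node has roughly 2 GB of RAM per core, while the 512 GB variant doubles that to 4 GB per core.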
Extreme Memory nodes
Extreme Memory (EM) nodes provide 4TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.
| EM nodes | |
|---|---|
| CPU | 4 Intel Xeon Platinum 8260M “Cascade Lake” CPUs; 24 cores, 48 threads each; 2.40–3.90 GHz |
| RAM | 4 TB, DDR4-2933 |
| Cache | 37.75 MB LLC, 6 memory channels |
| Node-local storage | 7.68 TB NVMe SSD |
| Network | Mellanox ConnectX-6 HDR InfiniBand 200 Gb/s adapter |
Data Management
Data management on Bridges-2 is handled by Ocean, a unified, high-performance filesystem for active project data, archiving, and resilience.
Ocean consists of two tiers, disk and tape, transparently managed as a single, highly usable namespace.
Ocean’s disk subsystem, for active project data, is a high-performance, internally resilient Lustre parallel filesystem with 15 PB of usable capacity, configured to deliver up to 129 GB/s of read bandwidth and 142 GB/s of write bandwidth.
Ocean’s tape subsystem, for archiving and additional resilience, is a high-performance tape library with 7.2 PB of uncompressed capacity, configured to deliver 50 TB/hour. Data compression occurs transparently in hardware, with no performance overhead.
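To put the tape subsystem’s 50 TB/hour rate in perspective, a rough back-of-the-envelope estimate (this assumes the full sustained rate, ignoring compression and contention from other users):

```python
TAPE_RATE_TB_PER_HOUR = 50  # rated throughput of Ocean's tape subsystem

def archive_hours(dataset_tb: float) -> float:
    """Idealized time to stream a dataset to tape at the rated throughput."""
    return dataset_tb / TAPE_RATE_TB_PER_HOUR

print(f"100 TB -> {archive_hours(100):.1f} h")
print(f"1 PB   -> {archive_hours(1000):.1f} h")
```

Under these idealized assumptions, a 100 TB dataset streams to tape in about 2 hours, and a full petabyte in under a day.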