Bridges-2

Bridges-2 maintenance October 30 - November 1, 2024

Bridges-2 will be unavailable beginning at 8:00AM Eastern time on Wednesday, October 30 in order to install and test new hardware to add capacity to both the Bridges-2 Ocean and Jet filesystems. Maintenance is scheduled to end at noon on Friday, November 1 but we hope to complete the upgrade and return the system earlier than that if possible.

During the outage, Bridges-2, including all VMs and filesystems, will be unavailable.

PSC and vendor staff have been hard at work, trying to determine the cause of recent issues. We believe the machine is functioning properly for most use cases at the moment, but there is still work to be done. We have scheduled this maintenance with that in mind, and in the hopes of mitigating future problems.

Please direct any questions to help@psc.edu and our team will be happy to assist you.

Bridges-2, PSC’s flagship supercomputer, began production operations in March 2021. It is funded by a $10-million grant from the National Science Foundation.

Bridges-2 provides transformative capability for rapidly evolving computation- and data-intensive research, and creates opportunities for collaboration and convergence research. It supports both traditional and non-traditional research communities and applications. Bridges-2 integrates new technologies for converged, scalable HPC, machine learning and data; prioritizes researcher productivity and ease of use; and provides an extensible architecture for interoperation with complementary data-intensive projects, campus resources, and clouds.

Bridges-2 is available at no cost for research and education, and at cost-recovery rates for other purposes.

Core concepts and innovation

Bridges-2 was designed around the core concepts of converged HPC + AI + Data; a custom topology optimized for data-centric HPC, AI, and HPDA; heterogeneous node types for different aspects of workflows, including both CPUs and AI-targeted GPUs; three tiers of per-node RAM: 256GB, 512GB, and 4TB; an extremely flexible software environment; and community data collections and Big Data as a Service.

Innovative features built into Bridges-2 to facilitate these core concepts include:

  • AMD EPYC 7742 CPUs: 64 cores, 2.25–3.4 GHz
  • AI scaling to 192 V100-32GB SXM2 GPUs
  • 100TB, 9M IOPS flash array accelerates deep learning training, genomics, and other applications
  • Mellanox HDR-200 InfiniBand doubles bandwidth & supports in-network MPI-Direct, RDMA, GPUDirect, SR-IOV, and data encryption
  • Cray ClusterStor E1000 Storage System
  • HPE DMF single namespace for data security and expandable archiving

Questions?

If you have questions about Bridges-2, please email help@psc.edu.

Bridges-2 Simulations Uncover Better Ways to Inject Monoclonal Antibody Drugs

Detailed Model of Movement of Antibodies through Tissues Reveals Ways to Improve Autoinjectors

Enhancing Hurricane Forecasts: A Game-Changer in Lessening Catastrophic Impacts

Simulations on Bridges-2 Reveal that Reducing Estimates of Atmospheric Friction Improves Storm Predictions

Dune Software Helps Scientists Identify Roles of Individual Cells

By Optimizing Cell Classification, the New Tool Running on Bridges-2 Promises Better Understanding of How Individual Cells Help Organs and Tissues Function

Spatial Relationships Help AI Learn Without Human Help

Environmental Spatial Similarity Approach with Bridges-2 Improves Speed and Accuracy for More Flexible Visual, General AI Learning

$4.9-Million NSF Award Funds Major Enhancement to Bridges-2 System

Additional GPU Capabilities Will Expand AI Research in Range of Scientific Fields

Dana O’Connor – MSC Senior Rookie Awardee

Dana O’Connor, Machine Learning Research Scientist, talks about her recent Senior Rookie award and her work at PSC.

Node types

Regular Memory nodes

Regular Memory (RM) nodes provide extremely powerful, general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing.

488 RM nodes have 256GB of RAM, and 16 have 512GB of RAM.

Details

All 504 RM nodes have:

  • NVMe SSD (3.84TB)
  • Mellanox ConnectX-6 HDR InfiniBand 200Gb/s Adapter
  • Two AMD EPYC 7742 CPUs, each with:
    • 64 cores
    • 2.25-3.40GHz
    • 256MB L3
    • 8 memory channels
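
As a quick sanity check, the aggregate size of the RM nodes can be worked out from the per-node figures listed above. The sketch below is illustrative only: the constant and function names are hypothetical, and the totals are simple arithmetic on the published counts, not measured or officially quoted values.

```python
# Figures are copied from the RM node description above; nothing here is measured.
RM_NODES_256GB = 488   # RM nodes with 256GB of RAM
RM_NODES_512GB = 16    # RM nodes with 512GB of RAM
CPUS_PER_NODE = 2      # AMD EPYC 7742 CPUs per node
CORES_PER_CPU = 64     # cores per EPYC 7742


def rm_totals():
    """Return aggregate node, core, and RAM counts across all RM nodes."""
    nodes = RM_NODES_256GB + RM_NODES_512GB
    cores_per_node = CPUS_PER_NODE * CORES_PER_CPU
    return {
        "nodes": nodes,
        "cores_per_node": cores_per_node,
        "cores": nodes * cores_per_node,
        "ram_gb": RM_NODES_256GB * 256 + RM_NODES_512GB * 512,
    }


if __name__ == "__main__":
    for name, value in rm_totals().items():
        print(f"{name}: {value}")
```

The node count produced here agrees with the stated total of 504 RM nodes.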

Extreme Memory nodes

Extreme Memory (EM) nodes provide 4TB of shared memory for genome sequence assembly, graph analytics, statistics, and other applications requiring a large amount of memory for which distributed-memory implementations are not available.

Details

Each of Bridges-2’s 4 EM nodes consists of:

  • 4TB of RAM: DDR4-2933
  • NVMe SSD (7.68TB)
  • Mellanox ConnectX-6 HDR InfiniBand 200Gb/s Adapter
  • Four Intel Xeon Platinum 8260M “Cascade Lake” CPUs, each with:
    • 24 cores
    • 2.40–3.90GHz
    • 35.75MB LLC
    • 6 memory channels
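
One way to see why EM nodes suit memory-bound work is to compare RAM per core against a 256GB RM node. This is a rough sketch under stated assumptions: the helper name is hypothetical, 1TB is treated as 1024GB, and the result ignores how memory is actually allocated to jobs.

```python
def ram_per_core_gb(ram_gb, cpus, cores_per_cpu):
    """Return the RAM available per core on one node, in GB."""
    return ram_gb / (cpus * cores_per_cpu)


# EM node: 4TB of RAM, four 24-core CPUs (assumes 1TB = 1024GB).
em_gb_per_core = ram_per_core_gb(4 * 1024, cpus=4, cores_per_cpu=24)

# RM node (256GB tier): two 64-core CPUs.
rm_gb_per_core = ram_per_core_gb(256, cpus=2, cores_per_cpu=64)

if __name__ == "__main__":
    print(f"EM: {em_gb_per_core:.1f} GB per core")
    print(f"RM: {rm_gb_per_core:.1f} GB per core")
    print(f"ratio: {em_gb_per_core / rm_gb_per_core:.1f}x")
```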

GPU nodes

GPU nodes provide exceptional performance and scalability for deep learning and accelerated computing. Bridges-2 launched with 24 GPU nodes, each providing 40,960 CUDA cores and 5,120 tensor cores. The GPU nodes from Bridges were later migrated to Bridges-2, adding the DGX-2 and nine more V100 GPU nodes to Bridges-2’s GPU resources.

Details

The 24 original Bridges-2 GPU nodes contain:

  • 512GB of RAM: DDR4-2933
  • 7.68TB NVMe SSD
  • Two Mellanox ConnectX-6 HDR InfiniBand 200Gb/s Adapters
  • Eight NVIDIA Tesla V100-32GB SXM2 GPUs
  • 1 Pf/s tensor performance
  • Two Intel Xeon Gold 6248 “Cascade Lake” CPUs:
    • 20 cores, 2.50–3.90GHz, 27.5MB LLC, 6 memory channels
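
The core counts and the 1 Pf/s figure can be reproduced from per-GPU numbers. The sketch below assumes NVIDIA's published V100 SXM2 figures of 5,120 CUDA cores, 640 tensor cores, and a peak of 125 Tf/s of tensor throughput per GPU; those constants are not stated on this page, and the names are hypothetical. Under those assumptions, 40,960 CUDA cores and 5,120 tensor cores correspond to a single eight-GPU node.

```python
# Per-GPU constants are assumptions taken from NVIDIA's V100 SXM2 specifications.
V100_CUDA_CORES = 5120
V100_TENSOR_CORES = 640
V100_TENSOR_TFLOPS = 125  # peak mixed-precision tensor throughput

GPUS_PER_NODE = 8          # from the node description above
ORIGINAL_GPU_NODES = 24    # from the node description above


def per_node():
    """Return CUDA cores, tensor cores, and peak tensor Pf/s for one node."""
    return {
        "cuda_cores": GPUS_PER_NODE * V100_CUDA_CORES,
        "tensor_cores": GPUS_PER_NODE * V100_TENSOR_CORES,
        "tensor_pflops": GPUS_PER_NODE * V100_TENSOR_TFLOPS / 1000,
    }


def across_original_nodes():
    """Return GPU and core totals across the 24 original GPU nodes."""
    node = per_node()
    return {
        "gpus": ORIGINAL_GPU_NODES * GPUS_PER_NODE,
        "cuda_cores": ORIGINAL_GPU_NODES * node["cuda_cores"],
        "tensor_cores": ORIGINAL_GPU_NODES * node["tensor_cores"],
    }


if __name__ == "__main__":
    print(per_node())
    print(across_original_nodes())
```

The 192-GPU total matches the "AI scaling to 192 V100-32GB SXM2 GPUs" figure quoted earlier on this page.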

The Bridges GPUs migrated to Bridges-2 include nine nodes with:

  • 192GB of RAM, DDR4-2666
  • 4 NVMe SSDs, 2TB each
  • 8 NVIDIA V100-16GB GPUs
  • 2 Intel Xeon Gold 6148 CPUs:
    • 20 cores/CPU, 2.4 – 3.7 GHz

The DGX-2 has:

  • 1.5TB RAM, DDR4-2666
  • 8 NVMe SSDs, 3.84TB each
  • 16 NVIDIA Volta V100-32GB GPUs
  • 2 Intel Xeon Platinum 8168 CPUs:
    • 24 cores/CPU, 2.7 – 3.7 GHz
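
Taken together, the three groups of GPU nodes described above can be tallied as follows. This is a minimal sketch: the class and variable names are hypothetical, it assumes a single DGX-2 (the text refers to "the DGX-2"), and the totals are plain arithmetic on the listed counts, not an official inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuNodeGroup:
    """One homogeneous group of GPU nodes, using counts from the lists above."""
    name: str
    nodes: int
    gpus_per_node: int
    gpu_mem_gb: int  # memory per GPU

    @property
    def gpus(self):
        return self.nodes * self.gpus_per_node

    @property
    def gpu_mem_total_gb(self):
        return self.gpus * self.gpu_mem_gb


GROUPS = [
    GpuNodeGroup("original V100-32GB nodes", nodes=24, gpus_per_node=8, gpu_mem_gb=32),
    GpuNodeGroup("migrated V100-16GB nodes", nodes=9, gpus_per_node=8, gpu_mem_gb=16),
    GpuNodeGroup("DGX-2", nodes=1, gpus_per_node=16, gpu_mem_gb=32),  # assumes one DGX-2
]

total_gpus = sum(group.gpus for group in GROUPS)
total_gpu_mem_gb = sum(group.gpu_mem_total_gb for group in GROUPS)

if __name__ == "__main__":
    for group in GROUPS:
        print(f"{group.name}: {group.gpus} GPUs, {group.gpu_mem_total_gb} GB GPU memory")
    print(f"total: {total_gpus} GPUs, {total_gpu_mem_gb} GB GPU memory")
```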