PSC NAIRR FAQ

The Pittsburgh Supercomputing Center is thrilled to help the national research and education community respond to the NSF call for NAIRR Pilot Resource Requests to Advance AI Research. We’ve compiled this set of frequently asked questions to help clarify this opportunity.

If you have questions not covered here, please contact help@psc.edu.

What is NAIRR?

The National Artificial Intelligence Research Resource (NAIRR) is a concept for a shared national research infrastructure to connect U.S. researchers to responsible and trustworthy Artificial Intelligence (AI) resources, as well as the needed computational, data, software, training, and educational resources to advance research, discovery, and innovation.

The purpose of the NAIRR Pilot is to demonstrate the NAIRR concept and advance its main goals, namely to: spur innovation, increase diversity of talent, improve capacity, and advance safe, secure, and trustworthy AI in research and society. It is informed by an implementation plan for the NAIRR produced in January 2023 by a task force co-chaired by NSF and the White House Office of Science and Technology Policy.

Why should I request a NAIRR Pilot resource allocation?

By participating in the NAIRR Pilot, you have the opportunity to advance your work in the short term, and you will also help to lay the groundwork for the possible infusion of $2.6 billion over a six-year period into our community, as estimated in the NAIRR implementation plan.

What PSC resources are available in the NAIRR Pilot?

PSC is contributing several resources to the NAIRR Pilot, including the Neocortex and Bridges-2 compute systems, along with their shared Ocean and Jet file systems, rich software resources, and connections to Internet2, the national high-speed research network.

Neocortex is one of the original six resources available for NAIRR Pilot allocations. Neocortex is a highly innovative advanced computing system ideal for foundation and large language models. It features two Cerebras CS-2 systems, provisioned by an HPE Superdome Flex (SDF) HPC server and the Bridges-2 filesystems. The CS-2 systems can run customized TensorFlow and PyTorch containers, as well as programs written using the Cerebras SDK or the WSE Field Equation API. The SDF enables memory-intensive graph analytics, in-memory databases, statistics, and dataset preparation for large-scale training.

Each CS-2 system features a Cerebras WSE-2 (Wafer Scale Engine 2), the largest chip ever built, with 850,000 Sparse Linear Algebra Compute cores, 40 GB SRAM on-chip memory, 20 PB/s aggregate memory bandwidth and 220 Pb/s interconnect bandwidth.

The SDF features 32 Intel Xeon Platinum 8280L CPUs with 28 cores (56 threads) each, 2.70-4.0 GHz, 38.5 MB cache, 24 TiB RAM, aggregate memory bandwidth of 4.5 TB/s, and 204.6 TB aggregate local storage capacity with 150 GB/s read bandwidth. The SDF can provide 1.2 Tb/s to each CS-2 system and 1.6 Tb/s from the Bridges-2 filesystems.
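For rough sizing, the per-CPU figures above can be rolled up into server-wide totals. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and are not part of any PSC tool.

```python
# Figures quoted above for the Neocortex SDF server.
SOCKETS = 32            # Intel Xeon Platinum 8280L CPUs
CORES_PER_SOCKET = 28
THREADS_PER_CORE = 2    # 56 threads per 28-core CPU
RAM_TIB = 24

def sdf_totals():
    """Return (total cores, total threads, GiB of RAM per core)."""
    cores = SOCKETS * CORES_PER_SOCKET
    threads = cores * THREADS_PER_CORE
    ram_gib_per_core = RAM_TIB * 1024 / cores
    return cores, threads, ram_gib_per_core

cores, threads, gib_per_core = sdf_totals()
print(f"{cores} cores, {threads} threads, {gib_per_core:.1f} GiB RAM per core")
```

This works out to 896 cores and 1,792 threads, with roughly 27 GiB of RAM per core, which is why the SDF suits memory-heavy dataset preparation.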

Bridges-2 combines high-performance computing (HPC), high-performance artificial intelligence (HPAI), and large-scale data management to support simulation and modeling, data analytics, community data, and complex workflows.

Bridges-2 GPU nodes are optimized for scalable artificial intelligence (AI) including deep learning training, deep reinforcement learning, and generative techniques. Each GPU node contains 8 NVIDIA Tesla V100-32GB SXM2 GPUs.  

Bridges-2 Regular Memory (RM) nodes provide powerful general-purpose computing, machine learning and data analytics, AI inferencing, and pre- and post-processing. 488 Bridges-2 RM nodes have 256GB RAM, and 16 have 512GB RAM for more memory-intensive applications. Each Bridges-2 RM node consists of two AMD EPYC “Rome” 7742 64-core CPUs, 256-512GB of RAM, and 3.84TB NVMe SSD.
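When planning a request, it can help to see what the RM node counts above add up to. The following Python sketch computes the aggregate core count and RAM from the quoted figures; the names used are illustrative only.

```python
# Node counts and per-node specs quoted above for Bridges-2 RM nodes.
RM_NODES = {256: 488, 512: 16}   # RAM per node in GB -> number of nodes
CORES_PER_NODE = 2 * 64          # two 64-core AMD EPYC 7742 CPUs

def rm_totals(nodes=RM_NODES, cores_per_node=CORES_PER_NODE):
    """Return (node count, total cores, total RAM in GB)."""
    n = sum(nodes.values())
    total_ram_gb = sum(gb * count for gb, count in nodes.items())
    return n, n * cores_per_node, total_ram_gb

def ram_per_core_gb(node_ram_gb, cores_per_node=CORES_PER_NODE):
    """RAM available per core on a node with the given memory size."""
    return node_ram_gb / cores_per_node

nodes, cores, ram_gb = rm_totals()
print(f"{nodes} RM nodes, {cores} cores, {ram_gb} GB RAM in total")
for gb in RM_NODES:
    print(f"{gb} GB node: {ram_per_core_gb(gb):.0f} GB RAM per core")
```

A job that needs more than about 2 GB of RAM per core when using every core on a node is a candidate for the 512GB RM nodes or the EM nodes.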

Bridges-2 Extreme Memory (EM) nodes enable memory-intensive graph analytics, in-memory databases, statistics, genome sequence assembly, and other applications that need a large amount of memory and for which distributed-memory implementations are not available. Each of the four EM nodes provides 4TB of DDR4-2933 RAM, 4 Intel Xeon Platinum 8260M “Cascade Lake” CPUs, and a 7.68TB NVMe SSD.

Since Neocortex and all Bridges-2 nodes are connected to the Ocean and Jet file systems, their resources can be combined to enable complex work and data flows, such as various stages of ML data preparation, training and inference, and/or integrating machine learning with HPC simulation. The Pegasus workflow management system is available for this purpose.

All of Bridges-2’s compute nodes and its Ocean parallel file system are connected by two HDR-200 InfiniBand links, providing 400Gbps of bandwidth to enhance the scalability of deep learning training. A 100TB, 9M IOPS flash array file system called Jet is also mounted on all Bridges-2 nodes and can further accelerate deep learning training.
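To put the 400Gbps figure in perspective, the Python sketch below estimates a lower bound on the time needed to stream a dataset to a node at the peak line rate. The 10 TB dataset size is a hypothetical example, the function name is illustrative, and real throughput will be lower because of protocol overhead, file system load, and contention.

```python
def min_transfer_seconds(dataset_tb, link_gbps=400):
    """Lower bound on transfer time at peak line rate.

    dataset_tb: dataset size in decimal terabytes (hypothetical input).
    link_gbps:  link bandwidth in gigabits per second (400 quoted above).
    """
    dataset_bits = dataset_tb * 10**12 * 8
    return dataset_bits / (link_gbps * 10**9)

# Hypothetical 10 TB training set over the two HDR-200 links.
seconds = min_transfer_seconds(10)
print(f"At least {seconds:.0f} s ({seconds / 60:.1f} min) at peak line rate")
```

At 400Gbps the peak rate is 50 GB/s, so a 10 TB dataset needs at least 200 seconds to move.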

Open OnDemand is available on both Neocortex and Bridges-2 for interactive applications such as Jupyter Notebooks.

Bridges-2 and Neocortex are connected to the Internet2 high-speed national research network and thus to most other resources in the NAIRR Pilot pool, including providers of models and datasets that may be of interest to projects using PSC compute resources. PSC has partnerships with many other NAIRR Pilot resource providers and would welcome discussions about projects that could span several resources.

I already have ACCESS allocations on PSC resources. Why should I consider a NAIRR Pilot allocation?

A NAIRR Pilot allocation on PSC resources can complement existing or planned allocations obtained via the ACCESS program. For example, you might identify a subset of your research program that is well aligned with the NAIRR Pilot focus areas and likely to produce successful results within 12 months. This project could progress in parallel with the work your group is doing under ACCESS.

Building upon work you have already done on Bridges-2 or Neocortex will help substantiate your NAIRR Pilot resource request: you can reference publications and performance data obtained on the requested resources for similar or related software and data.

Who is eligible to apply for a NAIRR Pilot allocation?

In general, US-based researchers and educators from US-based institutions are eligible for NAIRR Pilot allocations.

Please see the NAIRR Pilot allocations page for complete eligibility information, as well as information on focus areas and details for proposals, expectations, evaluations, and reviews.

How do I apply for a NAIRR Pilot allocation?

Please refer to the NAIRR Pilot proposal instructions. Use XRAS Submit – NAIRR – Login to begin the application process. Note that you are required to log in using your ORCID iD. If you do not have one yet, you will be prompted to create one.

Please direct questions to help@allocations.nairrpilot.org.

How can I best target my NAIRR Pilot proposal to Neocortex and/or Bridges-2?

If you are considering an application for Neocortex resources, we recommend that you review the Neocortex Documentation and contact neocortex@psc.edu with any questions, specifying “NAIRR-Pilot Proposal Query” in the subject line.

If you are considering an application for Bridges-2 resources, we recommend that you review the Bridges-2 User Guide and contact help@psc.edu with any questions, specifying “NAIRR-Pilot Proposal Query” in the subject line.