Nicholas A. (Nick) Nystrom, PhD
Chief Scientist, Pittsburgh Supercomputing Center
Research Physicist, Dept. of Physics, Carnegie Mellon University
300 South Craig St.
Pittsburgh, PA 15213
Data analytics, Big Data, causal modeling, graph algorithms, genomics, machine learning / deep learning, extreme scalability, hardware and software architecture, software engineering for HPC, performance modeling and prediction, impacts of programming models and languages on productivity and efficiency, information visualization, and quantum chemistry. Recent work has focused on enabling data-intensive research in domains new to HPC, scaling diverse computational science codes and workflows to extreme-scale systems, deep hierarchies of parallelism, advanced filesystems, and architectural innovations in processors and interconnects.
- Bridges: Bridges, a new kind of supercomputer at the Pittsburgh Supercomputing Center (PSC), is designed for unprecedented flexibility and ease of use. Funded by a $17.2M National Science Foundation (NSF) award (ACI-1445606) and allocated through XSEDE, Bridges features large memory (4 compute nodes with 12 TB of RAM, 42 with 3 TB, and 800 with 128 GB) and powerful new Intel® Xeon CPUs and NVIDIA Tesla GPUs for exceptional performance. It also includes database and web servers to support gateways, collaboration, and data management, along with 10 PB of usable shared, parallel storage plus local storage on each of its compute nodes. Bridges was the first production deployment of the Intel® Omni-Path Architecture (OPA) Fabric, which interconnects all of its compute, storage, and utility nodes.
Emphasizing usability and flexibility, Bridges supports a high degree of interactivity, gateways and tools for gateway-building, and a very flexible user environment. Widely used languages and frameworks such as Java, Python, R, MATLAB, Hadoop, and Spark benefit transparently from large memory and the high-performance OPA fabric. Virtualization and containers enable hosting web services, NoSQL databases, and application-specific environments, enhance reproducibility, and support interoperation with clouds.
Bridges’ many GPUs – 32 NVIDIA Tesla K80 GPUs and 64 NVIDIA Tesla P100 GPUs – make it extremely valuable for deep learning and accelerated applications. Deep learning packages such as Caffe, TensorFlow, and Theano are installed and supported; others can be installed directly by users or by request.
Access to Bridges is available at no charge to the open research community through XSEDE’s proposal processes and by arrangement to industry through PSC’s corporate programs.
- Nystrom, N. A., Levine, M. J., Roskies, R. Z., and Scott, J. R. 2015. Bridges: A Uniquely Flexible HPC Resource for New Communities and Data Analytics. In Proceedings of the 2015 Annual Conference on Extreme Science and Engineering Discovery Environment (St. Louis, MO, July 26-30, 2015). XSEDE15. ACM, New York, NY, USA. DOI=http://dx.doi.org/10.1145/2792745.2792775.
- Nystrom, N. A. A Converged HPC & Big Data Architecture in Production. Invited keynote presentation at HP-CAST 27, Salt Lake City, November 11-12, 2017.
- Data Exacell (DXC): A research pilot project to create, deploy, and test software and hardware building blocks to enable data analytics in scientific research. The DXC couples analytic resources with innovative storage to allow communities that traditionally have not used HPC to create valuable new applications. Pilot applications in data-intensive research areas such as genomics, radio astronomy, the digital humanities, biological imaging, and workflow management are being used to motivate, test, demonstrate, and improve the DXC building blocks including, for example, extensions to the SLASH2 filesystem, development of new tools for efficient data transfers and leveraging SDN, and enabling sophisticated, distributed software architectures for novel research domains through integration of database, web, and analytic technologies.
- Center for Causal Discovery (CCD): An NIH BD2K Center of Excellence for Big Data Computing, the CCD aims to develop highly efficient causal discovery algorithms that can be practically applied to very large biomedical datasets. It conducts projects addressing three distinct biomedical questions (cancer driver mutations, lung fibrosis, brain causome) as a vehicle for algorithm development and optimization.
- Lu, S., Lu, K. N., Cheng, S.-Y., Hu, B., Ma X., Nystrom, N., and Lu, X. 2015. Identifying Driver Genomic Alterations in Cancers by Searching Minimum-Weight, Mutually Exclusive Sets. PLoS Comput Biol 11(8): e1004257. doi:10.1371/journal.pcbi.1004257.
- Big Data for Better Health (BD4BH) is applying machine learning to large-scale clinical and biomedical data to improve integration, analysis, and modeling for lung and breast cancer. Inferring informative biological patterns from gene sequence and expression data, image data, and clinical records can lead to improved prediction of clinical outcomes, for example, tumor metastasis, and thereby lead to improvements in clinical care.
- Deciphering Cellular Signaling Systems by Deep Mining a Comprehensive Genomic Compendium: In this project led by Xinghua Lu (Univ. of Pittsburgh DBMI), we aim to reveal major cellular signals that regulate gene expression under physiological and pathological conditions and to infer the organization of signals in the human cellular signal transduction system (CSTS). Combining the identified signals with genomic alteration data and drug response data, we aim to further identify pathways underlying disease such as cancers, to use the genomic data to predict drug sensitivity of cancer cell lines, and to predict patient clinical outcomes in a pathway-centered manner.
Some Other Recent Projects
- Enabling Productive, High-Performance Data Analytics: Sherlock, a YarcData Urika™ graph appliance with PSC-specific customizations, is the first system of its kind made available to the NSF research community. Sherlock is a massively multithreaded supercomputer built on Cray Threadstorm 4.0 processors, which implement multiple powerful features to support lightweight multithreading, latency hiding, and advanced memory interfaces. AMD HyperTransport-attached SeaStar2 interconnect chips provide a flat, globally addressable memory. Optimized RDF and SPARQL implementations provide ease of use for large-scale graph analytics, and the system is also programmable using C/C++ and threads.
- Exploring the Potential of “Native Client” for Computational Science: This project tests the usability and affordability of a new form of cloud service for providing more cost-effective computational resources for certain types of research computations. A usability component of the test will determine the ease with which applications can be ported to the Native Client (NaCl) programming model.