Nicholas A. (Nick) Nystrom, PhD
Sr. Director of Research, Pittsburgh Supercomputing Center
Research Physicist, Dept. of Physics, Carnegie Mellon University
300 South Craig St.
Pittsburgh, PA 15213
I lead the scientific research teams of PSC, including the User Support for Scientific Applications, Biomedical, and Public Health Applications groups, as well as a core team targeting converged HPC and Big Data production resources, strategic applications, allocations, and project management. Together, we engage in research and collaborations across diverse disciplines, develop hardware and software architectures to enable groundbreaking research, and facilitate researchers' use of PSC's HPC and Big Data resources to achieve breakthroughs in their respective fields.
Research Interests
Data analytics, Big Data, causal modeling, graph algorithms, genomics, machine learning (particularly deep learning), extreme scalability, hardware and software architecture, software engineering for HPC, performance modeling and prediction, the impacts of programming models and languages on productivity and efficiency, information visualization, and quantum chemistry. Recent work has focused on enabling data-intensive research in domains new to HPC, scaling diverse computational science codes and workflows to extreme-scale systems, deep hierarchies of parallelism, advanced filesystems, and architectural innovations in processors and interconnects.
Open Positions
- Data Scientist (to be posted soon)
- Undergraduate internships in Data Science and Big Data
Current Projects
- Bridges: A uniquely capable data-intensive HPC system designed to empower new research communities, bring desktop convenience to HPC, expand campus access, and help researchers facing challenges in Big Data work more intuitively. Bridges will consist of tiered, large-shared-memory resources with nodes having 12TB, 3TB, and 128GB of RAM each; dedicated nodes for database, web, and data transfer; high-performance shared and distributed data storage; Hadoop acceleration; powerful new CPUs and GPUs; and a new, uniquely powerful interconnect. Bridges will feature persistent database and web services to support gateways, collaboration, and new levels of access to data repositories. It will also support a high degree of interactivity; gateways and tools for gateway building; a very flexible user environment including widely used software such as R, Java, Python, and MATLAB; and virtualization for hosting application-specific environments, enhancing reproducibility, and interoperating with clouds. Bridges will be a resource on XSEDE, NSF's Extreme Science and Engineering Discovery Environment, and will connect with other computational resources, data resources, and scientific instruments.
- Nystrom, N. A., Levine, M. J., Roskies, R. Z., and Scott, J. R. 2015. Bridges: A Uniquely Flexible HPC Resource for New Communities and Data Analytics. In Proceedings of the 2015 Annual Conference on Extreme Science and Engineering Discovery Environment (St. Louis, MO, July 26-30, 2015). XSEDE15. ACM, New York, NY, USA. DOI=http://dx.doi.org/10.1145/2792745.2792775.
- Data Exacell (DXC): A research pilot project to create, deploy, and test software and hardware building blocks that enable data analytics in scientific research. The DXC couples analytic resources with innovative storage to allow communities that traditionally have not used HPC to create valuable new applications. Pilot applications in data-intensive research areas such as genomics, radio astronomy, the digital humanities, biological imaging, and workflow management are being used to motivate, test, demonstrate, and improve the DXC building blocks, including, for example, extensions to the SLASH2 filesystem, new tools for efficient data transfers that leverage SDN, and integration of database, web, and analytic technologies to enable sophisticated, distributed software architectures for novel research domains.
- Center for Causal Discovery (CCD): An NIH BD2K Center of Excellence for Big Data Computing, the CCD aims to develop highly efficient causal discovery algorithms that can be practically applied to very large biomedical datasets, and conducts projects addressing three distinct biomedical questions (cancer driver mutations, lung fibrosis, the brain causome) as vehicles for algorithm development and optimization.
- Lu, S., Lu, K. N., Cheng, S.-Y., Hu, B., Ma, X., Nystrom, N., and Lu, X. 2015. Identifying Driver Genomic Alterations in Cancers by Searching Minimum-Weight, Mutually Exclusive Sets. PLoS Comput Biol 11(8): e1004257. doi:10.1371/journal.pcbi.1004257.
- Big Data for Better Health (BD4BH) is applying machine learning to large-scale clinical and biomedical data to improve integration, analysis, and modeling for lung and breast cancer. Inferring informative biological patterns from gene sequence and expression data, image data, and clinical records can improve prediction of clinical outcomes such as tumor metastasis, and thereby lead to improvements in clinical care.
- Deciphering Cellular Signaling Systems by Deep Mining a Comprehensive Genomic Compendium: In this project led by Xinghua Lu (Univ. of Pittsburgh DBMI), we aim to reveal major cellular signals that regulate gene expression under physiological and pathological conditions and to infer the organization of signals in the human cellular signal transduction system (CSTS). Combining the identified signals with genomic alteration data and drug response data, we aim to further identify pathways underlying disease such as cancers, to use the genomic data to predict drug sensitivity of cancer cell lines, and to predict patient clinical outcomes in a pathway-centered manner.
- Pittsburgh Genome Resource Repository (PGRR): The Pittsburgh Genome Resource Repository provides data management and computing infrastructure to support use of national genome data resources for personalized medicine research. PGRR provides a mechanism for University of Pittsburgh investigators to access and use these datasets from a virtualized central location using common tools and platforms.
Some Other Recent Projects
- Enabling Productive, High-Performance Data Analytics: Sherlock, a YarcData Urika™ graph appliance with PSC-specific customizations, is the first system of its kind made available to the NSF research community. Sherlock is a massively multithreaded supercomputer built on Cray Threadstorm 4.0 processors, which implement multiple powerful features to support lightweight multithreading, latency hiding, and advanced memory interfaces; AMD HyperTransport-attached SeaStar2 interconnect chips provide a flat, globally addressable memory. Optimized RDF and SPARQL implementations provide ease of use for large-scale graph analytics, and the system is also programmable using C/C++ and threads.
- Exploring the Potential of “Native Client” for Computational Science: This project tests the usability and affordability of a new form of cloud service for providing more cost-effective computational resources for certain types of research computations. A usability component of the test will determine the ease with which applications can be ported to the NaCl programming model.