Blacklight & The Data Supercell

The world’s largest shared-memory supercomputer, PSC’s Blacklight, has helped to open XSEDE resources to many non-traditional HPC projects

Times are changing for high-performance computing (HPC) research, as fields of study that haven’t traditionally used HPC have begun taking advantage of these powerful tools. This is especially true for PSC’s Blacklight, an SGI® Altix® UV1000 system acquired in July 2010, with help from a $2.8 million award from the National Science Foundation. As the largest shared-memory system in the world, Blacklight has opened new capability for U.S. scientists and engineers.

“Blacklight has opened new doors to high-performance computation,” said PSC scientific directors Michael Levine and Ralph Roskies, “and rapidly become a force across a wide and interesting spectrum of fields.”

Opening HPC to such fields was part of the plan for the NSF’s XSEDE (Extreme Science and Engineering Discovery Environment) program, which launched in July 2011. This year the program took large steps toward that objective: a number of non-traditional projects, sharing a need to process and analyze large amounts of data, used XSEDE resources, especially Blacklight, to arrive at new insights.

Jim Kasdorf, who joined PSC scientific directors Michael Levine and Ralph Roskies in writing the proposal that established PSC, is PSC’s director of special projects, involved in planning and coordination of many PSC initiatives.

Among these projects, described in this booklet, is work that analyzes huge quantities of finance-trading data to arrive at important new findings about the non-beneficial effects of computer trading of stocks. Several projects in the assembly and analysis of “next-generation” sequence data have also found that Blacklight’s shared memory is uniquely well suited to advancing work in this field.

Shared memory offers a large advantage for many data-intensive applications because all of the system’s memory can be directly accessed from all of its processors, as opposed to distributed memory (in which each processor’s memory is directly accessed only by that processor). Because all processors share a single view of data, a shared-memory system is, relatively speaking, easy to program and use.
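The contrast between the two memory models can be illustrated with a small Python sketch. This is a toy simulation under stated assumptions, not Blacklight's actual programming environment: the dataset, the four-way partitioning, and the names `shared_sum`, `remote_read` and `distributed_sum` are all hypothetical. The "distributed" half only simulates node ownership inside one process to count how many explicit remote reads an irregular access pattern would require.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy dataset. In a shared-memory system this is one structure
# that every processor can address directly.
data = list(range(1_000))

def shared_sum(indices):
    # Shared memory: any worker reads any element directly,
    # with no notion of ownership and no messages.
    return sum(data[i] for i in indices)

# Distributed memory (simulated): each "node" directly owns one slice.
NUM_NODES = 4
CHUNK = len(data) // NUM_NODES
partitions = [data[n * CHUNK:(n + 1) * CHUNK] for n in range(NUM_NODES)]

def remote_read(i):
    # Hypothetical stand-in for a network message to the owning node.
    owner, offset = divmod(i, CHUNK)
    return partitions[owner][offset]

def distributed_sum(node, indices):
    # The program must track which node owns each element and
    # communicate explicitly for everything it does not own.
    total, messages = 0, 0
    for i in indices:
        owner, offset = divmod(i, CHUNK)
        if owner == node:
            total += partitions[node][offset]  # local memory access
        else:
            total += remote_read(i)            # explicit communication
            messages += 1
    return total, messages

# An irregular access pattern that touches the whole dataset.
wanted = [3, 250, 512, 999, 7, 760]

# Threads in one process share the same memory, so the worker
# thread sees `data` without any copying or partitioning.
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    shared_result = pool.submit(shared_sum, wanted).result()

dist_result, msg_count = distributed_sum(0, wanted)
print(shared_result, dist_result, msg_count)
```

Both versions compute the same sum, but the distributed version needs extra bookkeeping (ownership, offsets, remote reads) that the shared-memory version avoids entirely. That bookkeeping is the programming burden the paragraph above refers to.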

The Data Supercell

This year PSC deployed a disk-based file repository and data-management system, the Data Supercell (DSC). This innovative technology provides significant advantages over tape-based archiving. The PSC development team (Paul Nowoczynski, Jared Yanovich, Zhihui Zhang, Jason Sommerfield, J. Ray Scott, and Michael Levine) exploited the increasing cost-effectiveness of commodity disk technologies and adapted sophisticated PSC file-system software, called SLASH2, for use with DSC. A patent application is under review.

“The Data Supercell is a unique technology, building on the cost-effectiveness of disk and the capabilities of PSC’s SLASH2 file system,” said Michael Levine and Ralph Roskies, PSC scientific directors. “It enables more efficient, flexible analyses of very large-scale datasets.”

The DSC is intended especially to serve users of large scientific datasets, such as many XSEDE researchers. Its initial capacity of four petabytes can be expanded as needed. Compared with tape-based archiving, DSC offers much faster data access and transfer (latency 10,000 times lower and bandwidth many times higher than tape) while also providing high reliability and security.

Departments at the University of Pittsburgh, Carnegie Mellon and Drexel are now using DSC. Researchers with large genomic datasets produced through Galaxy, a web-based bioinformatics platform at Penn State, are currently using 470 terabytes of DSC storage.



Creating National Cyberinfrastructure

As a leading partner in XSEDE, the most powerful collection of integrated digital resources and services in the world, PSC helps to shape the vision and progress of U.S. science and engineering

Through XSEDE, the Extreme Science and Engineering Discovery Environment, the NSF cyberinfrastructure program that launched in July 2011, PSC extends its active role in the development of national cyberinfrastructure. PSC scientific co-director Ralph Roskies is a co-principal investigator of XSEDE and co-leads its Extended Collaborative Support Services (ECSS). “ECSS staff work both with user groups in fields familiar with high-performance computing,” says Roskies, “and with the XSEDE outreach team to reach user groups, communities and digital services that are new to HPC.”

Other PSC staff lead many areas of the comprehensive XSEDE program. Janet Brown, who manages PSC’s network research, leads the XSEDE Systems and Software Engineering team that oversees the software environment that integrates resources among many providers. As manager of XSEDE Outreach Services, PSC manager of education, outreach and training Laura McGinnis leads programs that help to prepare the next generation of computational scientists.

PSC’s security officer, Jim Marsteller, is the Incident Response Lead for XSEDE. Wendy Huntoon, PSC director of networking, is XSEDE networking liaison for the software development and iteration office. Ken Hackworth, PSC’s user relations coordinator, leads the XSEDE allocations process by which research proposals are reviewed and evaluated to receive grants of computational time on XSEDE resources. PSC scientist Sergiu Sanielevici, director of scientific applications and user support for PSC, leads the Novel and Innovative Projects area of XSEDE’s ECSS effort, which focuses on development of projects in fields or from institutions and communities that can exploit advanced computing but haven’t traditionally used it.

PHOTO: TG

PSC’s directors (l to r), who oversee day-to-day PSC operations and help to coordinate PSC’s role in XSEDE: Nick Nystrom, director, strategic applications; Sergiu Sanielevici, director, scientific applications & user support; Bob Stock, PSC associate director; David Kapcin, director of financial affairs; Wendy Huntoon, director of networking; David Moses, executive director; J. Ray Scott, director, systems & operations. (Not pictured: Cheryl Begandy, director of education, outreach and training.)

© Pittsburgh Supercomputing Center, Carnegie Mellon University, University of Pittsburgh
300 S. Craig Street, Pittsburgh, PA 15213 Phone: 412.268.4960 Fax: 412.268.5832

This page last updated: November 09, 2012