The Software Challenge of PetaScale, an Interview with PSC's Assistant Director, Sergiu Sanielevici
This interview originally appeared in the January 30, 2004 issue of HPCwire.
HPCwire editorial content is considered to be copyrighted material and may not be reprinted or published in any form without the prior written consent of the publisher. For a free trial subscription to HPCwire: email@example.com
by Alan Beck, Editor-in-Chief HPCwire
Following is Alan Beck’s interview of Sergiu Sanielevici, assistant director at Pittsburgh Supercomputing Center, regarding his views on the future of supercomputing.
HPCwire: The path ahead as HPC looks toward the 2010 horizon calls for Petaflop systems, which most likely will entail massive parallelism on the order of thousands of processors. How is this likely to affect software?
SS: You’re right. All the petascale system architectures currently on the drawing board are massively parallel. These systems will bring to bear many thousands of processors on each computation in order to exploit the scale of the system. There’s more than scale alone, however, that’s involved.
The processors will be fed by complex, multi-stage memory hierarchies, and for these systems to be productive — to deliver sustained application performance in an acceptable relation to peak — they will require programs that can efficiently manipulate this kind of complex memory hierarchy.
At PSC, as we’ve worked with research teams to gain high levels of productivity with LeMieux, the NSF terascale system, we’ve begun to learn how to think about software at these scales. Among the concerns, for instance, are the stringent performance requirements that petascale I/O will involve. With so many fallible components in the system, application codes need to do their own checkpointing. Saving terabytes of data over a complex network in minutes requires careful attention at the application level, something that traditional programming models do not facilitate. The system solutions devised by architects to meet these I/O requirements will need to be efficiently yet portably exploited by petascale programming models.
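As a rough illustration of what "doing their own checkpointing" means at the application level, here is a minimal single-process Python sketch. It is not PSC's tooling: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the pickle file format, and the toy running-sum computation are all assumptions made for this example, and a real petascale code would write through a parallel I/O layer rather than one local file.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # Rename over the old checkpoint only once the new one is complete.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps, path, checkpoint_every=10, stop_after=None):
    """Toy computation (a running sum) that resumes from the last checkpoint.

    stop_after is a hypothetical knob that simulates a failure after that
    many steps have been executed in this call.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    done_here = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_here >= stop_after:
            return state  # simulated failure: work since last checkpoint is lost
        state["total"] += state["step"]
        state["step"] += 1
        done_here += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state
```

Calling `run` a second time with the same `path` after a simulated failure picks up from the last saved step instead of starting over; the checkpoint interval trades lost work against I/O cost.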
HPCwire: We’re hearing from various places, such as DARPA’s High Productivity Computing Systems (HPCS) program, that the present moment represents a “critical juncture” for U.S. supercomputing. Do you agree?
SS: Yes, I do, and it’s certainly just as true for software as for hardware. The two go hand-in-hand, and coordination between them becomes much more important at the petascale. That is why we’re at a critical juncture, and why we should address it now and, hopefully, get a jump on thinking clearly about the challenges involved. At PSC, our experience with the current crop of systems capable of terascale performance suggests that almost all leading-edge scientific codes will have to be re-engineered to some extent. The sooner we as a community begin to pay attention to this, obviously, the farther ahead we’ll be.
The problem is that most parallel codes follow the message-passing and/or the shared-memory programming model, both of which have drawbacks that become critical at the scale and complexity we’re now reaching. Shared memory doesn’t scale, is expensive even at small scales, and doesn’t encourage data locality. Message passing is too low-level, too burdensome to the programmer, and too hard to optimize when the number of communicating objects grows into the thousands. Efficient, portable parallel I/O and checkpointing (as well as visualization and steering) are seldom incorporated into existing codes.
In the 30 years or so since supercomputing emerged, it has been possible for physicists, chemists, biologists, engineers and others to make progress mainly using the numerical and programming expertise within their own groups (including computer-savvy graduate students and postdocs). They could more or less get by with a minimalist reaction to changes in the architectures they were presented with, from the CDC-7600 via the vector era to the 1990s style of more or less “massive” parallel systems. But now, the complexity of the system is such that you’re much more likely to succeed if you can adapt the advanced methods developed by numerical mathematicians and computer scientists to your specific needs.
HPCwire: What new software approaches specifically do you envision?
SS: I think we need to consider innovations at several levels: languages and compilers, runtime systems, frameworks and tools, for both computation and I/O. There has been solid progress by computer scientists and numerical mathematicians over the past few years, with the explicit goal of facilitating development of application codes that will perform efficiently at any scale. Many of these methods have already been successfully demonstrated on real codes achieving excellent performance on today’s terascale systems.
For example, certain dialects of Fortran, C and Java provide a global memory space abstraction whereby all data has a user-controllable processor affinity, but parallel processes may directly reference each other’s memory. I’m thinking of, respectively, Co-Array Fortran (CAF), Unified Parallel C (UPC), and U.C. Berkeley’s Titanium.
At the runtime level, UIUC’s Charm++, a parallel C++ library, and AMPI, an adaptive MPI implementation, provide processor virtualization. This technique allows the programmer to divide the computation into a large number of entities that are mapped to the available processors by an intelligent runtime system, enabling a separation of concerns that leads to both improved productivity and higher performance. As another example, UCSD’s KeLP programming system enables the programmer to express complicated dependence patterns in geometric terms. On top of such a programming system one can then build domain-specific solver libraries, such as SCALLOP for elliptic partial differential equations.
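The idea behind processor virtualization can be sketched in a few lines of Python. This is not Charm++ or AMPI code, and the greedy largest-first placement below is only a stand-in for whatever strategy a real runtime uses. The names `chunk_costs` and `map_to_processors`, and the made-up cost gradient across 64 cells, are assumptions for this example.

```python
import heapq


def chunk_costs(cell_costs, num_chunks):
    """Group contiguous cells into num_chunks equal-sized chunks; return each chunk's cost."""
    size = len(cell_costs) // num_chunks
    return [sum(cell_costs[k * size:(k + 1) * size]) for k in range(num_chunks)]


def map_to_processors(costs, num_procs):
    """Assign work items to processors, largest first, always to the least-loaded one.

    Returns the resulting load on each processor.
    """
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    loads = [0] * num_procs
    for cost in sorted(costs, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads


# Hypothetical workload: 64 cells whose cost grows steadily across the domain.
cells = [i + 1 for i in range(64)]
num_procs = 4

# Traditional decomposition: exactly one block per processor.
block_loads = chunk_costs(cells, num_procs)

# Virtualized decomposition: 16 chunks, placed on 4 processors by the "runtime".
virtual_loads = map_to_processors(chunk_costs(cells, 16), num_procs)

print("one block per processor:", block_loads)
print("16 chunks on 4 processors:", virtual_loads)
```

The programmer only decides how to cut the problem into many pieces; where each piece runs is left to the mapping step, which is the separation of concerns described above. In this toy case the over-decomposed mapping ends up far more evenly balanced than the one-block-per-processor split.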
At the highest level, there are collections of tools available to the applications programmer. For example, the DOE Advanced CompuTational Software (ACTS) collection offers direct and iterative methods for the solution of linear and non-linear systems of equations; partial differential equation solvers and multi-level methods; structured and unstructured meshes (generation, manipulation and computation); as well as performance monitoring and tuning. PSC has developed tools for I/O and checkpointing on large-scale parallel and distributed systems.
HPCwire: Will this require a radical reprogramming effort in applications?
SS: It’s going to depend. As I’ve said, these new approaches are available at several levels: source, link, and executable components. So the scientific applications teams will need to study and evaluate various options depending on their needs and means, the fit with their own existing codes, the performance, reliability and maintainability of the new external methods and components, etc. Clearly everyone will want to get the maximum benefit with the minimum investment of time and resources.
HPCwire: What kind of applications will be affected by these changes?
SS: It’s easier to think of types of applications that will probably *not* need to be changed. “Pleasingly parallel” and parameter sweep (ensemble simulation) applications should be fine regardless of the complexity of the communications fabric and the number of processors — assuming the single-processor or low-parallelism “elementary” code is kept well tuned as new nodes are deployed. Also, there certainly exist beautifully scalable and efficient terascale codes painstakingly crafted using “traditional” programming paradigms and tools, which may remain satisfactory into the petascale regime. But, in general, any scientist who plans to work at the petascale should critically examine the possibility that a break with the past may be needed.
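A parameter sweep of the kind mentioned here can be pictured with a short Python sketch. The `simulate` function is a hypothetical stand-in for the "elementary" code, and a thread pool stands in for what would be independent jobs or nodes on a real system; both are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(damping):
    """Hypothetical elementary code: ten steps of the iteration x <- damping * x."""
    x = 1.0
    for _ in range(10):
        x *= damping
    return x


def sweep(parameters, max_workers=4):
    """Run one independent simulation per parameter value.

    No run communicates with any other, so adding workers does not add
    any coordination cost.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(parameters, pool.map(simulate, parameters)))


results = sweep([0.5, 1.0, 2.0])
```

Because each run is self-contained, the only code that needs tuning as the system grows is `simulate` itself, which matches the caveat about keeping the elementary code well tuned.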
HPCwire: What role will computer scientists play in these changes?
SS: That’s a good question, because the problems we’re talking about aren’t purely technical. We also need to be thinking about the sociological situation. As a community, we need to do a better job of bridging the gap between academic computer scientists and the computational scientists whose application codes need to be re-engineered to petascale requirements. Computer scientists need to understand, in detail, the concerns and the outlook of the application developers. Sometimes, these are not necessarily the “coolest” issues a computer scientist would want to work on: things like long-term maintenance and user support, documentation written in intelligent layperson’s language, or quickly adding functionality that would not be a priority from a pure computer science viewpoint. Work that helps a biophysicist publish breakthrough papers may not produce any significant publications for her computer scientist partner, and vice versa.
We also need these same computer scientists to collaborate with the commercial vendors who will design and implement the systems that will reach the petascale, and with the supercomputing centers that will deploy and operate them. This gap-bridging approach is being pioneered by several initiatives including the NSF TeraGrid project and DARPA’s HPCS program.
HPCwire: These seem to be daunting problems. How do you propose to attack them?
SS: I think the first step is to engage the computational science community in a process of critically examining how they will operate at the petascale, and in a close and mutually beneficial collaboration with the computer scientists who are interested in petascale systems. At PSC, together with the other NSF PACI centers and the DOE centers at ORNL and NERSC, we made a start in 2002 by organizing the workshop “Scaling to New Heights,” where members of both communities discussed their experiences and ideas.
This year, NSF, DOE and DoD are sponsoring a tightly focused workshop that aims to introduce computational scientists and engineers to the new software approaches we’ve discussed, and to their developers. This will take place at PSC on May 3 and 4, 2004. We certainly suggest that everyone who plans to do computational science on the upcoming generation of platforms consider participation.
Details and the registration form can be found at: http://www.psc.edu/training/PPS_May04/
What I’d like to see is that this workshop and others like it will produce a series of “grassroots” collaborations, in which people can generate some sense of shared purpose in making petascale computing work. I’ve found that the funding agencies are keenly interested in this topic and enthusiastically support our plans for this workshop. If we in the U.S. computational science and HPC community hope to convince the nation to invest the considerable dollars needed to keep us moving forward, we should demonstrate that we are doing all we can to maximize the returns in scientific and technological breakthroughs.
More information: New Methods for Developing Peta-scalable Codes
© Pittsburgh Supercomputing Center.