$7.6-Million NSF Grant to fund the Data Exacell, PSC’s Next-Generation System for Storing, Analyzing Big Data
The term “Big Data” has become a buzzword. Like any buzzword, its definition is fairly malleable, carrying different meanings in research, technology, medicine, business and government.
One common thread, though, is that Big Data represents volumes of data that are so large that they are outgrowing the available infrastructure for handling them. In many cases, research can’t be done because the tools don’t yet exist for managing and analyzing the data in a reasonable amount of time. Ultimately, we need to develop both tools and an overall strategy to make Big Data fulfill its promise in fields as disparate as biomedicine, the humanities, public health, astronomy and more.
PSC is taking the next step in developing both tools and direction for harnessing Big Data. A new National Science Foundation (NSF) grant will fund a PSC project to develop a prototype Data Exacell (DXC), a next-generation system for storing, handling and analyzing vast amounts of data. The $7.6-million, four-year grant will allow PSC to design, build, test and refine DXC in collaboration with selected scientific research projects that face unique challenges in working with and analyzing Big Data.
“We are very pleased with this opportunity to continue working cooperatively to advance the state of the art based on our historical strengths in information technologies,” says Subra Suresh, the president of Carnegie Mellon University. “The Data Exacell holds promise to provide advances in a wide range of important scientific research,” says Mark Nordenberg, chancellor of the University of Pittsburgh.
Big Data is a broad field that encompasses both traditional high-performance computing and other fields of technology and research. But these fields increasingly share a focus more on data collection and analysis (handling and understanding unprecedented amounts of data) than on computation. They also require access methods and performance beyond the capability of traditional large data stores. The DXC project will directly address these required enhancements.
“The focus of this project is Big Data storage, retrieval and analysis,” says Michael Levine, PSC scientific director. “The Data Exacell prototype builds on our successful, innovative activities with a variety of data storage and analysis systems.”
The core of DXC will be SLASH2, PSC’s production software for managing and moving data volumes that otherwise would be unmanageable.
“What’s needed is a distributed, integrated system that allows researchers to collaboratively analyze cross-domain data without the performance roadblocks that are typically associated with Big Data,” says Nick Nystrom, director of strategic applications at PSC. “One result of this effort will be a robust, multifunctional system for Big Data analytics that will be ready for expansion into a large, production system.”
DXC will concentrate primarily on enhancing support for data-intensive research. PSC external collaborators from a variety of fields will work closely with the center’s scientists to ensure the system’s applicability to existing problems and its ability to serve as a model for future systems. The collaborating fields are expected to include genomics, radio astronomy and analysis of multimedia data, among others. (See below.)
“The Data Exacell will have a heavy focus on how the system will be used,” says J. Ray Scott, PSC director of systems and operations. “We’ll start with a targeted set of users who will get results but who are experienced enough to help us work through the challenges of making it production quality.”
In November, PSC received top national honors in four categories of the 2013 HPCwire Readers’ and Editors’ Choice Awards. HPCwire, the premier trade publication for the high-performance computing (HPC) community, announced the winners at the 2013 International Conference for High Performance Computing, Networking, Storage and Analysis (SC13), in Denver, Colorado.
PSC received:
• Readers’ Choice, Best use of HPC in Life Sciences, for work on PSC’s Blacklight supercomputer in overcoming limitations in complex DNA and RNA sequencing tasks, identifying expressed genes in nonhuman primates, petroleum-digesting soil microorganisms and bacterial enzymes that may help convert non-food crops into usable biofuels.
• Readers’ Choice, Best use of HPC in “Edge” HPC Application, for the VecNet Cyberinfrastructure (CI). A collaboration between PSC’s Public Health Group and Notre Dame’s Center for Research Computing is building a computational system that will enable VecNet (a partnership of academic and industrial researchers, local public health officers and foundation and national decision makers) to test ideas for eradicating malaria before applying them in the real world.
• Editors’ Choice, Best Application of “Big Data” in HPC, for PSC’s newest supercomputing resource, Sherlock. Specially designed to solve what are known as graph problems, Sherlock is optimized for questions involving complex networks that can’t be understood in isolated pieces. Topics range from cancer protein and gene interactions to performing smarter information retrieval in complex documents such as Wikipedia.
• Editors’ Choice, Best use of HPC in Financial Services, for research that led to a change in trader reporting requirements to the New York Stock Exchange and NASDAQ. Work on Blacklight enabled investigators to prove that high-volume automated traders were exploiting reporting rules to make “invisible” trades that manipulated the markets.
Two public health projects at PSC have also made HPCwire’s list of “The Top Supercomputing-Led Discoveries of 2013.” The HERMES project is analyzing public-health supply chains in lower-income countries to identify and repair under-appreciated choke points in vaccine supply efforts, for example. The VecNet Cyberinfrastructure project has created a prototype computational system to support a global malaria eradication effort (see below for more).
HPCwire named the two PSC projects among 30 supercomputing projects, chosen from its news archives, that the publication believes are “set to change the world in 2014 and beyond.”
In addition to winning an HPCwire award and being cited as one of the most significant supercomputing discoveries of the year (see above), VecNet CI has also completed its prototype of a computational system for university, industry, government and funding entities to test ideas for eradicating malaria.
“After our first year of development, we have a successful prototype framework of all user tools,” says Nathan Stone, principal investigator of the infrastructure project at PSC. “Now, direct engagement with stakeholders via workshops, tutorials and online demonstrations will be important to refine these tools and get them into the hands of those who can best use them.”
The final quarter of calendar 2013 saw the completion of the prototype CI framework, which consists of four major tools. These allow users to test existing and new malaria eradication methods, investigate malaria risk factors, plan detailed intervention campaigns and assess economic impact.
To unite these user tools, the group developed a common web interface with supporting access to a digital library (for archiving data sources and provenance), compute clusters (for running the disease forecasting models) and a data warehouse (for interactive access and analysis of calculated results). The web site and tools are now in use by almost 150 users worldwide from a variety of disciplines.
“VecNet’s work in 2014 will emphasize improvements to the malaria transmission models and the expansion of calibrated input data to cover new geographic regions of interest to stakeholders,” Stone says.
PSC Takes Lead in XSEDE Summer Research Experience Program
Young career seekers in the high-performance computing (HPC) field often face a familiar problem. You can’t get the job without experience. But another hitch confronts would-be “super computors” (pun intended). If you want to write software for PCs or smartphones, you probably know what that kind of programmer does. But what does an HPC engineer, researcher or educator do? How does a student find out if HPC is for him or her?
The NSF XSEDE Summer Research Experience Program exists to help students solve both problems, says PSC’s Laura McGinnis, the program’s coordinator. It also prepares and sustains a larger, more diverse pool of undergraduate and graduate students to be future HPC professionals. “Because supercomputing is a niche, we’re providing a hands-on opportunity for students to experience HPC and be able to make an informed decision about their career track,” McGinnis says. “We provide real-world experience in computational science, particularly for underrepresented students.”
The Summer Research Experience Program includes training, internships, fellowships, mentoring and recognition activities. Participating students gain real-world research and development experience as well as encouragement and academic support in their pursuit of advanced degrees and professional careers in digital services.
The program distinguishes itself from other internship programs by providing the participants the opportunity to expand their horizons with high-performance computing challenges in all research fields. Working with XSEDE researchers and staff, students gain relevant high-performance computing experience on real-world problems and the opportunity to make meaningful contributions to research, development and systems projects.
“It’s important that the projects be real and not just have the students come in and optimize ‘hello world’ on a thousand processors,” says McGinnis. The program also provides a small stipend and travel support for project orientation and attendance at the XSEDE14 conference in Atlanta, Ga.
PSC plays a central role in the Summer Research Experience Program, both by providing leadership and by hosting many students in the program’s summer study component. Here are a few of these students and their stories.
Rockets are in Marjorie Ingle’s blood. The University of Texas at El Paso (UTEP) second-year master’s student literally learned rocketry on her grandfather’s knee.
“He was a research and development engineer at White Sands Missile Range,” she says. “He used to give me little models that I would put together. I ‘volunteered’ Barbie for the space program I don’t know how many times,” sending the doll roaring skyward on model rockets.
Ingle’s XSEDE project centered on a phenomenon common to rocket engines as well as high-temperature nuclear reactor cooling systems: understanding how fluids such as rocket fuel or liquid helium coolant behave when they pass through small apertures.
Ingle worked with Pittsburgh Supercomputing Center’s Anirban Jana on a computational fluid dynamics (CFD) model of jets of liquid helium flowing through a reactor cooling system. This system is under study partly because liquid helium coolant is more durable than the water used in older reactor designs and so doesn’t need to be replaced as often, reducing the production of radioactive waste and making the reactor more ecologically friendly.
For Anthony Ruggiero, a junior at Pittsburgh’s Duquesne University, physics always seemed to be a gateway to higher things: a way to find a higher potential, as it were.
“With physics, anything is possible,” he says. “I wanted to do something with my life that people have never done before; I figured physics would allow me to do that.”
Ironically, Ruggiero’s XSEDE project literally focused on potential: the potential energy of an electron in what’s known as the single-site Schrödinger equation.
In the equation named for him, Schrödinger, one of quantum mechanics’ founders, created the quantum equivalent of classical conservation of energy: an object’s total energy is its potential energy plus its kinetic energy (the energy of its movement). In the strange quantum world, though, an electron doesn’t have a location per se. Its location is more of a smeared-out cloud of possibility.
Working with PSC’s Yang Wang and Roberto Gomez, Ruggiero set out to speed up calculations based on Schrödinger’s equation using general-purpose graphics processing units (GPUs). Originally developed to help computers create smoother, more stable visual images, GPUs have proved extremely versatile even for calculations not related to images, such as those in Schrödinger’s equation.
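For readers who want to see the parallel between the two ideas, here is a minimal sketch in standard textbook notation. The symbols are not taken from the article: E is the total energy, p the momentum, m the electron mass, V(r) the potential energy and ψ(r) the electron’s wave function. Reading “single-site” as a potential V belonging to one atomic site is an assumption made for illustration.

```latex
% Classical conservation of energy: total = kinetic + potential
E = \frac{p^{2}}{2m} + V(\mathbf{r})

% Time-independent Schrödinger equation for one electron:
% the kinetic term becomes an operator acting on the wave function
\left[ -\frac{\hbar^{2}}{2m} \nabla^{2} + V(\mathbf{r}) \right] \psi(\mathbf{r}) = E \, \psi(\mathbf{r})
```

The squared magnitude |ψ(r)|² gives the probability of finding the electron near the point r, which is the “smeared-out cloud” described above.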
Paula Romero has had some unique educational experiences. When the second-year University of Indianapolis undergraduate was 11, her family fled the political instability of Venezuela, where she was born, for the “old country,” her parents’ native Spain. There she entered a teaching system very unlike that of the U.S. “In Spain they focus on teaching theory,” she explains. “They try to keep what is math on one side and what is physics on the other side... relating both fields is mostly your job. There is a lot of sitting down at a desk and studying for hours.”
Conceptually, she says, such a parallelized education was great preparation for the mindset necessary for parallelizing code: pulling problems apart into chunks that can be attacked in parallel, speeding the calculation on computers with many parallel processors.
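The idea of pulling a problem apart into chunks can be illustrated with a minimal Python sketch. It is not code from Romero’s project: the function names (`chunk_bounds`, `partial_sum`, `parallel_sum_of_squares`) and the toy workload, a sum of squares, are assumptions chosen purely for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_bounds(n, n_chunks):
    """Split the index range [0, n) into at most n_chunks contiguous pieces."""
    step = max(1, -(-n // n_chunks))  # ceiling division, never zero
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def partial_sum(bounds):
    """Work done on one chunk: sum of squares over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, n_chunks=4):
    """Hand each chunk to its own worker process, then combine the results."""
    with ProcessPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(partial_sum, chunk_bounds(n, n_chunks)))
```

Called from a script (under an `if __name__ == "__main__":` guard, which Python’s process pools need on some platforms), `parallel_sum_of_squares` returns the same value as a serial loop. Only the way the work is divided changes, which is the mindset Romero describes.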
Under the guidance of PSC’s Yang Wang and Roberto Gomez, she worked with Shawn Coleman, a PhD student at the University of Arkansas, to optimize an x-ray crystallography diffraction algorithm for use on Intel MICs (Many Integrated Core coprocessors), which speed highly parallel calculations in the Texas Advanced Computing Center’s Stampede supercomputer.
Anton shows how water leaving, re-entering potassium channel structure delays return to active state
The potassium channel helps create electrical signals in nerve and muscle cells. This process goes awry in some irregular heartbeat conditions. To work properly, every nerve or heart cell needs potassium channels that can activate, inactivate and then reset themselves to respond to the next signal.
In the journal Nature, Jared Ostmeyer, Benoît Roux and colleagues at the University of Chicago reported simulations on an Anton supercomputer, developed and provided by D. E. Shaw Research, hosted at PSC and funded by MMBioS, that revealed how the channel pinches off potassium movement.
In the cell, the potassium channel can take as long as 10 to 20 seconds to reset its potassium filter. No computer currently on Earth can carry out such a long molecular dynamics simulation. But the University of Chicago researchers leveraged Anton to push their simulations to 20 microseconds.
Even this relatively brief look was revealing, Roux says. “The system was stable for 20 microseconds in the pinched state,” he says. That’s a long time in molecular dynamics. “That was really shocking; we did not expect it.” But that didn’t mean the structure was static: The water molecules kept coming and going. Clearing the water molecules, and reopening the filter, was a “two steps forward, one step back” process, explaining the system’s slow recovery.
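To put the time scales in perspective, the short calculation below compares the 10-to-20-second reset time quoted above with the 20 microseconds simulated on Anton. It uses only figures from the article; the variable names are illustrative.

```python
# All times in microseconds, taken from the figures quoted in the article.
reset_time_us = (10_000_000, 20_000_000)  # 10 to 20 seconds
simulated_us = 20                         # 20 microseconds on Anton

# How many times longer the real reset takes than the simulated window.
ratios = tuple(t // simulated_us for t in reset_time_us)
print(ratios)  # (500000, 1000000)
```

In other words, the full reset is roughly half a million to a million times longer than even this record-length simulation.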
Anton Simulations Reveal How Dangerous Bacteria Install Critical Proteins
Why It’s Important
In an era of diminishing antibiotic effectiveness, it’s no wonder that bacteria, how they live, and what molecular components they can’t live without are an important focus for biomedical science. One such component, the “beta barrel protein” BamA, inserts other beta barrel proteins into the outer bacterial membrane, including those that import nutrients or export toxins that kill host cells. The process is a promising target for antibacterial drugs.
How Anton Helped
The researchers revealed BamA’s side exit using molecular dynamics (MD) simulations that lasted one- to two-millionths of a second. In the world of computational biochemistry, that’s a very long time: conventional supercomputers take months to perform simulations of the necessary length. On Anton, a special-purpose supercomputer designed to dramatically increase the speed of MD simulations, such a simulation can be completed in a day. The researchers reported their work in the journal Nature.
“Anton was critical for the work,” Gumbart says. “If limited to conventional systems, I probably would have run about 50 to 100 nanoseconds,” a tenth or less of the simulated time scale. If he’d only looked at this scale, he says, he might have thought, “Well, I don’t see anything, and that’s what it is.” Anton allowed him to push farther, to a remarkable result.
