DATA EXACELL MARKS FIRST YEAR WITH USER-DRIVEN ADVANCES
User input and experience have helped PSC’s Data Exacell (DXC) complete a successful first year of operation. Thanks to the ongoing dialog with users, the DXC team has advanced the system’s software and gained a new understanding of the advanced and novel hardware used to create it.
DXC’s mission is to develop and test new hardware and software technologies as well as system architectures to support data-intensive research. These advances will be used at PSC and made available to the broader community for the benefit of the National Science Foundation’s research-user community. DXC was funded by a major $7.6-million grant from the NSF’s Data Infrastructure Building Blocks program in 2013. It was supplemented by a $1.2-million grant last year to add database capabilities required by the users.
User-motivated improvements have enabled the DXC to grow smoothly from 50 million to 250 million user files. In addition to inaugural users in fields that were relatively new to supercomputing, such as radio astronomy, machine learning and large-scale genomics, the system has served users from fields completely new to supercomputing. Their projects include analyzing Twitter data for early epidemic warning, untangling coincidental correlations from cause-effect relationships in cancer and even examining how 19th-century authors handled character gender roles. This wide set of perspectives has helped PSC staff improve and test the DXC and, in the process, helped to guide the design of PSC’s newest, user-friendly supercomputer, Bridges.
An important accomplishment of the DXC work has been to enhance the performance of SLASH2, the PSC-developed software that handles data for DXC. Improvements included optimizing SLASH2 for the GeneTorrent software toolset for using the Cancer Genomics Hub and creating a Big-Data-optimized alternative to rsync, a popular utility for transferring files. Another important component of DXC is to test and optimize the use of new, faster hardware for data storage, including solid-state disks and future SAS3-standard disk drives. PSC was the first customer to receive and test SAS3-capable hardware from Super Micro, Inc.
Future improvements to DXC may include innovations that allow multi-site users to employ their own authentication methods securely within the DXC system and methods to minimize the need for copying remote data to use it, reducing both data storage and networking demand.
WEB10G CODE SUBMITTED FOR LINUX INCLUSION
Web10G, software developed by PSC and National Center for Supercomputing Applications staff to diagnose network problems, may soon be finding its way to a computer near you. The National Science Foundation-funded project has developed a means for extracting information from the common TCP/IP network protocol, for the first time giving network administrators detailed data necessary to diagnose and fix a host of networking problems.
“The code for Web10G has been submitted to the Linux kernel team for review and inclusion,” says Chris Rapier, PSC network applications engineer. Once included in Linux—the operating system favored by many programmers and computer scientists—the software will be a step closer to inclusion in consumer operating systems such as MacOS and Windows. “This process might take some time, but I’ve been invited to speak at NetDev 0.1—the primary conference for Linux network developers. I’ll be leading an open-ended discussion on the value of instrumentation like Web10G.”
It may be a bit surprising to learn that even network engineers have never been able to look inside a TCP/IP connection to make fine-tuned assessments of what’s going wrong with it. This is an unintended consequence of TCP/IP’s original design. Web10G is a way to recover data on an individual connection so that network administrators and even individual users can tell why a network connection has failed or slowed.
Rapier presented Web10G and its applications for enhancing scientific workflow at several major symposia in 2014, including serving as an invited speaker at the Chinese American Networking Symposium at New York University and Internet2’s I2 Tech Exchange in Indianapolis. At the latter, he also introduced Insight, a new interface that allows users to visualize their network connections, easily identify poorly performing connections and submit Web10G data to administrators to aid in diagnostics.
PSC EDUCATION: BUHL INTERIM REPORT IS OCCASION TO TAKE STOCK
PSC’s interim report to the Henry C. Frick Fund of the Buhl Foundation, submitted halfway through a three-year grant, provided an opportunity to take stock of the center’s ongoing Innovative Approaches to STEM Education (IASE) program.
IASE is built on prior PSC programs in “computational reasoning,” the use of modeling and simulation tools as well as pedagogical models. The program is using computational reasoning to help high school science and math teachers introduce their students to computational tools and to use those tools to teach course content. PSC developed the Computation and Science for Teachers (CAST) program based on tools developed by the Maryland Virtual High School and, with funding from the DSF Charitable Foundation, created a Professional Development program for teachers (available online at https://www.psc.edu/index.php/resources-for-educators/cast).
The three-year 2012 Buhl grant continued CAST as IASE, combining its resources with those developed by PSC’s Better Educators of Science for Tomorrow (BEST) program, which helps high school teachers incorporate computational tools into their biology curricula, and PSC’s CMIST (Computational Modules in Science Teaching) program, which brings innovative science tutorials into secondary school classrooms, focusing on integrative computational biology, physiology and biophysics.
In the first phase of the program, PSC worked with teachers from the Pittsburgh School for the Creative and Performing Arts and the Pittsburgh Science and Technology Academy. A 2013 “Summer Institute” held at PSC brought teachers from these schools together to discuss and address STEM teaching challenges and introduced computational reasoning and the first modeling tool from CAST, interactive Microsoft Excel spreadsheets. Last year’s Summer Institute opened IASE to more teachers from other Pittsburgh-area schools. Topics continued the exploration of computational reasoning by introducing agent-based modeling in NetLogo and other simulation resources available over the Internet. The 2015 Summer Institute will focus on the last of the CAST modeling tools, Vensim, an industry-standard software package for creating computer simulations (http://vensim.com). Another focus will be the dual challenge faced by teachers: a standardized-test-oriented curriculum that makes incorporating new computational tools difficult, along with new written standards requiring such tools.
Future work will concentrate on developing pilot programs for introducing computational tools into curricula, comparing pilot schools with those without such programs.