Pittsburgh Supercomputing Center Lands $7.6-Million NSF Grant

Four-Year Project to Prototype the Data Exacell, a Next-Generation System for Integrated Data Storage and Analytics

Monday, Oct. 21, 2013

The National Science Foundation (NSF) has approved a grant to Pittsburgh Supercomputing Center (PSC) to develop a prototype Data Exacell (DXC), a next-generation system for storing, handling and analyzing vast amounts of data. The $7.6-million, four-year grant will allow PSC to architect, build, test and refine DXC in collaboration with selected scientific research projects that face unique challenges in working with and analyzing “Big Data.”

Read more: PSC Lands $7.6-Million...

Mind the Gap

MARC Program Helps Minority-Serving Institutions Prepare Students for 21st-Century Biology Careers

August 13, 2013

American biology education risks becoming a two-class system. The top-tier institutions understand that bioinformatics—using advanced computing techniques on biological problems—will soon be a job requirement in much of biology, and have expended considerable resources to create bioinformatics classes, degree programs and research centers. Students at institutions without such resources or expertise, on the other hand, are in danger of being left behind.

Read more: MARC Program Helps...

CFL Software, PSC Collaborate on Next Generation of Information Searching

July 18, 2013

New software being developed by CFL Software may transform our ability to search for information in text documents as profoundly as search engines improved upon paper library card catalogs. The software, CFL Discover, will search electronic text documents far more completely and accurately than is possible with today’s search technologies.

Pittsburgh Supercomputing Center (PSC) is collaborating with CFL as a strategic partner in developing CFL Discover, making the software available to researchers on Sherlock, a modified version of YarcData’s Urika™, a real-time data discovery appliance at the center.

“This is a new venture both in terms of scale and speed in searching for information,” says David Woolls, CEO of CFL Software, which specializes in linguistic document forensics. “In essence, we take over where search engines stop.”

While many users may not be aware of it, search engines don’t completely search all the text in the entire Web — that would take far too long. Instead, they search indexes, keywords, categories and other “metadata” that have been added to those documents. In the case of keywords and categories, that addition has to be made by humans, and so is time-intensive and incomplete. Today’s engines obviously revolutionized our ability to find information, but they are inexact. Many irrelevant sites pop up, and many sites that may be more suitable aren’t captured. In a sense, we all stop when we reach a site that is “good enough” rather than one that’s best for our needs.
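The index-based approach described above can be illustrated with a minimal inverted-index sketch (a generic illustration of the technique, not any particular engine's implementation): queries consult a precomputed word-to-document mapping rather than scanning full text at query time.

```python
# Minimal inverted-index sketch: a query looks up a precomputed
# word -> documents mapping instead of scanning every document.
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

docs = {
    "d1": "graph search finds relevant connections",
    "d2": "keyword search uses an index",
    "d3": "search engines index metadata",
}
index = build_index(docs)
print(search(index, "search index"))  # documents containing both words
```

The speed of this lookup is exactly why engines rely on indexes; the trade-off, as the article notes, is that anything not captured in the index is invisible to the query.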

“Search engines start with a few words and return a list of documents which contain them,” Woolls adds. “CFL Discover starts with one or more of those documents and reads them for you, shows you the terminology that is shared and gives immediate access to the passages of particular interest to you.”

The program uses the industry-standard SPARQL query language and RDF (Resource Description Framework) to search entire texts for meaningful connections between the words in a search query and related language in other texts. This kind of “graph search” enables someone searching for information to find relevant connections that they may not have thought of. The program is written in Java, so it is platform-independent and can run on anything from a standard PC to a Java-capable supercomputer. (While most supercomputers can’t run Java, two at PSC — Sherlock and Blacklight — do, providing valuable support for research communities that primarily use Java for data analytics.) The choice of platform and computer depends solely on the volume of data and the speed of response required.
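To make the graph-search idea concrete, here is a hypothetical sketch of the RDF data model in Python: facts are subject–predicate–object triples, and a query is a pattern with wildcards — the core operation that SPARQL performs at scale. The triples and names below are invented for illustration; this is not CFL Discover's or YarcData's code.

```python
# Hypothetical RDF-style triple store: each fact is a
# (subject, predicate, object) triple; None in a query pattern
# acts as a wildcard, as in a simple SPARQL triple pattern.
TRIPLES = [
    ("Sherlock", "is_a", "data discovery appliance"),
    ("Sherlock", "located_at", "PSC"),
    ("CFL Discover", "runs_on", "Sherlock"),
    ("CFL Discover", "written_in", "Java"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples consistent with a (s, p, o) pattern."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Everything the store knows about CFL Discover:
for triple in match(("CFL Discover", None, None)):
    print(triple)
```

Chaining such patterns — matching the object of one triple against the subject of another — is what lets a graph query surface connections a keyword lookup would miss.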

“It’s less like searching for a needle in a haystack than searching for a needle in a needlestack,” says Arvind Parthasarathi, president of YarcData. The advantage of CFL Discover is that it allows related groups of documents to be identified rapidly, not on the basis of pre-determined keywords and categories, but purely on the similarity of their content. This in turn allows the rapid creation of new combined databases from a collection of existing databases. For example, when searching Wikipedia, entering the title of an article causes CFL Discover to read the database and return a comprehensive list of potentially interesting articles related to its whole content. And because the framework is RDF, searches of other RDF collections can be readily performed. The principles on which the program works allow it to be used in many different languages, including Arabic, Chinese, Thai and Finnish, which appear very disparate to the human eye.
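Similarity-of-content matching of the general kind described can be sketched, under simple bag-of-words assumptions, as cosine similarity between word-count vectors. This is a textbook illustration of content-based ranking, not CFL Discover's actual method, and the sample documents are invented.

```python
# Bag-of-words cosine similarity: documents sharing more vocabulary,
# weighted by word frequency, score closer to 1.0.
from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

query_doc = "malaria vaccine delivery in low income countries"
candidates = {
    "related": "vaccine transport and delivery for malaria programs",
    "unrelated": "stock exchange odd lot trading rules",
}
# Rank candidate documents by similarity to the query document.
ranked = sorted(candidates,
                key=lambda k: cosine_similarity(query_doc, candidates[k]),
                reverse=True)
print(ranked)  # most similar candidate first
```

Starting from a whole document rather than a few keywords, as Woolls describes, amounts to using a very long, frequency-weighted query vector like the one above.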

“The structures and sequences inherent to individual documents are all that are needed to encode them,” Parthasarathi says. “New material is easily added to existing stores and is immediately available for use by the search queries.”

CFL Software has carried out proof-of-concept studies of CFL Discover on U.S. Patent Office records and the description sections of legal documents, as well as on Wikipedia. The collaboration with PSC will employ the program on PSC’s Sherlock, which is optimized to search extremely large and complex bodies of information with open-ended queries. The new work will explore a substantial portion of the U.S. patent database, as well as the full content of Wikipedia, in greater depth.

“PSC’s role in the partnership is to couple the unique analytic capability of Sherlock running CFL Discover with hosting massive datasets on PSC’s Data Supercell to expand text analytics to unprecedented, interdisciplinary use cases,” says Nick Nystrom, PSC’s director of strategic applications. “Response time is critical for exploring big data, and Sherlock with CFL Discover will provide rapid analyses of unstructured text data larger than can be done on any platform currently available to U.S. researchers.”

“We see high value for a wide range of research and societal applications,” Nystrom adds. Examples include analyzing recent events from news and social media sources, extracting deeper insights from sets of publications, and enabling computational history and culturomics — the quantitative study of cultural phenomena by analyzing large volumes of written records. “Application of high-performance analytics is new to these and similar fields, and will catalyze new ways of leveraging unstructured text data.”

Blacklight Research Spurs Change in Stock Exchange Rules

July 15, 2013

Findings on the effects of “odd lot” trades on the financial markets, using computations on PSC’s Blacklight, have led the New York Stock Exchange, the Nasdaq Stock Market and the Financial Industry Regulatory Authority Inc. to redefine how the industry tracks small stock trades. The new rules will be enacted in October.

Previously, odd lots — trades of fewer than 100 shares — did not have to be reported to regulators. The rationale was that these trades involved small investors who were unlikely to affect the larger market significantly. But recent volatility in the markets, driven by automated small trades that occur far faster than any human can think, called that assumption into question.

In an upcoming paper in The Journal of Finance, Mao Ye and Chen Yao of the University of Illinois at Urbana-Champaign and Maureen O’Hara of Cornell University report that odd lots are playing an increasingly important role in the wider behavior of the markets. The researchers used Blacklight and the San Diego Supercomputer Center’s Gordon to analyze market data for the effects of odd-lot trading.

“For every 100 trades of Google, 52 to 53 of them” are in the form of odd lots, Ye observes. “There are more missing trades than trades you can see. In terms of volume, more than 20 percent of the trading volume [among all stocks] is missing” in the official count.

The widely held suspicion is that the largest and most sophisticated traders are using automated trading in odd lots to hide their activities from other traders. In any case, the researchers showed that including the odd lots significantly alters our understanding of the markets. Partly in response to this research, in June 2013 the market authorities agreed to a plan to require all trades, of as few as one share, to be reported.

“In the U.S., they care a lot about the transparency of the market,” Ye explains. The new rule change will remove “a kind of darkness we cannot see and that we never realized was there.”

PSC covered the group’s work in detail in a recent article.

2014 Pennsylvania State Budget Includes $500,000 for Pittsburgh Supercomputing Center

July 2, 2013

The Commonwealth of Pennsylvania budget signed by Gov. Tom Corbett on June 30 includes a $500,000 line item for PSC.

“This is very good news for PSC and for the Commonwealth,” says Ralph Roskies, scientific director for PSC, adding that the state’s return on its past investments in PSC has been excellent. “Since our inception we’ve brought over $500 million in outside funds into Pennsylvania, representing a 14:1 return on state funding for PSC.”

“We’re grateful to the members of the General Assembly, and especially the Allegheny County delegation,” Roskies adds. “The bipartisan support of Senators Randy Vulakovich and Jay Costa and Representatives Mark Mustio and Joe Markosek made this possible.”

The funding, says PSC’s leadership, will benefit the state’s technological and workforce infrastructures as well.

“PSC is responsible for generating 1,600 jobs and over $200 million in annual economic activity,” says Cheryl Begandy, PSC’s director of education, outreach, and training. “In addition, our place on the leading edge of computing technologies at the largest scale enables us to respond quickly to technological developments, giving the state, its researchers and its small and mid-sized companies a leg up in capitalizing on these advances.”

The state line item will also prove valuable to PSC’s ongoing competition for federal research funding. Local funding is often seen by granting agencies as concrete evidence of grassroots support for a research center.

“In our fight for federal awards, we’re competing with some of the best high performance computing centers in the world, many of which enjoy significant state funding,” Roskies says. “The state line item will help us retain a competitive edge over and above the excellence of our proposals themselves.”

The details of the line item have yet to be worked out with the state, Roskies says. Potential projects include

  • supporting the Commonwealth’s STEM Education initiative through PSC programs in Computational Reasoning and Bioinformatics
  • collaborating with the Pennsylvania State System of Higher Education to support research and education at its 14 state universities
  • supporting small and mid-sized manufacturers in Pennsylvania through the introduction of Digital Modeling tools, resources and training
  • encouraging workforce development through internships for undergraduate or graduate students
  • continuing PSC core management and outreach efforts expected by federal and other granting agencies

Pittsburgh Supercomputing Center, Numascale AS to Collaborate on Improved Memory Systems for Research

June 28, 2013

Pittsburgh Supercomputing Center (PSC), Pennsylvania’s only National Science Foundation high performance computing facility, and Numascale AS, whose products support the construction of low-cost, scalable-server computer systems, have launched a collaborative project investigating the applicability of Numascale systems to the many research projects that require more directly addressable memory than is readily available on single, commodity, multi-socket, large-memory servers.

“Rapid advancement in many scientific fields of data-dependent research will be facilitated by the availability of larger memory systems at near commodity prices,” says Michael J. Levine, scientific director, PSC. “Having large amounts of data in directly-addressable memory avoids very time-consuming disk input/output and allows a much more productive programming paradigm.”

The field of supercomputing is well known for engineering extreme processing speeds, but increasingly, researchers’ calculations are limited not by the speed of processing but by access to and efficient use of vast amounts of data. Application areas that require very large memories include natural language processing, multi-organism genomics and quantum chemistry.

“We see the collaboration with Pittsburgh Supercomputing Center as an important milestone for utilizing NumaConnect™ for a number of applications that have previously been limited by inferior memory capacity in standard servers,” says Einar Rustad, CTO and co-founder of Numascale. “The huge and scalable memory capacity in systems with NumaConnect allows users to operate in the familiar programming and runtime environment they are used to with workstations.”

This, Rustad explains, eliminates the need for explicit message passing and significantly reduces the overall time from idea to solution for a number of important applications in many scientific fields. “PSC's unique expertise will strengthen our focus on applications that are key to advances in major scientific fields and help us to widen the market for Numascale.”

The collaboration between PSC and Numascale seeks to leverage PSC’s unique and extensive experience with very large memory computing systems and Numascale’s NumaConnect memory technology to produce systems capable of handling such large data volumes without memory-retrieval lags. NumaConnect uses commodity servers as building blocks to provide memory capacities and retrieval speeds currently only available through high-end and enterprise-class systems. PSC’s application specialists will work with Numascale engineers and application programmers to find ways the two organizations’ experience and expertise can be combined synergistically.

Opening the Flood Gates

Argonne, PSC Staff Shepherd Internet2 Migration, Give XSEDE Network Bandwidth Needed for Big Data Era

Monday, June 24, 2013

Thanks to personnel at Argonne National Laboratory and PSC — chiefly Linda Winkler, senior network engineer, Argonne; Joseph Lappa, principal network design engineer, PSC; and Kathy Benninger, network performance engineer, PSC — the National Science Foundation’s network of supercomputing sites now has the “pipe capacity” it will need to keep pace with the Big Data era.

XSEDE, the National Science Foundation’s U.S.-wide network of high performance computing centers, which includes Argonne and PSC, has migrated its data network to Internet2, a vastly higher-capacity system than the previous carrier. XSEDE’s improved network will enable sites to achieve connection rates of up to 100 gigabits per second (100 GE) — 10 times faster than previously possible. The architecture of the new system will also enable a number of upgrades that will improve the transfer of data through the system.

As part of the Internet2 migration, Lappa has taken on new responsibilities for the XSEDE network. Newly appointed as XSEDE’s operations networking manager, he will be XSEDE’s main contact with Internet2. In this role, he and his team will monitor the performance of the new network, oversee details of transitioning sites to 100 GE, assist with campus bridging and help Internet2’s programmers and service representatives optimize and tailor the network to XSEDE and its users’ needs.

The approaching bottleneck

In 2006, Senator Ted Stevens made the mistake of referring to the Internet as “a series of tubes.” He instantly became the butt of jokes about a guy who grew up in a time when people communicated via post, in cursive script, trying to make sense of an email world. But to be fair, it isn’t such a bad metaphor.

Information — data — is as critical to our economy and society as fresh drinking water is to our homes. Like the plumbing running through our houses, the Internet transports data through “pipes” that are limited both by their size and by the capacity their “faucets” can deliver.

Users at XSEDE sites employ some of the largest, fastest computers in the world to generate vast volumes of data. Moving those data between researchers, the supercomputers and storage sites is no small mission. To accomplish that job, XSEDE originally built what was then one of the highest-capacity, most reliable networks in the world.

“Advanced networking is critical … to support the researchers and educators who are making innovative use of our … resources,” says John Towns, XSEDE project director, noting that XSEDE supplies about 8,000 users with 17 supercomputers, data storage and management tools and networking resources.

In the Information Age, though, technology ages quickly. As the XSEDE network and its demands grew, it began to approach the limits of its infrastructure: in particular, a potential bottleneck between XSEDE sites in Denver and Chicago loomed large.

“As far as the technical reasons for migrating to Internet2, it was the ‘speeds and feeds’ problem,” Lappa says. A factory, for example, can perform an operation on a product quickly (speed). But if it can’t then move the next product up the line (feed) fast enough, that speed is wasted. Similarly, the blinding speed of XSEDE’s computing machines was in danger of being made far less relevant by the approaching difficulty of getting data into and out of them.

Unclogging the pipes

Internet2’s 100 GE backbone proved to be the solution to the problem, Benninger says. “With 100 GE, there is a clearer path to allow us to operate.”

While not all the sites will initially have 100 GE connections to the new backbone, she adds, the system will have room to grow to meet the next three years’ needs. Currently, Indiana University and Purdue University share a 100 GE connection, with a number of other sites planning to upgrade over the next several years.

In addition to supplying the leadership for the migration process, PSC also served as one of the first sites on the new network, testing out and helping Internet2 improve and customize the system to serve XSEDE’s needs.

Internet2’s architecture offers a big plus in terms of managing data flow with what’s known as “dynamic provisioning capability.” If a particular network path between two sites is congested with large data flows, a network engineer can establish a virtual local area network (VLAN) to route additional data transfers over an alternate path.

Future upgrades

In addition to optimizing the network and helping sites connect with the backbone or upgrade to 100 GE, Benninger and Lappa will support efforts by a number of PSC and XSEDE staff to add new functions that take advantage of the higher bandwidth.

  • The XSEDE-wide File System (XWFS) will allow the increasingly large files required by researchers to be moved rapidly between XSEDE sites.
  • Web10G, developed by Chris Rapier, PSC network programmer; Andrew K. Adams, PSC network engineer; and John Estabrook, network programmer at the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, will monitor data flowing from servers to the network to help pinpoint sources of slowdown as they happen.
  • VLAN (virtual local area network) provisioning will allow any two XSEDE sites to set up a “virtual network” between the two sites that performs as if it were a direct, hard-wire data connection, avoiding the need to set up potentially complex routing through the network.

Lack of Reliable Transportation Undermines Delivery of Lifesaving Vaccines

University of Pittsburgh / Pittsburgh Supercomputing Center Computer Simulations Highlight Need to Increase Focus on Vaccine Transport

Tuesday, May 28, 2013

Transportation of vaccines is a critical component for improving vaccination rates in low-income countries and warrants more attention, according to a computer simulation by the HERMES Logistics Modeling Team at the University of Pittsburgh and Pittsburgh Supercomputing Center (PSC). The team recently reported their findings in the PLOS ONE online journal (http://dx.plos.org/10.1371/journal.pone.0064303).

Each year, millions of dollars of potentially lifesaving vaccines fail to reach populations throughout the world. Most aid programs tend to focus more on purchasing vaccines or donating refrigerators and freezers to help ensure vaccine delivery. The computer simulation of the West African nation of Niger showed that improving transportation as well could improve vaccine availability among children and mothers from roughly 50 percent to more than 90 percent.

Read more: Pitt / PSC Computer...

PSC, Notre Dame to Supply Computer Infrastructure for Global Malaria Eradication Project

Monday, April 29, 2013

Pittsburgh Supercomputing Center (PSC) and the University of Notre Dame have received up to $1.6 million in funding from the Bill & Melinda Gates Foundation to develop a system of computers and software for the Vector Ecology and Control Network (VECNet), an international consortium to eradicate malaria. The new VECNet Cyber-Infrastructure Project (CI) will support VECNet’s effort to unite research, industrial and public policy efforts to attack one of the worst diseases in the developing world in more effective, economical ways.

“VECNet is about bringing order out of chaos,” says Tom Burkot, VECNet’s principal investigator and professor and tropical leader at James Cook University, Australia. “The challenge we have is that we’re trying to control and eliminate malaria in a world in which, for example, there are 40 or 50 dominant mosquito species that are important for its spread.” The CI project, he adds, is intended to decrease the complexity of engaging in the problem so that malaria researchers, national malaria control officials, product developers, and policy makers can all contribute to solutions.

Read more: PSC, Notre Dame to Supply...

PSC Media Contacts

Media / Press Contact(s):

Kenneth Chiacchia
Pittsburgh Supercomputing Center

Vivian Benton
Pittsburgh Supercomputing Center

Website Contact

Shandra Williams
Pittsburgh Supercomputing Center

Use of PSC materials: To request permission to use PSC materials, please complete this form.
