Sherlock: Unlocking the Secrets of Big Data
Protein-protein interactions in yeast, forming a relatively small graph of only 7,182 edges, illustrate the complexity of problems in graph analytics. (Source: Vladimir Batagelj and Andrej Mrvar, Pajek datasets, 2006.)
Computational analysis that discovers underlying patterns in “big data” can open many doors to understanding: how genes work, how social networks evolve, and where computer-security breaches originate. This kind of analysis rests on a mathematical approach called “graph theory,” in which interconnected webs of information are represented as graphs, with nodes standing for data elements and edges for the relationships among them.
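As a rough illustration of that representation (not drawn from the article's data), the sketch below builds a tiny graph from hypothetical interaction records, using an adjacency list in which each node maps to the set of nodes it connects to. The protein labels are placeholders chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical interaction records: each pair is a relationship (edge)
# between two data elements (nodes). Labels are illustrative only.
interactions = [
    ("proteinA", "proteinB"),
    ("proteinB", "proteinC"),
    ("proteinA", "proteinC"),
]

# Adjacency-list representation: node -> set of neighboring nodes.
graph = defaultdict(set)
for a, b in interactions:
    graph[a].add(b)
    graph[b].add(a)

print(len(graph), "nodes,", sum(len(v) for v in graph.values()) // 2, "edges")
```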
Such graphs produced from real-world data can be huge, containing billions or trillions of edges. Even more challenging, these graphs typically can’t be partitioned: their high connectivity prevents dividing them into subgraphs that can practically be mapped onto distributed-memory computers. “Graph analytics are notoriously difficult,” says Nick Nystrom, PSC’s director of strategic applications, “because following unpredictable paths from node to node is rate-limited by latencies to remote and local memory, which has drastically limited the graph problems that can be tackled.”
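To see why traversal is latency-bound, consider a breadth-first search over the adjacency-list graph sketched above. This is a generic illustration, not PSC's workload: each hop follows an edge to a neighbor whose location in memory depends on the data itself, so the accesses are effectively random and cannot be predicted or prefetched the way a regular array scan can.

```python
from collections import deque

def bfs_reachable(graph, source):
    """Breadth-first traversal over an adjacency-list graph.

    Each hop is a data-dependent memory access: which neighbor is touched
    next is unknown until the current node has been read, so performance is
    dominated by memory latency rather than bandwidth.
    """
    visited = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbor in graph.get(node, ()):  # unpredictable, pointer-chasing access
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return visited

# Example: count the nodes reachable from one node in the toy graph above.
# print(len(bfs_reachable(graph, "proteinA")))
```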
To break the barrier blocking large-scale graph analytics, PSC this year introduced Sherlock, a unique supercomputer specialized for complex analytics on big data, which the national research community will use for pilot projects.