TeraGrid/Blue Waters Symposium on Data-Intensive Analysis, Analytics, and Informatics
April 14-15, 2011, Pittsburgh, PA
Hosted by the Pittsburgh Supercomputing Center
The TeraGrid/Blue Waters Symposium on Data-Intensive Analysis, Analytics, and Informatics brought more than 60 scientists together to explore methods and systems for handling increasingly large data sets. To see some of their ideas, go to the symposium proceedings.
Discovery in science (including the proliferating "omics": genomics, metabolomics, economics, etc.), machine learning, sociology, the humanities, and even fields such as archaeology that are not commonly thought of as heavily computational increasingly depends on our ability to apply quantitative techniques to vast collections of data, ranging from terabytes to petabytes (and beyond, as our capacity to handle data improves). Data may come from instruments, experiments, observations, surveys, and simulations. Data types encompass numbers, text (structured and unstructured), networks (graphs), images, video, audio, geometry, time series, etc. Fusion of data from disparate sources is often needed to discover correlations, and such fusion may result in additional data products. The rate at which new data arrives varies widely. Data may be semi-static, such as census information, which is updated only every ten years. New data may arrive from time to time, such as seismic measurements from a new earthquake, adding to the body of data already accumulated. Perhaps most challenging, data may be streamed, as in the case of data coming from real-time instruments such as telescopes or network intrusion monitors, or from running simulations. These diverse modalities require sophisticated approaches to analyzing and handling large data sets effectively.
The TeraGrid/Blue Waters Symposium on Data-Intensive Analysis, Analytics, and Informatics will bring together leaders in the development of algorithms, applications, frameworks, and libraries for addressing data-intensive problems at unprecedented scale. Examples of topics that this symposium will explore include new kinds of applications that focus on analytics and informatics, integration of machine learning algorithms with analysis of multimodal data, ways to express workflows that couple tasks in data-intensive analysis, and component models for productively building data-intensive applications. The symposium targets approaches and technologies that are being developed now and that have near-term applicability, helping researchers make more effective use of Blue Waters, NSF "Track 2" systems, and other large systems at NSF and other agencies. The format will integrate presentations and group discussion to maximize interaction, identification of common issues, and synthesis of new ideas.
Nick Nystrom (PSC), Sergiu Sanielevici (PSC), Daniel S. Katz (University of Chicago), Scott Lathrop (University of Chicago and NCSA), Amit Majumdar (SDSC), Dan Stanzione (TACC), and Bob Wilhelmson (NCSA)