DXC

Data Exacell (DXC): Data Infrastructure Building Blocks

[Graphic: DXC building blocks. Labels: Hardware Test Bed; Production; Development; Foundation Software; SLASH2; Support for Big Data Analytics; WOKFS; SCAMPI; GRACE; OpenStack; Two-factor Authentication; Production Resource: Greenfield / Crucible; Development Resource: DXC / Hopper; Data Analytics Projects; Documentation; Inventory, Configuration, Monitoring, Reporting]

Building Block Descriptions


 

Deliverables

The following Data Infrastructure Building Blocks are deliverables of the DXC project.

 

Building Block | Description | Status
SLASH2 | Distributed file system | In production
Weldable Overlay Knack File System (WOKFS) | Toolkit to assist in the development of efficient layered FUSE file systems | In development; being used for other projects
SCAMPI file system | Multi-site file system with enhanced access controls | Phase 1 in production; Phase 2 is the next step
OpenStack | Open source software for creating private and public clouds. This community-developed software controls large pools of compute, storage, and networking resources, and includes Cinder and Ceph. | In production
Documentation | System administration documentation describing the administrative operations involved in a SLASH2 deployment | Initial draft complete and in user review
Documentation | Networking documentation outlining how to optimize SLASH2 operation between PSC and collaborating sites. This documentation is broadly applicable beyond DXC and SLASH2. | Initial draft complete and in user review
VM support for distributed applications | Efficient, flexible software infrastructure and procedures to create VM instances for databases (MySQL, PostgreSQL, eXistdb, CouchDB, MongoDB, Neo4j, etc.) and web servers (Apache, Tomcat, PHP, etc.), including management of security and allocation of network ports | In development, transitioning to production
GRACE | Service platform that gives users with disabled accounts a way to transfer and delete files | In development
Two-factor Authentication | Ensures that authorized users can reach restricted data sets and that access to the data is secure | 
Hopper | High-performance, shared, parallel SLASH2 file system instance spread across multiple locations nationwide to demonstrate the WAN capabilities of SLASH2. The farthest of these locations, the University of Wyoming, over 1450 miles away, is also a DXC REU partner. Students participating in this REU are installing and configuring the SLASH2 file system at the University of Wyoming and testing the usability of a file system spread across the country. | In development
Greenfield | Supports users running memory-limited scientific applications in fields as different as biology, chemistry, cosmology, machine learning, and economics | In production
Crucible | High-performance, shared, parallel SLASH2 file system instance that extends PSC-developed technologies | In production

Domain-specific building blocks (examples)
GBT Mapping Pipeline | Radio astronomy | In development
Web-based client for causal discovery | Causal discovery, initially for biomedical applications | In development
Distributed scheduling of large-memory and otherwise resource-intensive applications from Galaxy and GenePattern | Distributed, heterogeneous execution of genomic workflows | In development
Deep software stacks to support nontraditional HPC areas of research | Examples: natural language processing; image processing; deep learning; causal analysis; real-time, lossless ingest of social network data | In development
The Pittsburgh Genome Resource Repository | Genomics; pipelines for analyzing cancer genome data | In development
GenePattern hosted server | Bioinformatics; web-based interface for bioinformatics methods | In production on Bridges
Galaxy | Bioinformatics; flexible platform for creating and running reproducible scientific workflows | In production on Bridges
Chameleon | Developed and tested SLASH2 and OpenHPC deployment and scaling strategies using NSF Chameleon Cloud | In production

DXC Tutorials