Data Collections

A community dataset space allows Bridges users from different grants to share data in a common space.  Bridges hosts both public and private datasets, providing rapid access for individuals, collaborations, and communities, with appropriate protections.

Community datasets are appropriate when data will be shared among Bridges groups.  Data that should be accessed by only one group should be stored in that group's pylon5 space.

If you have a dataset for use by multiple groups on Bridges, request that it be stored in the community dataset space by completing the Community Dataset Request form. If your data collection has security or compliance requirements, indicate this on the form or contact compliance@psc.edu.

Request a Community Dataset

 

Public datasets

Some data collections are available to anyone with a Bridges account.  They include:

ImageNet

ImageNet is an image dataset organized according to the WordNet hierarchy.  See the ImageNet website for complete information.

Available on Bridges at /pylon5/datasets/community/imagenet

 

Natural Language Toolkit (NLTK) Data

NLTK comes with many corpora, toy grammars, trained models, etc. A complete list of the available data is posted at: http://nltk.org/nltk_data/

Available on Bridges at /pylon5/datasets/community/nltk
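
One way to point NLTK at the shared copy is through the NLTK_DATA environment variable, which NLTK reads when it is first imported. The sketch below only builds that search path; the helper name nltk_search_path is hypothetical, and it assumes the shared directory follows the standard nltk_data layout, which this page does not state.

```python
import os

# Path taken from this page; it exists only on Bridges.
NLTK_COMMUNITY_DIR = "/pylon5/datasets/community/nltk"


def nltk_search_path(existing=None, shared_dir=NLTK_COMMUNITY_DIR):
    """Return an NLTK_DATA-style search path with shared_dir appended once.

    `existing` is the current value of NLTK_DATA (or None if it is unset).
    Hypothetical helper for illustration; not part of NLTK.
    """
    parts = [p for p in (existing or "").split(os.pathsep) if p]
    if shared_dir not in parts:
        parts.append(shared_dir)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Set the variable before importing nltk so the shared copy is searched.
    os.environ["NLTK_DATA"] = nltk_search_path(os.environ.get("NLTK_DATA"))
    print(os.environ["NLTK_DATA"])
```

Setting the variable rather than copying the data keeps a single shared copy in use; any directories already listed in NLTK_DATA are still searched first.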

 

MNIST

MNIST is a dataset of handwritten digits commonly used to train image processing systems.

Available on Bridges at /pylon5/datasets/community/mnist

 

Genomics Data

Several genomics datasets are publicly available. 

BLAST
The BLAST databases can be accessed through the environment variable $BLASTDB after loading the BLAST module.
CAMI
CAMI (Critical Assessment of Metagenome Interpretation) is a community-led initiative designed to help tackle challenges in metagenome assembly and analysis by aiming for an independent, comprehensive and bias-free evaluation of methods. Data from the first CAMI challenge is available at /pylon5/datasets/community/genomics/cami.
Repbase
Repbase is the most commonly used database of repetitive DNA elements. To use the Repbase database, you must first register with Repbase at http://www.girinst.org and send proof of registration to genomics@psc.edu.
UCSC
The University of California, Santa Cruz (UCSC) reference genomes are available at /pylon5/datasets/community/genomics/UCSC.  The collection includes human, mouse, and Drosophila genomes.
Other genomics datasets
Other available datasets are typically used with a particular genomics package.  These include: 
Barrnap /pylon5/datasets/community/genomics/barrnap 
CheckM /pylon5/datasets/community/genomics/checkm
Dammit /pylon5/datasets/community/genomics/dammit
Dammit uniref90 /pylon5/datasets/community/genomics/dammit_uniref90
Homer /pylon5/datasets/community/genomics/homer
Kraken /pylon5/datasets/community/genomics/kraken
Long Ranger /pylon5/datasets/community/genomics/longranger
MetaPhlAn2 /pylon5/datasets/community/genomics/metaphlan2
Phylosift /pylon5/datasets/community/genomics/phylosift
Prokka /pylon5/datasets/community/genomics/prokka
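
The locations listed in this section can be checked from a script before a job is submitted. The following is a minimal sketch, not a PSC-provided tool: the names GENOMICS_DATASETS, dataset_path, blast_db_dir, and missing_datasets are hypothetical, and the directory names are copied from the list above. It assumes it is run on Bridges, where these paths exist.

```python
import os

# Root of the community genomics datasets, as given on this page.
GENOMICS_ROOT = "/pylon5/datasets/community/genomics"

# Dataset name -> directory name under GENOMICS_ROOT (copied from this page).
GENOMICS_DATASETS = {
    "CAMI": "cami",
    "UCSC": "UCSC",
    "Barrnap": "barrnap",
    "CheckM": "checkm",
    "Dammit": "dammit",
    "Dammit uniref90": "dammit_uniref90",
    "Homer": "homer",
    "Kraken": "kraken",
    "Long Ranger": "longranger",
    "MetaPhlAn2": "metaphlan2",
    "Phylosift": "phylosift",
    "Prokka": "prokka",
}


def dataset_path(name, root=GENOMICS_ROOT):
    """Return the full path of a listed genomics dataset."""
    return root + "/" + GENOMICS_DATASETS[name]


def blast_db_dir(environ=None):
    """Return the value of $BLASTDB, or None if it is unset or empty.

    The variable is expected to be set only after the BLAST module is loaded.
    """
    env = os.environ if environ is None else environ
    return env.get("BLASTDB") or None


def missing_datasets(root=GENOMICS_ROOT):
    """Return the sorted names of listed datasets whose directory is absent."""
    return sorted(
        name for name in GENOMICS_DATASETS
        if not os.path.isdir(dataset_path(name, root))
    )


if __name__ == "__main__":
    print("BLASTDB:", blast_db_dir() or "not set (load the BLAST module first)")
    for name in missing_datasets():
        print("not found:", dataset_path(name))
```

Repbase is left out of the table on purpose, since this page gives no path for it and access requires registration.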
