Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

Fastx

The Fastx Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.

Fastx is installed on blacklight.

Other resources:

Website: http://hannonlab.cshl.edu/fastx_toolkit/

Running Fastx

1. Create a batch job script that

     1. Sets up the module command for use in the batch job

     2. Loads the fastx module:

         module load fastx

     3. Runs the Fastx commands you need

2. Submit the batch job with the qsub command.
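The steps above might look like the following sketch. It writes a job script to a file named fastx-job.pbs, which is a hypothetical name. The sketch makes several assumptions, so check PSC's blacklight documentation for the real values:

- It assumes PBS-style #PBS directives and a bash shell.
- The resource requests and the queue name are illustrative.
- The location of the module initialization script is assumed.
- The input file reads.fastq is made up.

fastx_trimmer and fastq_quality_filter are two of the Fastx Toolkit programs. See the Fastx website for their full option lists.

```shell
#!/bin/bash
# Writes a Fastx batch job script to fastx-job.pbs.
# ASSUMPTIONS (not PSC-confirmed): the #PBS resource values, the queue name,
# the module init script path, and the reads*.fastq file names.
cat > fastx-job.pbs <<'EOF'
#!/bin/bash
#PBS -l ncpus=16
#PBS -l walltime=1:00:00
#PBS -q batch
#PBS -j oe

# 1. Set up the module command for this batch job (assumed init path).
source /usr/share/modules/init/bash

# 2. Load the fastx module.
module load fastx

# 3. Run the Fastx commands from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
# Keep bases 1 through 50 of every read.
fastx_trimmer -f 1 -l 50 -i reads.fastq -o reads.trimmed.fastq
# Keep only reads in which at least 80% of bases have quality 20 or higher.
fastq_quality_filter -q 20 -p 80 -i reads.trimmed.fastq -o reads.filtered.fastq
EOF

echo "Wrote fastx-job.pbs; submit it with: qsub fastx-job.pbs"
```

The here-document delimiter is quoted ('EOF'). As a result, $PBS_O_WORKDIR is written into the file literally and is expanded only when the job runs.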

Introduction to Unix

Unix was first developed in 1969 at Bell Labs, then a division of AT&T. Many of today's operating systems are derived from Unix or, like Linux, modeled closely on it.

Files on a Unix system are kept in "directories" rather than "folders", but the file system is organized the same way. Directories are arranged in a tree structure, just as folders are on a personal computer. Every directory except the top one (the root) has exactly one parent directory, and any directory can have multiple subdirectories. You can organize your files by creating directories and storing files in them appropriately.

You can read more about how to create and navigate Unix directory structures in the section "Unix directory structures".

Unix command names are very short. They were chosen to minimize typing and to be mnemonic. They are also case sensitive. For example, the command to copy a file is "cp" (for "copy"), and typing "CP" or "Cp" gives a "command not found" error.

Common Unix commands

This table shows many of the common tasks you might want to do on a computer and the Unix command for each. As a mnemonic device, the third column gives the phrase that each command name is derived from.

| When you want to ... | Type this Unix command | Mnemonic |
|---|---|---|
| Find out what directory you are currently in | pwd | print working directory |
| See what files are in your current directory | ls | list |
| Copy a file (the original file is unchanged) | cp source-file target-file | copy |
| Delete a file. Warning: the file is deleted immediately; you will not be prompted to confirm the deletion. | rm filename | remove |
| Create a new directory | mkdir directory-name | make directory |
| Move a file to a different directory (the file is removed from the original directory) | mv source-file path-to-new-directory/target-file | move |
| Delete a directory (the directory must be empty) | rmdir directory-name | remove directory |
| Move to a different directory | cd path-to-directory | change directory |
| Get help on a command | man command | manual |
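A short practice session with the commands from the table might look like this. It assumes a bash shell. The scratch directory comes from "mktemp -d", which is not covered above, so that the example does not touch your existing files. The names notes.txt and archive are made up.

```shell
#!/bin/bash
# Practice the table's commands in a fresh, empty scratch directory.
cd "$(mktemp -d)"                          # move into a new temporary directory
pwd                                        # print working directory
echo "hello" > notes.txt                   # create a small file to work with
cp notes.txt notes-copy.txt                # copy: notes.txt is left unchanged
mkdir archive                              # make directory
mv notes-copy.txt archive/notes-copy.txt   # move the copy into archive
ls                                         # list: archive and notes.txt
ls archive                                 # list: notes-copy.txt
rm archive/notes-copy.txt                  # remove: no confirmation prompt
rmdir archive                              # remove directory: works because archive is now empty
ls                                         # list: only notes.txt remains
```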

Unix directory structures

The top of the directory tree for the entire computing system is called the root and is represented on Unix systems by the '/' symbol. 

The root of your personal directory tree, where you can store files, is called your "home directory". Unix provides shortcuts for your home directory so that it is easy to move around and get back to it. You can refer to your own home directory as "$HOME" or as a tilde (~). You can refer to any user's home directory as a tilde followed by that user's name. For example, joeuser's home directory is ~joeuser.

To navigate through your directories, you need to know the path to the directory you want to move to. Unix uses a "/" character to separate directory names in a directory path. Suppose that you have two subdirectories in your home directory, project1 and project2. The paths to these directories are $HOME/project1 (or ~/project1, or ~username/project1) and $HOME/project2, respectively.

Suppose further that under project1 are subdirectories input-data and results.  Those directories are $HOME/project1/input-data and $HOME/project1/results.
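The example tree above can be created with a few commands, and the two ways of naming your home directory can be compared. This sketch assumes a bash shell. It uses "mkdir -p", an option not covered in the table above, which creates any missing parent directories and does nothing if a directory already exists.

```shell
#!/bin/bash
# Create project1 (with input-data and results) and project2 under $HOME.
# -p creates missing parent directories and skips ones that already exist.
mkdir -p "$HOME/project1/input-data" "$HOME/project1/results" "$HOME/project2"

# $HOME and ~ both expand to your home directory, so these print the same path.
echo "$HOME/project1/input-data"
echo ~/project1/input-data

# List the subdirectories of project1 (input-data and results).
ls "$HOME/project1"
```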

Navigating through Unix directories

To move through a Unix directory tree, use the "cd" command. For example, if you are in your home directory and want to move to the project1 subdirectory, you would type

cd project1

To get to the input-data directory from $HOME, you would type

cd project1/input-data

This path, project1/input-data, is called a relative path because it is relative to wherever you are at the moment. 

You can also use an absolute path to move around the directory structure. An absolute path gives the entire directory path starting from the root directory (/) of the computing system, so it works no matter where you are. For example, installed software packages on a PSC system are often stored in subdirectories of the /usr/local/packages directory. To see what packages may be available, type

ls /usr/local/packages
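The difference between relative and absolute paths shows up when you start from different places. This bash sketch assumes the project1/input-data directory described earlier, and "mkdir -p" creates it if it is missing. "cd /" moves to the root directory.

```shell
#!/bin/bash
# Make sure the example directory exists (does nothing if it already does).
mkdir -p "$HOME/project1/input-data"

# Relative path: resolved from the current directory, so start at $HOME.
cd "$HOME"
cd project1/input-data
pwd    # your home directory's path followed by /project1/input-data

# Absolute path: $HOME expands to a path that starts at the root (/),
# so this cd works no matter where you are, even from / itself.
cd /
cd "$HOME/project1/input-data"
pwd    # the same directory as before
```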

Getting help

The Unix "man" command can give you information about most other commands and software. For example, to see a description and all the options for the "ls" command, type

man ls

If you don't know the name of the command you need but you know the subject matter, you can use the "-k" option. Typing "man -k topic" lists all the commands with topic in their names or short descriptions. For example, to find information about the Fortran compilers, you could type

man -k fortran

Unix tutorial

There are many Unix tutorials online. One that we like is from Open-Of-Course, a site for free and open content and tutorials. See the Unix tutorial on that site.

Local, Expert-Supported Access to the World’s Most Advanced Computing Infrastructure.

Pittsburgh Supercomputing Center (PSC), a joint effort of Carnegie Mellon University and the University of Pittsburgh since 1986, provides faculty and students in the Pittsburgh area with access to the most advanced, powerful, and robust collection of integrated advanced digital resources and services in the world. PSC’s technical staff has years of experience in applications and systems software design and implementation, quantitative analysis, advanced consulting, and delivering high-quality training. They are available to discuss faculty members' needs and to guide them to the best solution, drawing upon the resources of PSC and of the Extreme Science and Engineering Discovery Environment (XSEDE, www.xsede.org), of which PSC is a leading partner.

State-of-the-Art Computing By PSC

PSC architects and operates a sophisticated facility that includes high-performance computing (HPC) systems, high-speed parallel filesystems, and leading-edge networking.

Computing: PSC’s flagship HPC system, Blacklight, is the world’s largest shared-memory system. Its familiar Linux operating system and versatile programming models make Blacklight as easy to use as a PC with up to 16 TB of RAM. Yet Blacklight also enables massively parallel tasks using up to 4096 cores and 32 TB of memory.

Active Storage: Serving PSC’s high-end computing systems are an integrated high-performance Lustre parallel filesystem and an innovative, ultrafast archiver. Together, they offer high-capacity (multi-petabyte), robust, secure, low-latency, high-bandwidth access to data. Integration of short-term storage with an ultrafast archiver enables unique approaches to analyzing large-scale data that grows over time.

Networking: Network facilities at PSC consist of production and research LAN, MAN, and WAN infrastructures. PSC’s WAN connections are provided by 3ROX, a regional network aggregation point operated and managed by PSC to provide cost-effective, high-capacity, state-of-the-art network connectivity to the university community.

Applications: PSC hosts advanced, scalable applications and software infrastructure to support engineering, science, and analytics. Applications are supported by PSC’s domain experts, who add value through integration, optimization, consulting, and training.

Training Facilities: PSC operates a state-of-the-art facility for hands-on training. It provides 30 dual-boot Linux/Windows workstations with Gigabit Ethernet connectivity and world-wide videoconferencing support. A connected lecture hall provides space for up to 100 participants.

Integrated Advanced Digital Services By XSEDE

Building on its experience and reputation as a national supercomputing center since 1986, PSC is a leading partner in the Extreme Science and Engineering Discovery Environment (XSEDE, www.xsede.org), the most advanced, powerful, and robust collection of integrated advanced digital resources and services in the world. In addition to the resources and services operated by PSC itself, XSEDE offers local faculty and students an integrated portfolio of supercomputers and high-end visualization and data analysis resources and expertise across the country. PSC staff experts lead many of the nationwide XSEDE teams and can guide local faculty in understanding how to benefit from XSEDE resources and services. These include:

  • Peer-reviewed, no-cost allocation of XSEDE resources, including PSC’s.
  • The XSEDE User Portal, a web interface that allows users to monitor and access XSEDE resources, manage jobs on those resources, report issues, and analyze and visualize results.
  • Advanced cybersecurity to ensure that XSEDE resources and services are easily accessible to users but protected against attack.
  • An advanced hardware and software architecture rooted in user requirements and hardened by systems engineering that allows for individualized user experiences, consistent and enduring software interfaces, improved data management, and ways for campus resources to be transparently integrated into the overall XSEDE infrastructure.
  • A powerful and extensible network in which each XSEDE service provider is connected to a Chicago-based hub at 10 gigabits per second and has a second 10 gigabit per second connection to another national research and education network.
  • Extended Collaborative Support by staff experts for application development by individual research groups or wider communities.
  • Training, Education, and Outreach programs that expand participation in XSEDE-based projects, curriculum development, and training opportunities.
  • The Campus Champions program, which enables faculty and campus IT staff to work closely with XSEDE staff to support the XSEDE user community on their campus.
  • The Technology Investigation Service, which enables community members to request or recommend new technologies for inclusion in the XSEDE infrastructure and enables the XSEDE team to evaluate those technologies and incorporate them as appropriate.

For information, contact Dr. Sergiu Sanielevici at 412-268-5240 or by email.

Resources for Users

PSC provides an integrated array of high-performance computing and communications products and related services to our users, including supercomputing-class hardware, software, mass storage facilities, consulting, visualization services, and training. Information on all of these resources is available on the PSC web pages or by contacting PSC User Services.

Allocations
How to apply for, administer, and renew allocations for computing services.
Computing Resources
PSC operates several supercomputing-class machines.
Software
Popular software for many scientific disciplines is installed, including packages for computational chemistry, engineering, biomedical databases and sequence analysis, neural sciences, and materials science.
Storage and Archival Resources
Virtually unlimited data storage is available on the Data Supercell, PSC's disk-based data management solution.

In addition to maintaining data produced at PSC, we can provide storage for externally generated data. Both daily operations and archival applications can be accommodated. Contact PSC for more information.

Consulting Services
Our experienced team of user consultants stands ready to help you with technical problems.
Training
PSC offers a variety of workshops, both at PSC and off-site, on subjects ranging from code optimization and parallel programming to specific scientific topics. The Biomedical Applications group also conducts workshops on biomedical computing topics. In addition to workshops, PSC hosts symposia on high-performance computing topics.
Resources for Educators
PSC's Resources for Educators programs promote the understanding of supercomputing and its application in today's leading-edge scientific research. Training, technical expertise and access to high-performance computing facilities are available to audiences as diverse as K-12 students and teachers, university-level scientists and corporate research communities.
Corporate Affiliates Program
The PSC Corporate Affiliates Program provides its industrial partners with the expertise and resources to enhance and support their corporate technical computing capabilities and to exploit high-performance computing technologies. PSC has over twenty years of experience in operating an integrated high-performance computing facility. It also has experience in developing applications of this technology to solve critical research, engineering, and business problems. PSC can put this expertise, as well as the high-performance computing and communications facilities themselves, to work for you. With the wide array of products and services offered, our programs are customized to meet the particular needs of our partners.

SSH Public-Private Key Pairs

You can authenticate to PSC systems using a public-private key pair to encrypt and decrypt an authentication message. The private key is available only to the user, while the public key is, well, publicly accessible. Data encoded by one key can only be decoded by the other. Knowledge of the public key does not allow one to deduce the private key.

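If a public/private key pair exists and the public key is installed on the remote machine, SSH uses the pair to prove your identity when it connects. In the original SSH protocol, the remote machine encodes a message using the public key and sends the encoded message to the client machine. SSH decodes it with the private key and sends back proof that it recovered the message, and if that proof matches, the user is authenticated. The current protocol, SSH-2, instead has the client sign a session-specific message with the private key, and the remote machine checks the signature with the public key. Either way, the private key is never disclosed, and the user can log in without using a password.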

You must create your own set of public/private keys with your SSH client. One common way to generate keys is the ssh-keygen command. Once the keys are generated, the public key needs to be propagated to the PSC systems you wish to access.
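For example, a key pair can be generated with ssh-keygen as sketched below. The sketch makes these choices, which you should adjust for a real key:

- The RSA type and 4096-bit size are assumptions, so check which key types PSC accepts.
- The file name psc_key in a temporary directory is hypothetical. Without -f, ssh-keygen offers a default location such as ~/.ssh/id_rsa.
- The empty passphrase (-N "") is there only so the example runs unattended. For a real key, leave out -N and choose a passphrase when prompted.

```shell
#!/bin/bash
# Generate a public/private key pair with ssh-keygen.
#   -t rsa -b 4096 : key type and size (assumed choice)
#   -f FILE        : private key goes to FILE, public key to FILE.pub
#   -N ""          : empty passphrase, for this unattended example only
#   -C COMMENT     : label stored at the end of the public key
keydir="$(mktemp -d)"
ssh-keygen -t rsa -b 4096 -f "$keydir/psc_key" -N "" -C "example key"

# psc_key is the private key: keep it on your machine and never share it.
# psc_key.pub is the public key: this is the file to propagate to PSC.
ls "$keydir"
cat "$keydir/psc_key.pub"
```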

PSC has created a web interface so you can manage your key pairs, including propagating them to PSC machines. See how to use this interface to install and use SSH key pairs at PSC.

Or, you can go directly to the PSC SSH key management system.