Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

Circos

 

Circos is a software package for visualizing data and information.

 

Installed on blacklight.

 

Website: http://circos.ca/

 

Running circos

 

1. Create a batch job which

    1. Sets up the use of the module command in a batch job

    2. Loads the circos module

         module load circos

    3. Includes other commands to run Circos

2. Submit a batch job with the qsub command

 

Sample Batch Job

#!/bin/bash
#PBS -l ncpus=16
#PBS -l walltime=00:10:00
#PBS -j oe
#PBS -q debug
#PBS -N circos
source /usr/share/modules/init/bash
module load circos
set -x
mkdir -p $SCRATCH/circos
cd $SCRATCH/circos
cp -r $CIRCOS_TUTORIALS/* $SCRATCH/circos
cd tutorials/2/2
circos -conf circos.conf
ls

Trinity Usage on Blacklight

The information in this page was taken directly from the Trinity-use-on-Blacklight wiki page formerly hosted on Wikispaces. It was created by Brian Couger of Oklahoma State University, and we thank him for his time, his expertise, and his permission to use it.

Trinity Background

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:

  • Inchworm assembles the RNA-Seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
  • Chrysalis clusters the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common). Chrysalis then partitions the full read set among these disjoint graphs.
  • Butterfly then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.

Blacklight Background

The Blacklight resource is hosted by the Pittsburgh Supercomputing Center (www.psc.edu).

Blacklight is an SGI Altix UV 1000 supercomputer designed for memory-limited scientific applications in fields as diverse as biology, chemistry, cosmology, machine learning and economics. Funded by the National Science Foundation (NSF), Blacklight carries out this mission with partitions offering as much as 16 terabytes of coherent shared memory.

Blacklight's unique architecture allows computational jobs that require a large amount of memory, such as de novo transcriptome/genome assemblies, to be completed. The very large amount of addressable RAM allows for very high read-density assemblies, many of which would be outside the computational scope of many other HPC systems.

A complete description of Blacklight can be found at:
http://www.psc.edu/index.php/resources-for-users/computing-resources/blacklight

Obtaining an account

Blacklight is part of the XSEDE program (https://www.xsede.org/), the successor to the TeraGrid. XSEDE is a National Science Foundation-funded collection of HPC resources, services and expertise that allows users to use national HPC infrastructure remotely. Instructions for obtaining a user account can be found here: https://www.xsede.org/web/guest/allocations. The requirement is that you or a member of your group be a current researcher in the United States of America, or that you have a research partner who is currently working in the United States.

Logging on to Blacklight

There are three options for logging on to Blacklight once you have established an XSEDE user account.

1. GSI-SSHTerm (All Systems): This allows you to access and use all XSEDE resources as well as transfer files to the desired resource
http://sourceforge.net/projects/gsi-sshterm/files/gsi-sshterm/0.91h/gsi-sshterm-0.91h.tar.gz/download?_test=goal

2. Putty/WinSCP (Windows) SSH (Linux): Allows usage/file transfer remotely through two separate programs
Putty: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html allows remote log ons
WinSCP: http://winscp.net/eng/download.php allows file transfer

3. XSEDE website (Web Browser): Allows usage/file transfer remotely through a web browser
https://www.xsede.org/user-portal


The host name to use is: blacklight.psc.teragrid.org
Upon log-in you will be prompted to enter a user name and password. Use the XSEDE username and password given to you when you received your XSEDE account.

[Screenshot: Blacklight log-in prompt]

 

Running Jobs on Blacklight

A highly detailed explanation of executing computational jobs on Blacklight and advanced usage can be found here: http://www.psc.edu/index.php/computing-resources/blacklight
This section gives a brief overview of the basics of using Blacklight and a step-by-step guide to running Trinity on Blacklight.

Blacklight OS Structure

Blacklight uses a customized Linux kernel for its OS and a PBS/Torque-like system for scheduling and managing jobs. Users who have experience with either should be in familiar territory.
Helpful for Linux related questions: http://www.linuxquestions.org/questions/

Blacklight Queue Structure

There are two basic queues on Blacklight: the debug queue and the batch queue.

The debug queue has a limit of 30 minutes of wall time and 16 cores maximum, good for ensuring your command line execution arguments are correct. The debug queue is NOT to be used for production runs.

The batch queue is broken into subqueues based on the number of cores and the wall time requested. You submit jobs to the batch queue and they are automatically slotted into the appropriate subqueue based on the resources requested.

  • Jobs that ask for 256 or fewer cores can ask for a maximum wall-time of 96 hours.
  • Jobs that ask for more than 256 cores, to a maximum of 1440 cores, can ask for a maximum wall-time of 48 hours.


Jobs requesting more than 1440 cores are sent to a separate queue where they receive special handling.
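
For example, a job at the 256-core boundary could request the batch queue's maximum wall-time with resource lines like these (a sketch; adjust ncpus and walltime to your own job):

#PBS -l ncpus=256
#PBS -l walltime=96:00:00
#PBS -q batch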

What if I need more time?

For assemblies that would take longer than the wall time allowed by the queues, or for any other problems, please contact PSC support.


If you start a job, and then realize that you need more time, you can still contact PSC support, and PSC can extend the time of your running job.

Memory Allocation

The amount of memory that is allocated to your job is determined by the number of cores requested. The 16 cores on each blade share 128 Gbytes of RAM. The table below shows the amount of RAM you have access to based on the number of cores that you request. Because there are 16 cores on a blade, and blades cannot be shared among jobs, you must request cores in multiples of 16.

Cores    Memory (Gbytes)
16       128
64       512
256      2048
512      4096
1024     8192
1424     13952
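
The table follows directly from the ncpus/16*128 formula; you can sanity-check a request in the shell (a sketch, with NCPUS as a placeholder value):

NCPUS=64
echo "RAM for $NCPUS cores: $(( NCPUS / 16 * 128 )) GB"   # prints: RAM for 64 cores: 512 GB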

 

Charges

On Blacklight, Service Unit charges (SUs) are based on the number of cores a job uses. One core-hour is one SU. Because jobs do not share blades, and there are 16 cores on a blade, a one hour job that uses one blade will be charged 16 SUs.
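
For example, a four-hour job that requests 32 cores (two blades) is charged 32 x 4 = 128 SUs, even if it leaves some of those cores idle.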

Job Submission

Jobs are executed on Blacklight using the Portable Batch System (PBS/Torque). Users submit jobs to a scheduler, which determines when each job is executed based on a number of factors, including: the resources required for the job, the number of jobs a user currently has in the queue, the job's specified wall-time, and how many jobs are currently running. For quickest turnaround, jobs should request only the amount of resources needed.

You must create a job script and submit it to run a job. Several elements are required. The following template script is an example of running Trinity. Each #COMMENT line explains the line of the script that follows it.

#!/bin/csh
#COMMENT ncpus must be a multiple of 16; the formula for total RAM granted by number of cpus is ncpus/16*128 = X GB
#PBS -l ncpus=32
#COMMENT The duration of time requested for the job, in this case 95 hours
#PBS -l walltime=95:00:00
#COMMENT Combines stdout and stderr in one file
#PBS -j oe
#COMMENT Specifies the queue. Change this to 'debug' to access the debug queue (limit of ncpus=16 and walltime=00:30:00)
#PBS -q batch
#COMMENT Emails you when the job begins, aborts, or ends; substitute your own email address
#PBS -m abe -M <your-email-address>
set echo
#COMMENT Needed to load the module command
source /usr/share/modules/init/csh
#COMMENT Set stacksize to unlimited
limit stacksize unlimited
#COMMENT Move to your $SCRATCH directory; this directory should be where your read files are located
cd $SCRATCH
#COMMENT Load most recent version of Trinity
#COMMENT Run 'module avail trinity' on the Blacklight command line to find the name of the latest Trinity module
#COMMENT (unless you need to continue a run started with a different version -- don't switch versions in the middle of an assembly!)
module load trinity
#COMMENT Load latest versions of supporting modules required by Trinity
module load bowtie
module load samtools
#COMMENT Run the Trinity command
Trinity --seqType fq --JM 100G --left reads.left.fq --right reads.right.fq --SS_lib_type RF --CPU 16 > trinity_output.log

MAKE SURE TO REDIRECT TRINITY OUTPUT TO A LOG FILE AS SHOWN ABOVE (> trinity_output.log) OR YOUR JOB WILL LIKELY GET KILLED!!!

If the output goes through the batch system, the job will be killed if the output exceeds 20 MB (which it usually does with Trinity).
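
While the job runs, you can follow the redirected log from a login session; with the template above, which writes trinity_output.log in $SCRATCH, that would be:

tail -f $SCRATCH/trinity_output.log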

Once you have copied the above template script and made the appropriate changes, you can create a job submission file and submit the job to the queue. You can use any text editor you are familiar with to do this. 

As an example, here we use the vi editor to create the job submission script. If you don't know how to use vi, see here:
http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html

You can create a new file (or open an existing one) with vi by typing this on the Blacklight command line:

vi MyBlacklightSubmissionScript

Copy and paste the above script into the vi file and save it on Blacklight (copy the entire script, go to the open vi file, press i to enter insert mode, right-click to paste, press Esc, then hold Shift and press the z key twice to save and exit).

Now you are ready to submit the job by typing

qsub MyBlacklightSubmissionScript

To ensure that the job was submitted properly use the command

qstat -f <pbsJobnumber>       or
qstat -f -u <your-username>

The output will look something like this:

user@tg-login1:~> qsub trinity_stage1.job
404675.tg-login1.blacklight.psc.teragrid.org
trinity/examples> qstat -f -u your-username
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
404675.tg-login1     user     batch_r  trinity_st     --   --  1024    --  08:30 Q   --
trinity/examples> qstat -f 404675
Job Id: 404675.tg-login1.blacklight.psc.teragrid.org
    Job_Name = trinity_stage1
    Job_Owner = user@tg-login1.blacklight.psc.teragrid.org
    job_state = Q
    queue = batch_r
    server = tg-login1.blacklight.psc.teragrid.org
    Checkpoint = u
    ctime = Mon Nov  3 10:30:35 2014
    Error_Path = tg-login1.blacklight.psc.teragrid.org:/usr/users/0/user/trin
        ity/examples/trinity_stage1.e404675
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Nov  3 10:30:36 2014
    Output_Path = tg-login1.blacklight.psc.teragrid.org:/usr/users/0/user/tri
        nity/examples/trinity_stage1.o404675
    Priority = 0
    qtime = Mon Nov  3 10:30:35 2014
    Rerunable = False
    Resource_List.ncpus = 1024
    Resource_List.pnum_threads_limit = 2500
    Resource_List.walltime = 08:30:00
    Resource_List.walltime_max = 00:00:00
    Resource_List.walltime_min = 00:00:00
    Variable_List = PBS_O_HOME=/usr/users/0/user,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=user,
        PBS_O_PATH=/usr/local/packages/xsede/xdusage/1.1-2:/usr/psc/globus/5.
        2.5/bin:/usr/psc/globus/5.2.5/sbin:/usr/psc/bin:/usr/local/packages/tg
        /bin:/usr/local/lustre_utils/default/bin:/opt/intel/composer_xe_2013_s
        p1.0.080/bin/intel64:/opt/intel/composer_xe_2013_sp1.0.080/mpirt/bin/i
        ntel64:/opt/intel/composer_xe_2013_sp1.0.080/debugger/gdb/intel64_mic/
        py26/bin:/opt/intel/composer_xe_2013_sp1.0.080/debugger/gdb/intel64/py
        26/bin:/opt/intel/composer_xe_2013_sp1.0.080/bin/intel64:/opt/intel/co
        mposer_xe_2013_sp1.0.080/bin/intel64_mic:/opt/intel/composer_xe_2013_s
        p1.0.080/debugger/gui/intel64:/opt/intel/composer_xe_2013_sp1.0.080/bi
        n/intel64:/opt/intel/composer_xe_2013_sp1.0.080/mpirt/bin/intel64:/opt
        /intel/composer_xe_2013_sp1.0.080/debugger/gdb/intel64_mic/py26/bin:/o
        pt/intel/composer_xe_2013_sp1.0.080/debugger/gdb/intel64/py26/bin:/opt
        /intel/composer_xe_2013_sp1.0.080/bin/intel64:/opt/intel/composer_xe_2
        013_sp1.0.080/bin/intel64_mic:/opt/intel/composer_xe_2013_sp1.0.080/de
        bugger/gui/intel64:/opt/sgi/mpt/mpt-2.04/bin:/usr/local/packages/torqu
        e/2.3.13_psc/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/bin
        /X11:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin,
        PBS_O_MAIL=/var/mail/user,PBS_O_SHELL=/usr/psc/shells/csh,
        PBS_O_HOST=tg-login1.blacklight.psc.teragrid.org,
        PBS_SERVER=tg-login1.blacklight.psc.teragrid.org,
        PBS_O_WORKDIR=/usr/users/0/user/trinity/examples,PBS_O_QUEUE=batch
    comment = Job is scheduled to start next from queue batch_r by approximate
        ly Nov 04 23:00
    etime = Mon Nov  3 10:30:35 2014
    submit_args = trinity_stage1.job

Note that successful submission does not guarantee successful completion. An exit status will be given at the end of the job to designate how the job completed.

A detailed explanation of exit status values can be found here: http://www.clusterresources.com/torquedocs21/2.7jobexitstatus.shtml

During job run-time, the qstat command can be used to check on the status of a job, how much RAM is being used and how close the job is to reaching wall-time:

qstat -f <pbsjobnumber>     or
qstat -f -u <your-username>

If at any time you want to cancel a running job, use the qkill command:

qkill <pbsjobnumber>

Loading Files

All files that are needed for execution should be placed in your $SCRATCH directory. Upon logging in, type:

 cd $SCRATCH
 pwd

The directory given will be the path of your $SCRATCH directory. This directory can store and use large files, unlike your home directory.

user@tg-login1:~> cd $SCRATCH
user@tg-login1:~> pwd
/brashear/user
user@tg-login1:~>

If using GSI-SSHTerm to transfer files, upon logging in go to Tools > SFTP Session.

In the Address box, type in the full path to your $SCRATCH directory where you want to store the files. (Your screen will have your username rather than 'mbcougar'.)

[Screenshot: GSI-SSHTerm SFTP file-transfer window]

Using Trinity

The batch script above requests 32 CPUs (cores), with 256 GB of RAM, for 95 hours. This should be enough to run most small to medium Trinity jobs. If your job is small, you may consider using 16 CPUs (cores), which allocates 128 GB of RAM for your job; but be warned that only a limited number of 16-core jobs are allowed to run on the system at once, so turnaround may actually be slower than for 32-core jobs. You can check whether your 16-core job is held up by other 16-core jobs by running qstat -s <pbsjobnumber>:

user@tg-login1:~> qstat -s 208539
 
tg-login1.blacklight.psc.teragrid.org:
                                                                    Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID  NDS  TSK  Memory Time  S Time
-------------------- -------- -------- ---------- ------- ---- ---- ------ ----- - -----
208539.tg-login1     user     batch_r  myjob         --   --    16    --  00:10 Q   --
   host bl0.psc.teragrid.org has 7 16 core jobs running...limit is 7
 

Only 16 core jobs are limited in this fashion. Other jobs will run based on available cores and the number of jobs ahead of yours in the queue.

If your job is large, consider altering the parameters as necessary to accommodate the data. If you believe that you need more wall-time, remember that Butterfly can be run separately from Inchworm and Chrysalis (recommended for large data-sets on Blacklight); one way to split the run is sketched below.
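
A minimal two-job sketch, assuming the --no_run_chrysalis option listed in the options section below and assuming Trinity resumes from an existing output directory when rerun with the same arguments; file names and sizes are placeholders:

# Job 1: run Inchworm only (see --no_run_chrysalis in the options list below)
Trinity --seqType fq --JM 100G --left reads.left.fq --right reads.right.fq \
    --CPU 16 --no_run_chrysalis > trinity_stage1.log

# Job 2: resubmit the same command without the flag; Trinity picks up from the
# existing output directory and runs Chrysalis and Butterfly
Trinity --seqType fq --JM 100G --left reads.left.fq --right reads.right.fq \
    --CPU 16 > trinity_stage2.log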

Using Interactive Access

Interactive access on Blacklight is possible; however, it should only be used for short debugging jobs. This command will request an interactive session with 16 cores (allocating 128 GB of RAM) and 30 minutes of wall-time:

qsub -I -l ncpus=16 -l walltime=00:30:00 -q debug

This job uses the debug queue, which has a limit of 16 cores for 30 minutes. Larger jobs must be run with a batch script as in the Job Submission section above.

If your job is killed

If you encounter the following error (or one with slightly different numerical values) that causes the job to stop, you did not ask for enough memory.  Request more memory (by requesting more cores) and resubmit the job. See the Memory allocation section of this document for details.

PBS: Job killed: cpuset memory_pressure 10562 reached/exceeded limit 1
    (numa memused is 134200964 kb)

If you have a gigantic job that will exceed the standard queue's limits for wall-time or RAM, please contact PSC support to request help.

Module Command

PSC has installed the module software on Blacklight. You can load Trinity and all its dependencies with the module command and then run them anywhere as if they were contained in your path. To see which versions of Trinity are currently installed, type:

module avail trinity
user@tg-login1:~> module avail trinity
-------------------------- /usr/local/opt/modulefiles --------------------------
trinity/r2013-08-14   trinity/r2014-04-13p1
trinity/r2013-11-10   trinity/r2014-07-17
user@tg-login1:~>

Choose the version you want, then load it using its specific version number.

module load trinity/version-number

Note: When using interactive access you must load these modules after you have started your Interactive PBS access.
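
For example (a sketch; the queue-wait messages printed before the interactive shell starts are omitted):

qsub -I -l ncpus=16 -l walltime=00:30:00 -q debug
module load trinity
Trinity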

For a look at all programs that can be loaded with module type:

module avail

After you load the trinity module, all the Trinity commands are available for you to use.

user@tg-login1:~> Trinity
Trinity: Command not found.
user@tg-login1:~> module load trinity
user@tg-login1:~> Trinity
###############################################################################
#
# [Trinity ASCII-art banner]
#
###############################################################################
#
# Required:
#
#  --seqType  :type of reads: ( fa, or fq )
#
#  --JM       :(Jellyfish Memory) number of GB of system memory to use for
#               k-mer counting by jellyfish (eg. 10G) *include the 'G' char
#
#  If paired reads:
#      --left   :left reads, one or more (separated by space)
#      --right  :right reads, one or more (separated by space)
#
#  Or, if unpaired reads:
#      --single :single reads, one or more (note, if single file contains pairs, can use flag: --run_as_paired )
#
###############################################################################

More information on the module command is found here: http://www.psc.edu/index.php/module

Before running Trinity, set stacksize to unlimited

If you are using bash, type:

ulimit -s unlimited

If you are using csh, type:

limit stacksize unlimited

Move to your $SCRATCH space

Your scratch directory is where all assembly files should be uploaded and where all large outputs should be kept on Blacklight (your $HOME space has a 5 GB quota). To move to your scratch space type:

cd $SCRATCH

If you need the location of this directory to transfer files with either WinSCP or GSI-SSHTerm, type pwd. This will print the directory path, which should be:

/brashear/<your Blacklight User Name>

Just remember to back up any data on $SCRATCH, either to $HOME (if it is on the order of megabytes) or to the archival system (if it is GBs or larger).

Execute Trinity Specific Commands

The following are examples of Trinity commands that can be used. A full list of options is available on Trinity's main site: http://trinityrnaseq.sourceforge.net/. We highly recommend that you read this list to see the correct usage for these commands.

Note: you must substitute your specific values for <Variable> in these examples. Do not include the '< >' symbols in your command. Be sure you are in your $SCRATCH directory and that your input files are located there also.

Strand-Specific Sequencing (preferred library method, typical of the dUTP/UDG sequencing method):

Trinity.pl --seqType fq --kmer_method meryl --left <YourReads1.fq> --right <YourReads2.fq> --output <DirNameForOutput> --SS_lib_type RF --min_contig_length <contigLengthMinCutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > trinity_output.log

Note: other methods of strand-specific library generation may require FR orientation; please review the Trinity website for a full explanation.

Non Strand Specific Library

Trinity.pl --seqType fq --kmer_method meryl --left <YourReads1.fq> --right <YourReads2.fq> --output <DirNameForOutput> --min_contig_length <contigLengthMinCutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > trinity_output.log

Other Options for consideration

Some additional Trinity options are given here; a combined example follows the list. For a complete list of advanced options and a guide to Trinity use, see: http://trinityrnaseq.sourceforge.net/advanced_trinity_guide.html

--paired_fragment_length <int>
The insert size for paired-end reads; the default is 300.
--jaccard_clip
Requires the bowtie module to be loaded; only recommended if you are assembling a transcriptome from a gene-dense genome, such as a fungal genome. If you have paired-end reads, Trinity uses Bowtie to verify that pairing is consistent; this is not recommended for large genomes. Ensure that your read names are properly labeled by ending with "/1" and "/2".
--kmer_method (required) <meryl>, <jellyfish>, or <inchworm>
These are the different methods that can be used for k-mer counting with Inchworm. More documentation can be found on the Trinity website. For large to very large assemblies this choice can be adjusted for improved performance, at a trade-off in the amount of RAM used.
--CPU <int>
Number of CPUs; this should equal the number of CPUs (cores) requested for the job.
--bflyCPU <int>
Number of CPUs to use for Butterfly; this should equal the number of CPUs (cores) requested for the job.
--bflyHeapSpaceInit <string>
The amount of RAM each Butterfly thread uses initially; the product of this value and the thread count cannot exceed the amount of RAM allocated for the job. An example of an acceptable value is 3G, for 3 GB of initial Java heap space.
--bflyHeapSpaceMax <string>
The amount of heap space Butterfly will attempt to use if the initial amount is insufficient; increase it if a job does not complete and exits with a heap-space error.
--no_run_chrysalis
Run only Inchworm; can be useful when dealing with very large jobs that require a large amount of wall time.
--max_reads_per_graph
The maximum number of reads Chrysalis will anchor to any given graph.
--max_reads_per_loop
The maximum number of reads to read into memory at once for Chrysalis.

More information on Trinity is available at http://trinityrnaseq.sourceforge.net/.

BLAST

   The Basic Local Alignment Search Tool (BLAST) finds regions of
   local similarity between sequences. The program compares nucleotide
   or protein sequences to sequence databases and calculates the
   statistical significance of matches. BLAST can be used to infer
   functional and evolutionary relationships between sequences as well
   as help identify members of gene families.

   There are many search programs in the blast suite, depending on the
   type of analysis to be done:
   
   blastn - Search a nucleotide database using a nucleotide query
            Methods: blastn, megablast, discontiguous megablast
    
   blastp - Search protein database using a protein query
            Methods: blastp, psi-blast, phi-blast, delta-blast

   blastx - Search protein database using a translated nucleotide query

   tblastn - Search translated nucleotide database using a protein query

   tblastx - Search translated nucleotide database using a translated
             nucleotide query

   psiblast - Position-Specific Iterated BLAST

   rpsblast - Reverse Position Specific BLAST

   rpstblastn - Translated Reverse Position Specific BLAST

   deltablast - Domain enhanced lookup time accelerated BLAST

Installed on blacklight, biou

Other resources that may be helpful include:

    Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990)
    "Basic local alignment search tool."
    J. Mol. Biol. 215:403-410.

    Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z.,
    Miller, W. & Lipman, D.J. (1997)
    "Gapped BLAST and PSI-BLAST: a new generation of protein database
    search programs." Nucleic Acids Res. 25:3389-3402

    Website: http://www.ncbi.nlm.nih.gov/books/NBK1762/


Running BLAST

1) Make BLAST programs available for use

   a) blacklight:

   The BLAST programs are made available for use through
   the module command. To load the BLAST module enter:

   module load ncbi-blast

   b) biou:

   The BLAST programs are available through the Galaxy instance on biou.

   To make the BLAST programs available through the command line,
   both csh and bash users should enter the following command:

   % source /packages/bin/SETUP_BIO_SOFTWARE

2) General Usage:

   To find the general usage of an individual program in the BLAST suite,
   use the -help flag. For example:

   blastn -help
   blastp -help
   blastx -help
   deltablast -help
   makeblastdb -help
   makeprofiledb -help
   psiblast -help
   rpsblast -help
   rpstblastn -help
   tblastn -help
   tblastx -help

   To run blast using your own fasta-formatted sequence collection as a
   database, make sure that the database is converted to BLAST format
   prior to running the blast command. The program within the blast suite
   that does blast database formatting from a fasta sequence collection is
   called "makeblastdb". For example:
 
   makeblastdb -in uniprot_sprot.fasta -dbtype prot

   After the database is formatted, run the desired blast program. For
   example:

   blastp -query myquery.fasta -num_threads 16 -db uniprot_sprot.fasta -out blastp.out


3) PBS Examples (blacklight)

   a) blastp (16 cores): This example illustrates the simplest way to run
      the blastp program on blacklight using 16 threads:

      #!/bin/bash
      #PBS -l ncpus=16
      #PBS -l walltime=24:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Blastp
      #
      source /usr/share/modules/init/bash
      module load ncbi-blast
      module load trinotate
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Blastp
      PROT=/brashear/$USER/protein.fasta
      BLASTTHREADS=16
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Take protein file and run BLAST against SWISSPROT:
      #
      date
      blastp -query $PROT -num_threads $BLASTTHREADS -db $TRDB_SPROT \
             -out blastp.out > blastp.log 2>&1
      ja -set $SCRATCH/$$.ja


   b) Scalable BLAST (32 core): This example illustrates how to use
      fasta_splitter to scale the blastp program on blacklight using
      2 parallel runs at 16 threads each:

      #!/bin/bash
      #PBS -l ncpus=32
      #PBS -l walltime=96:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_blastp
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load fasta_splitter
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinity_Output/Trinotate_Blastp
      TRINITY_PROT=/brashear/$USER/Trinity_Output/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      # parallel parameters
      #
      BLASTRUNS=2    # Number of independent BLAST runs
      BLASTTHREADS=16 # Number of BLAST Threads ( BLASTTHREADS * BLASTRUNS = NCPUS)
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Split file into sections so we can run blast independently
      #
      cp $TRINITY_PROT Trinity_protein.fasta
      perl $FASTA_SPLIT_HOME/fasta_splitter.pl -n-parts-sequence $BLASTRUNS Trinity_protein.fasta
      ls -l
      #
      # TRINOTATE: Take protein file and run BLAST against SWISSPROT:
      #
      date
      PLACEON=0
      PART=1
      for F in Trinity_protein.part*
      do
        ((PLACETHROUGH= PLACEON + BLASTTHREADS - 1))
        dplace -o dplacelog$PLACEON -c $PLACEON-$PLACETHROUGH blastp \
          -query $F -num_threads $BLASTTHREADS -db $TRDB_SPROT -outfmt 6 \
          -max_target_seqs 1 \
          -out blastp_$PART.out \
          > blastp_$PART.log 2>&1 &
        ((PLACEON= PLACEON + BLASTTHREADS))
        ((PART=PART + 1))
      done
      wait
      cat blastp_*.out > Trinity_protein_blastp_all.out
      head -100 dplacelog*
      date
      ja -set $SCRATCH/$$.ja

 

Trinotate

   Trinotate is an annotation method suitable for computationally assembled
   transcripts from RNA-Seq data.

Installed on blacklight

Other resources that may be helpful include:

    Website: http://trinotate.sourceforge.net/


Running Trinotate

1) Make Trinotate programs available for use

   The Trinotate process relies on a number of underlying programs and
   sequence databases. To make all of these programs available for use,
   use the following module command:

   module load trinotate

   This module will load a number of modules, including: trinotate_db,
   ncbi-blast, signalp, tmhmm, hmmer, and RNAmmer.
 

2) General Usage:

   The general Trinotate process is as follows. With the transcripts:

   a) Run blastx with the transcripts against uniprot-swissprot
   b) Run RNAmmer with the transcripts
   c) Generate conceptual protein translations of the transcripts
      1) Run blastp with the conceptual protein translations against
         uniprot-swissprot
      2) Run hmmsearch with the conceptual protein translations against
         PFAM
      3) Run signalp with the conceptual protein translations
      4) Run tmhmm with the conceptual protein translations
   d) When the above runs are done, load them into a pre-populated
      SQLite database that contains annotation information (including go
      terms) linked to the uniprot-swissprot and pfam identifiers.     
   e) Query the database and generate a report that can be viewed in
      spreadsheet software.

      In general, the blastx and blastp steps will take the most amount
      of time.

3) PBS Examples

   a) Below is a simple example that runs most of the Trinotate process in
      one PBS job. If you have many transcripts (> 10,000), we do not
      recommend running the entire process in one batch job in the way
      illustrated below.   

      #!/bin/bash
      #PBS -l ncpus=16
      #PBS -l walltime=24:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate
      #
      source /usr/share/modules/init/bash
      module load java
      module load trinity
      module load trinotate
      module load sqlite
      set -x
      ja $SCRATCH/$$.ja
      #
      # Sample Data
      #
      OUTDIR=/brashear/$USER/Trinotate
      PROT=/brashear/$USER/Protein/best_candidates.eclipsed_orfs_removed.pep
      NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Here we start with JUST transcripts and protein coding regions in
      # those transcripts.
      #
      # STEP 1: Take protein file and run BLAST against SWISSPROT:
      #
      blastp -query $PROT -db $TRDB_SPROT -num_threads 16 -max_target_seqs 1 \
       -outfmt 6 > TrinotateBlast.out
      #
      # STEP 2: Take protein file and run hmmscan against PFAM-A:
      #
      hmmscan --cpu 16 --domtblout TrinotatePFAM.out $TRDB_PFAM $PROT > pfam.log
      #
      # STEP 3: Run signalp to predict signal peptides:
      #
      signalp -f short -n signalp.out $PROT
      #
      # STEP 4: Run tmhmm to predict transmembrane regions:
      #
      tmhmm --short < $PROT > tmhmm.out
      #
      # STEP 5: Take nucleic acid file and run BLAST against SWISSPROT:
      #
      blastx \
         -query $NA -num_threads 16 -db $TRDB_SPROT -outfmt 6 \
         -max_target_seqs 1 \
         -out TrinotateBlastx.out
      #
      # STEP 6: Run RNAmmer to identify rRNA transcripts:
      #
      $TRINOTATE_HOME/util/rnammer_support/RnammerTranscriptome.pl \
       --transcriptome $NA --path_to_rnammer $RNAMMER_HOME/rnammer \
       >& RnammerTranscriptome.log
      #
      # STEP 7: Generate Gene/Transcript relationships
      #
      $TRINITY_HOME/util/get_Trinity_gene_to_trans_map.pl $NA \
          >  Trinity.fasta.gene_trans_map
      #
      # STEP 8: Initialize SQLITE Database
      #
      cp $TRDB_SQLITE Trinotate.sqlite
      Trinotate.pl init --gene_trans_map Trinity.fasta.gene_trans_map \
         --transcript_fasta $NA --transdecoder_pep $PROT
      #
      # STEP 9: Load Blast Results
      #
      Trinotate.pl LOAD_blast TrinotateBlast.out
      #
      # STEP 10: Load PFAM Results
      #
      Trinotate.pl LOAD_pfam TrinotatePFAM.out
      #
      # STEP 11: Load tmhmm Results
      #
      Trinotate.pl LOAD_tmhmm tmhmm.out
      #
      # STEP 12: Load signalp Results
      #
      Trinotate.pl LOAD_signalp signalp.out
      #
      # STEP 13: Load rnammer Results
      #
      Trinotate.pl LOAD_rnammer Trinity.fasta.rnammer.gff
      #
      # STEP 14: Load blastx Results
      #
      Trinotate.pl LOAD_blastx TrinotateBlastx.out
      #
      # STEP 15: Generate annotation report
      #
      Trinotate.pl report > trinotate_annotation_report.xls
      #
      ja -set $SCRATCH/$$.ja


   b) Below are example runs of the Trinotate processes as individual PBS
      jobs. This is the recommended method if you have a large number of
      transcripts.

      **********************
      ******* BLASTX *******
      **********************

      #!/bin/bash
      #PBS -l ncpus=32
      #PBS -l walltime=24:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_blastx
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load fasta_splitter
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_blastx
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      # parallel parameters
      #
      BLASTRUNS=2    # Number of independent BLAST runs
      BLASTTHREADS=16 # Number of BLAST Threads ( BLASTTHREADS*BLASTRUNS = NCPUS)
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Split file into sections so we can run blast independently
      #
      cp $TRINITY_NA Trinity_na.fasta
      perl $FASTA_SPLIT_HOME/fasta_splitter.pl -n-parts-sequence \
           $BLASTRUNS Trinity_na.fasta
      ls -l
      #
      # TRINOTATE: Take protein file and run BLAST against SWISSPROT:
      #
      date
      PLACEON=0
      PART=1
      for F in Trinity_na.part*
      do
        ((PLACETHROUGH= PLACEON + BLASTTHREADS - 1))
        dplace -o dplacelog$PLACEON -c $PLACEON-$PLACETHROUGH blastx \
          -query $F -num_threads $BLASTTHREADS -db $TRDB_SPROT -outfmt 6 \
          -max_target_seqs 1 \
          -out blastx_$PART.out \
          > blastx_$PART.log 2>&1 &
        ((PLACEON= PLACEON + BLASTTHREADS))
        ((PART=PART + 1))
      done
      wait
      ls -l
      cat blastx_*.out > Trinity_na_blastx_all.out
      head -100 dplacelog*
      date
      ja -set $SCRATCH/$$.ja

      ***********************
      ******* RNAMMER *******
      ***********************

      #!/bin/bash
      #PBS -l ncpus=16
      #PBS -l walltime=24:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_rnammer
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load perl/5.12.5-threads
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_RNAmmer
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      ls -l
      #
      # TRINOTATE: Take na file and run rnammer
      #
      date
      $TRINOTATE_HOME/util/rnammer_support/RnammerTranscriptome.pl \
        --transcriptome $TRINITY_NA --path_to_rnammer $RNAMMER_HOME/rnammer \
        >& RnammerTranscriptome.log
      date
      ja -set $SCRATCH/$$.ja


      **********************
      ******* BLASTP *******
      **********************


      #!/bin/bash
      #PBS -l ncpus=32
      #PBS -l walltime=96:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_blastp
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load fasta_splitter
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_blastp
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      # parallel parameters
      #
      BLASTRUNS=2    # Number of independent BLAST runs
      BLASTTHREADS=16 # Number of BLAST Threads (BLASTTHREADS*BLASTRUNS=NCPUS)
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Split file into sections so we can run blast independently
      #
      cp $TRINITY_PROT Trinity_protein.fasta
      perl $FASTA_SPLIT_HOME/fasta_splitter.pl -n-parts-sequence \
           $BLASTRUNS Trinity_protein.fasta
      ls -l
      #
      # TRINOTATE: Take protein file and run BLAST against SWISSPROT:
      #
      date
      PLACEON=0
      PART=1
      for F in Trinity_protein.part*
      do
        ((PLACETHROUGH= PLACEON + BLASTTHREADS - 1))
        dplace -o dplacelog$PLACEON -c $PLACEON-$PLACETHROUGH blastp \
          -query $F -num_threads $BLASTTHREADS -db $TRDB_SPROT -outfmt 6 \
          -max_target_seqs 1 \
          -out blastp_$PART.out \
          > blastp_$PART.log 2>&1 &
        ((PLACEON= PLACEON + BLASTTHREADS))
        ((PART=PART + 1))
      done
      wait
      cat blastp_*.out > Trinity_protein_blastp_all.out
      head -100 dplacelog*
      date
      ja -set $SCRATCH/$$.ja


      ********************
      ******* PFAM *******
      ********************


      #!/bin/bash
      #PBS -l ncpus=32
      #PBS -l walltime=96:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_pfam
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load fasta_splitter
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_PFAM
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      # parallel parameters
      #
      HMMERRUNS=2    # Number of independent hmmer runs
      HMMERTHREADS=16 # Number of hmmer Threads (HMMERTHREADS*HMMERRUNS=NCPUS)
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Split file into sections so we can run blast independently
      #
      cp $TRINITY_PROT Trinity_protein.fasta
      perl $FASTA_SPLIT_HOME/fasta_splitter.pl -n-parts-sequence \
           $HMMERRUNS Trinity_protein.fasta
      ls -l
      #
      # TRINOTATE: Take protein file and run hmmscan against PFAM:
      #
      date
      PLACEON=0
      PART=1
      for F in Trinity_protein.part*
      do
        ((PLACETHROUGH= PLACEON + HMMERTHREADS - 1))
        dplace -o dplacelog$PLACEON -c $PLACEON-$PLACETHROUGH hmmscan \
          --cpu $HMMERTHREADS --domtblout PFAM_$PART.out \
          $TRDB_PFAM $F > pfam_$PART.log 2>&1 &
        ((PLACEON= PLACEON + HMMERTHREADS))
        ((PART=PART + 1))
      done
      wait
      ls -l
      cat PFAM_*.out > Trinity_protein_pfam_all.out
      head -100 dplacelog*
      date
      ja -set $SCRATCH/$$.ja


      ***********************
      ******* SIGNALP *******
      ***********************


      #!/bin/bash
      #PBS -l ncpus=16
      #PBS -l walltime=24:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_signalp
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load fasta_splitter
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_signalp
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      # parallel parameters
      #
      RUNS=16    # Number of independent runs
      THREADS=1  # Number of Threads ( THREADS * RUNS = NCPUS)
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Split file into sections so we can run independently
      #
      cp $TRINITY_PROT Trinity_prot.fasta
      perl $FASTA_SPLIT_HOME/fasta_splitter.pl -n-parts-sequence \
           $RUNS Trinity_prot.fasta
      ls -l
      #
      # TRINOTATE: Take protein file and run signalp
      #
      date
      PLACEON=0
      PART=1
      for F in Trinity_prot.part*
      do
        ((PLACETHROUGH= PLACEON + THREADS - 1))
        dplace -o dplacelog$PLACEON -c $PLACEON-$PLACETHROUGH signalp \
          -f short -n signalp_$PART.out $F \
          > signalp_$PART.log 2>&1 &
        ((PLACEON= PLACEON + THREADS))
        ((PART=PART + 1))
      done
      wait
      ls -l
      cat signalp_*.out > Trinity_prot_signalp_all.out
      head -100 dplacelog*
      date
      ja -set $SCRATCH/$$.ja


      *********************
      ******* TMHMM *******
      *********************


      #!/bin/bash
      #PBS -l ncpus=16
      #PBS -l walltime=24:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N Trinotate_tmhmm
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module unload perl
      module load fasta_splitter
      module unload perl
      module load perl/5.12.3
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_tmhmm
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      #
      # parallel parameters
      #
      RUNS=16    # Number of independent runs
      THREADS=1 # Number of Threads ( THREADS * RUNS = NCPUS)
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # Split file into sections so we can run independently
      #
      cp $TRINITY_PROT Trinity_prot.fasta
      perl $FASTA_SPLIT_HOME/fasta_splitter.pl -n-parts-sequence \
           $RUNS Trinity_prot.fasta
      ls -l
      #
      # TRINOTATE: Take protein file and run tmhmm
      #
      date
      PLACEON=0
      PART=1
      for F in Trinity_prot.part*
      do
        ((PLACETHROUGH= PLACEON + THREADS - 1))
        dplace -o dplacelog$PLACEON -c $PLACEON-$PLACETHROUGH tmhmm --d \
         --short $F  > tmhmm_$PART.out 2>tmhmm_$PART.err &
        ((PLACEON= PLACEON + THREADS))
        ((PART=PART + 1))
      done
      wait
      ls -l
      cat tmhmm_*.out > Trinity_prot_tmhmm_all.out
      head -100 dplacelog*
      date
      ja -set $SCRATCH/$$.ja


      **********************************************************************
      ******* PLACE RESULTS INTO SQLITE DATABASE AND GENERATE REPORT *******
      **********************************************************************


      #!/bin/bash
      #PBS -l ncpus=16
      #PBS -l walltime=00:30:00
      #PBS -j oe
      #PBS -q debug
      #PBS -N Trinotate_database
      #
      source /usr/share/modules/init/bash
      module load trinotate
      module load trinity/r2013-08-14
      module load sqlite
      module unload perl
      module load perl/5.16.0
      set -x
      ja $SCRATCH/$$.ja
      #
      # Data
      #
      OUTDIR=/brashear/$USER/Trinotate_AnnotationDatabase
      TRINITY_NA=/brashear/$USER/trinity_out_dir/Trinity.fasta
      TRINITY_PROT=/brashear/$USER/Proteins/best_candidates.eclipsed_orfs_removed.pep
      TRINOTATE_blastp=/brashear/$USER/Trinotate_blastp/Trinity_protein_blastp_all.out
      TRINOTATE_blastx=/brashear/$USER/Trinotate_blastx/Trinity_na_blastx_all.out
      TRINOTATE_pfam=/brashear/$USER/Trinotate_PFAM/Trinity_protein_pfam_all.out
      TRINOTATE_signalp=/brashear/$USER/Trinotate_signalp/Trinity_prot_signalp_all.out
      TRINOTATE_tmhmm=/brashear/$USER/Trinotate_tmhmm/Trinity_prot_tmhmm_all.out
      TRINOTATE_rnammer=/brashear/$USER/Trinotate_RNAmmer/Trinity.fasta.rnammer.gff
      #
      mkdir -p $OUTDIR
      cd $OUTDIR
      #
      # STEP A: Generate Gene/Transcript relationships
      #
      $TRINITY_HOME/util/get_Trinity_gene_to_trans_map.pl $TRINITY_NA \
          >  Trinity.fasta.gene_trans_map
      #
      # STEP B: Initialize SQLITE Database
      #
      cp $TRDB_SQLITE Trinotate.sqlite
      Trinotate.pl init --gene_trans_map Trinity.fasta.gene_trans_map \
          --transcript_fasta $TRINITY_NA --transdecoder_pep $TRINITY_PROT
      #
      # STEP C: Load Blast Results
      #
      # Old version (trinity/r2013-02-25)
      #Trinotate.pl LOAD_blast $TRINOTATE_blastp
      # New Version (trinity/r2013-08-14)
      Trinotate.pl LOAD_blastp $TRINOTATE_blastp
      Trinotate.pl LOAD_blastx $TRINOTATE_blastx
      #
      # STEP D: Load PFAM Results
      #
      Trinotate.pl LOAD_pfam $TRINOTATE_pfam
      #
      # STEP E: Load tmhmm Results
      #
      Trinotate.pl LOAD_tmhmm $TRINOTATE_tmhmm
      #
      # STEP F: Load signalp Results
      #
      Trinotate.pl LOAD_signalp $TRINOTATE_signalp
      #
      # STEP G: Load RNAmmer Results
      #
      Trinotate.pl LOAD_rnammer $TRINOTATE_rnammer
      #
      # STEP H: Generate annotation report
      #
      Trinotate.pl report > trinotate_annotation_report.xls
      #
      ja -set $SCRATCH/$$.ja

 

Tabix

 

Tabix is a generic indexer for TAB-delimited genome position files.

 

Installed on blacklight

 

Other resources

Website: http://samtools.sourceforge.net/tabix.html

 

Running Tabix

1. Create a batch job which

     1. Sets up the use of the module command in a batch job

     2. Loads the tabix module

           module load tabix

     3. Includes other commands to run Tabix

2. Submit the batch job with the qsub command (a sample batch job is sketched below)
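
Sample Batch Job

A minimal sketch modeled on the Circos sample job above. The input file name (example.vcf) and the query region are placeholders; Tabix assumes a tab-delimited file sorted by position.

#!/bin/bash
#PBS -l ncpus=16
#PBS -l walltime=00:10:00
#PBS -j oe
#PBS -q debug
#PBS -N tabix
source /usr/share/modules/init/bash
module load tabix
set -x
cd $SCRATCH
# Compress the position-sorted file with bgzip (distributed with Tabix)
bgzip example.vcf
# Build the index; -p vcf selects the built-in VCF preset
tabix -p vcf example.vcf.gz
# Retrieve the records that overlap a region
tabix example.vcf.gz chr1:10000-20000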