Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

Trinity

Trinity, developed at the Broad Institute, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.

 

Documentation

Usage

 

  • Greenfield

To use Trinity successfully on Greenfield, first choose the version you will use, then construct your job script.

Choose which version of Trinity to use

Multiple versions of Trinity are installed. It is important to know which version you are using, as the command line options and the default settings can change between versions. It is also good practice to use the same version throughout your project.

There is always a default version of Trinity defined, but it changes as new versions are added and older versions deleted. For this reason, you should never just load the default version; it may change without notice. Always load the specific version that you want.

The module command can tell you what versions are installed.  When you have chosen a version, you will use the module load command to set up the correct environment to run that version of Trinity. For more information, see documentation on the module command.

 For help on selecting a version:

  • See what versions are available

    To see what versions of Trinity are available, type

    module available trinity
  • See the options for a specific version

    You can see the available options and defaults for a given version. First load the specific Trinity module you are interested in with the command

    module load specific-trinity-version

    Type in the full name of the module as listed by the module available trinity command. For example, to see the options for version 2.1.1, type

    module load trinity/2.1.1

    To see the Trinity command line options for this version, now type

    Trinity
  • Read the release notes for a specific version

    You can read the release notes for any version by looking at the Release.Notes file, found in the top level directory (/usr/local/packages/trinity/version) for that version. For example, to see the release notes for version 2.1.1, look at the file /usr/local/packages/trinity/2.1.1/Release.Notes.
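The version-selection steps above can be wrapped in a few bash helpers for interactive use. This is a minimal sketch, not a PSC-provided tool. The function names are hypothetical, and the helpers assume that every installed version follows the same trinity/<version> module name and /usr/local/packages/trinity/<version> directory layout as the 2.1.1 example on this page.

```shell
#!/bin/bash
# Hypothetical helpers; they assume the trinity/<version> naming of the 2.1.1 example.

# Module name for a version, e.g. trinity/2.1.1
trinity_module() {
  echo "trinity/$1"
}

# Release notes file for a version (assumed directory layout)
trinity_release_notes() {
  echo "/usr/local/packages/trinity/$1/Release.Notes"
}

# Load an explicit version (never the default) and confirm Trinity is on PATH.
# Returns non-zero if no version is given or Trinity cannot be found.
load_trinity_version() {
  if [ -z "$1" ]; then
    echo "load_trinity_version: give an explicit version, e.g. 2.1.1" >&2
    return 1
  fi
  module load "$(trinity_module "$1")" || return 1
  if ! command -v Trinity >/dev/null 2>&1; then
    echo "Trinity not found after loading $(trinity_module "$1")" >&2
    return 1
  fi
}
```

After you source these helpers, a session for version 2.1.1 might look like this:

    module available trinity
    load_trinity_version 2.1.1 && Trinity
    less "$(trinity_release_notes 2.1.1)"

Because load_trinity_version requires an explicit version argument and never falls back to a plain module load trinity, it enforces the advice above never to rely on the default version.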

 

Construct your job script

Include commands like the following in your job script.

  • Combine stdout and stderr

    In a batch job, the messages and errors that are normally displayed on the screen while an interactive job runs are instead written to two files, stdout and stderr, respectively.

    Use the PBS directive -j oe to combine stdout and stderr into one file, which makes debugging easier, and the -o directive to name that file. Put these lines into your batch script:

    #PBS -j oe
    #PBS -o my-PBS-output-file


    For more information on PBS directives in batch jobs, see the Blacklight document.

  • Load the appropriate Trinity version and (optionally) associated modules

    Use the module load command to define the correct environment to run a specific version of Trinity.  Type in the full name of the module from the module available trinity command.

    Some versions of Trinity will also load the modules for other software typically needed for a Trinity job; for other versions, you will need to load modules for associated software yourself. Before constructing your job script, type

    module help specific-trinity-version

    to see which is the case for your chosen version.

  • Set the stack size to unlimited

    You must set the stack size to unlimited in your batch script before running Trinity, or the job will fail.

    If you are using bash, type

    ulimit -s unlimited

    If you are using csh, type

    limit stacksize unlimited
  • Copy your input files to $SCRATCH

    Your $SCRATCH directory on Blacklight is intended to be used as working space for your running jobs. All of the files that your job needs should be copied to $SCRATCH. Copy them with

    cp inputfile $SCRATCH
  • Move to your $SCRATCH directory

    Move to your $SCRATCH before starting Trinity with

    cd $SCRATCH
    
  • Run Trinity

    Some typical command lines are given below. We recommend that you look at the complete list of options available, given at http://trinityrnaseq.sourceforge.net.

    It is important that the output produced by Trinity be redirected into a file. By default, Trinity output is written to stdout. This can cause trouble on Blacklight, because the stdout and stderr files are each limited to 20 MB. If either file exceeds this limit, the job will be killed.

    The Trinity package writes a lot of information to stdout and stderr and often exceeds these limits. To prevent your job from being killed by the system, you should redirect Trinity output to a different file. To redirect your Trinity output, use the ">" operator. Here is a command line showing Trinity output redirected to a file called my-trinity-output.out.

    Trinity command-line-options > my-trinity-output.out

     

    Typical Trinity command lines

    Values in angle brackets should be replaced with the desired options or the names of your input files. Do not include the brackets themselves in the command line.

    Strand-Specific Library (the preferred library method, typical of the dUTP/UDG sequencing method)

    Other methods of strand-specific library generation may require FR orientation instead of RF. See the Trinity website for a full explanation.

    Trinity --seqtype fq --JM 100G --left <yourreads1.fq> --right <yourreads2.fq> --output <dirnameforoutput> --SS_lib_type RF --min_contig_length <contiglengthmincutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out

    Non-Strand-Specific Library

    Trinity --seqtype fq --JM 100G --left <yourreads1.fq> --right <yourreads2.fq> --output <dirnameforoutput> --min_contig_length <contiglengthmincutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out
  • Copy your Trinity output file from $SCRATCH

    Although $SCRATCH files should be available for 21 days, it is good practice to copy your Trinity output file back to your home directory before the job ends. Copy it back to your home directory with

    cp my-trinity-output.out $HOME
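Taken together, the steps above produce a job script along the following lines. This is a sketch under stated assumptions, not a tested PSC template. It uses bash and follows the non-strand-specific command line from this page. The file names are hypothetical (yourreads1.fq, yourreads2.fq, output directory trinity_out), and --min_contig_length 200 is an assumed cutoff. Resource directives for your allocation are omitted, and the script assumes the module command is available in batch jobs.

The script goes slightly beyond the steps listed above in two ways. First, stderr is redirected along with stdout (2>&1), since both PBS files are size-limited. Second, the final assembly, assumed to be Trinity.fasta in the output directory, is copied home along with the log. Check the options against your chosen version; for example, later Trinity releases take memory as --max_memory rather than --JM.

```shell
#!/bin/bash
#PBS -S /bin/bash
#PBS -j oe
#PBS -o my-PBS-output-file
# Add the resource directives (walltime, cores, memory) your allocation needs.

# Hypothetical names: replace with your chosen version and your own files.
TRINITY_MODULE=trinity/2.1.1
READS1=yourreads1.fq
READS2=yourreads2.fq
OUTDIR=trinity_out
LOG=my-trinity-output.out

die() {
  echo "$*" >&2
  exit 1
}

# Non-strand-specific command line from this page (200 is an assumed cutoff).
# Extra arguments, e.g. --SS_lib_type RF, are passed through to Trinity.
run_trinity() {
  Trinity --seqtype fq --JM 100G \
    --left "$READS1" --right "$READS2" --output "$OUTDIR" \
    --min_contig_length 200 \
    --CPU 16 --bflyCPU 16 --bflyGCThreads 16 "$@"
}

main() {
  # PBS starts batch jobs in $HOME; return to the submission directory.
  cd "$PBS_O_WORKDIR" || die "cannot cd to $PBS_O_WORKDIR"

  # Load an explicit Trinity version, never the default.
  module load "$TRINITY_MODULE"
  command -v Trinity >/dev/null 2>&1 || die "Trinity not found after loading $TRINITY_MODULE"

  # Trinity fails unless the stack size is unlimited.
  ulimit -s unlimited || die "could not set an unlimited stack size"

  # Stage the inputs in $SCRATCH and run there.
  cp "$READS1" "$READS2" "$SCRATCH" || die "copy to $SCRATCH failed"
  cd "$SCRATCH" || die "cannot cd to $SCRATCH"

  # Send both stdout and stderr to a log, away from the size-limited PBS file.
  run_trinity > "$LOG" 2>&1
  status=$?

  # Copy the log, and the assembly if it was produced, home before the job ends.
  cp "$LOG" "$HOME"
  if [ -f "$OUTDIR/Trinity.fasta" ]; then
    cp "$OUTDIR/Trinity.fasta" "$HOME"
  fi
  exit "$status"
}

# PBS sets PBS_JOBID inside a batch job; do nothing elsewhere.
if [ -n "${PBS_JOBID:-}" ]; then
  main
fi
```

For an RF strand-specific library, change the call in main to run_trinity --SS_lib_type RF. The PBS_JOBID check keeps main from running if the file is sourced or executed outside a batch job, which makes it safe to inspect or reuse the variables and functions interactively.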