Trinity, developed at the Broad Institute, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
- The Trinity web site for information on Trinity command line options
- FAQ for Trinity on blacklight
- User-developed instructions for building your own copy of Trinity
To use Trinity successfully on Greenfield, first choose the version you will use, then construct your job script.
Choose which version of Trinity to use
Multiple versions of Trinity are installed. It is important to know which version you are using, as the command line options and the default settings can change between versions. It is also a good practice to use the same version throughout your project.
There is always a default version of Trinity defined, but it changes as new versions are added and older versions deleted. For this reason, you should never just load the default version; it may change without notice. Always load the specific version that you want.
The module command can tell you what versions are installed. When you have chosen a version, you will use the module load command to set up the correct environment to run that version of Trinity. For more information, see the documentation on the module command.
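One way to enforce the advice about never loading the default version is to have the job script reject a module name that carries no version. The following is a minimal bash sketch; the helper names is_pinned_module and load_pinned_module are hypothetical, and it assumes that module is available as a shell function in the job environment.

```shell
# Hypothetical helpers: accept only module names of the form name/version,
# so a job script can never fall back to the site default version.
is_pinned_module() {
    case "$1" in
        ?*/?*) return 0 ;;   # text, a "/", then a version string
        *)     return 1 ;;   # bare name such as "trinity"
    esac
}

load_pinned_module() {
    if ! is_pinned_module "$1"; then
        echo "refusing to load '$1': give an explicit version, e.g. trinity/2.1.1" >&2
        return 1
    fi
    # Assumes the site provides the "module" shell function.
    module load "$1"
}
```

With these definitions, load_pinned_module trinity/2.1.1 loads that version, while load_pinned_module trinity stops with an error instead of silently picking up whatever the default happens to be.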
For help on selecting a version:
- See what versions are available
To see what versions of Trinity are available, type
module available trinity
- See the options for a specific version
You can see the available options and defaults for a given version. First load the specific Trinity module you are interested in with the command
module load specific-trinity-version
Type in the full name of the module from the module available trinity command. For example, to see the options for version 2.1.1, type
module load trinity/2.1.1
To see the Trinity command line options for this version, now type
- Read the release notes for a specific version
You can read the release notes for any version by looking at the Release.Notes file, found in the top level directory (/usr/local/packages/trinity/version) for that version. For example, to see the release notes for version 2.1.1, look at the file /usr/local/packages/trinity/2.1.1/Release.Notes.
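Because the release notes follow a fixed path pattern, the path can be built from the version string. This is a small bash sketch of that pattern only; the function name release_notes_path is hypothetical, and the directory layout is the one described above.

```shell
# Hypothetical helper: print the Release.Notes path for a given version,
# following the /usr/local/packages/trinity/version layout described above.
release_notes_path() {
    printf '/usr/local/packages/trinity/%s/Release.Notes\n' "$1"
}
```

For example, less "$(release_notes_path 2.1.1)" would open the release notes for version 2.1.1.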
Construct your job script
Include commands like the following in your job script.
- Combine stdout and stderr
In a batch job, the messages and errors that are normally displayed on the monitor while an interactive job runs are instead written to two files, stdout and stderr, respectively.
Combine stdout and stderr by using the PBS directive -j oe. This merges both stdout and stderr into one file, which makes debugging easier. To name that file, add the -o directive. Put these lines into your batch script:
#PBS -j oe
#PBS -o my-PBS-output-file
For more information on PBS directives in batch jobs, see the blacklight document.
- Load the appropriate Trinity version and (optionally) associated modules
Use the module load command to define the correct environment to run a specific version of Trinity. Type in the full name of the module from the module available trinity command.
Some versions of Trinity will also load the modules for other software typically needed for a Trinity job; for other versions, you will need to load modules for associated software yourself. Before constructing your job script, type
module help specific-trinity-version
to see which is the case for your chosen version.
- Set the stack size to unlimited
You must set the stack size to unlimited in your batch script before running Trinity, or the job will fail.
If you are using bash, type
ulimit -s unlimited
If you are using csh, type
limit stacksize unlimited
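Since the job fails when the stack limit is not lifted, it can help to verify the setting rather than assume it took effect. The following bash sketch does that; the function name set_unlimited_stack is hypothetical, and whether the request succeeds depends on the hard limit the system imposes.

```shell
# Hypothetical helper: try to lift the stack limit, then confirm what is
# actually in effect for the current shell.
set_unlimited_stack() {
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "could not set stack size to unlimited" >&2
        return 1
    fi
    # ulimit -s with no value prints the current soft limit.
    [ "$(ulimit -s)" = "unlimited" ]
}
```

In a job script, set_unlimited_stack || exit 1 placed before the Trinity command stops the job early with a clear message instead of failing later inside Trinity.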
- Copy your input files to $SCRATCH
Your $SCRATCH directory on blacklight is intended to be used as working space for your running jobs. All of the files that your job needs should be copied to $SCRATCH. Copy them with
cp inputfile $SCRATCH
- Move to your $SCRATCH directory
Move to your $SCRATCH directory before starting Trinity with
cd $SCRATCH
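The copy and move steps can be combined in one helper that stops if anything goes wrong, so Trinity never starts with missing input. This is a minimal bash sketch; the function name stage_to_scratch and its argument order are conventions of this example only.

```shell
# Hypothetical helper: copy each input file into the scratch directory,
# then change into that directory. Stops at the first failure.
stage_to_scratch() {
    local scratch_dir="$1"
    shift
    if [ ! -d "$scratch_dir" ]; then
        echo "no such directory: $scratch_dir" >&2
        return 1
    fi
    local f
    for f in "$@"; do
        cp "$f" "$scratch_dir"/ || return 1
    done
    cd "$scratch_dir" || return 1
}
```

A job script might call it as stage_to_scratch "$SCRATCH" yourreads1.fq yourreads2.fq, where the two file names stand for your own input files.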
- Run Trinity
Some typical command lines are given below. We recommend that you look at the complete list of options available, given at http://trinityrnaseq.sourceforge.net.
It is important that the output produced by Trinity be redirected into a file. By default, Trinity output is written to stdout. This can cause trouble on blacklight, because stdout and stderr files are limited to 20 Mbytes each. If either file exceeds this limit, the job will be killed.
The Trinity package writes a lot of information to stdout and stderr and often exceeds these limits. To prevent your job from being killed by the system, you should redirect Trinity output to a different file. To redirect your Trinity output, use the ">" operator. Here is a command line showing Trinity output redirected to a file called my-trinity-output.out.
Trinity command-line-options > my-trinity-output.out
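Note that in the shell the ">" operator on its own redirects only stdout; messages written to stderr still go to the job's stderr file. Since Trinity writes to both streams, a variant that sends both to the same file may be safer. The sketch below shows one way to do this in bash; the function name run_logged is hypothetical.

```shell
# Hypothetical helper: run a command with stdout and stderr both written
# to a single log file. The command's exit status is passed through.
run_logged() {
    local log_file="$1"
    shift
    "$@" > "$log_file" 2>&1
}
```

Used as run_logged my-trinity-output.out Trinity command-line-options, this has the same effect as typing Trinity command-line-options > my-trinity-output.out 2>&1.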
Typical Trinity command lines
Variables in brackets should be replaced with the desired options or names of your input files. Do not include the brackets themselves in the command line.
Strand Specific Sequencing (Preferred Library Method typical of the dUTP/UDG sequencing method)
Please note that other methods of Strand Specific library generation may require FR orientation. See the Trinity website for a full explanation.
Trinity --seqtype fq --JM 100G --left <yourreads1.fq> --right <yourreads2.fq> --output <dirnameforoutput> --SS_lib_type RF --min_contig_length <contiglengthmincutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out
Non-Strand Specific Library
Trinity --seqtype fq --JM 100G --left <yourreads1.fq> --right <yourreads2.fq> --output <dirnameforoutput> --min_contig_length <contiglengthmincutoff> --CPU 16 --bflyCPU 16 --bflyGCThreads 16 > my-trinity-output.out
- Copy your Trinity output file from $SCRATCH
Although $SCRATCH files should be available for 21 days, it is good practice to copy your Trinity output file back to your home directory before the job ends. Copy it back to your home directory with
cp my-trinity-output.out $HOME
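Putting the steps together, a job script has roughly the shape printed by the bash sketch below. This is an outline under stated assumptions, not a complete script: the generator function print_trinity_job is hypothetical, the script assumes bash, and directives for resources such as cores, memory, and walltime are site-specific and deliberately left out.

```shell
# Hypothetical generator: print a skeleton job script that follows the
# steps in this document. Arguments:
#   $1  full module name, as listed by "module available trinity"
#   $2  input file to copy to $SCRATCH
#   $3  Trinity options, exactly as you would type them
print_trinity_job() {
    local module_name="$1"
    local input_file="$2"
    local trinity_args="$3"
    # \$SCRATCH and \$HOME are escaped so they are expanded when the
    # job runs, not when the script is generated.
    cat <<EOF
#!/bin/bash
#PBS -j oe
module load $module_name
ulimit -s unlimited
cp $input_file \$SCRATCH
cd \$SCRATCH
Trinity $trinity_args > my-trinity-output.out
cp my-trinity-output.out \$HOME
EOF
}
```

For example, print_trinity_job trinity/2.1.1 yourreads.fq "command-line-options" > trinity.job writes the skeleton to a file that you can then edit before submitting; the file names here are placeholders.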