FAQ for Trinity use on Blacklight
- How many cores should I request? How many should Trinity use? How do I do this?
- Ah nuts, my job failed. Do I have to start all over?
- My job failed with an error something like:
- 1. How many cores should I request for Trinity? How many should I tell Trinity to use? How do I do this?
- There is a distinction between the number of cores you request for your job via the #PBS directive and the number specified on the Trinity command line with the --CPU option.
The number of cores you request with the #PBS directive determines the amount of memory that will be available for your job. On Blacklight, each core has 8 GB of memory associated with it. So if your job requests 128 cores, you will be allocated 1 TB (1024 GB) of memory. Use the #PBS directive to request as many cores as you need to get sufficient memory for your job.
A rough rule of thumb is that you will need 1 GB per million reads.
However, the number of cores that Trinity uses when it runs is determined by the --CPU option on the Trinity command line, regardless of the number requested with the #PBS directive. In the initial stages Trinity uses only 16 or maybe 32 cores productively. (Later stages like QuantifyGraph or Butterfly can use many more.)
So if your job requires a terabyte of memory, your job script should contain lines like these:
#!/bin/bash
#PBS -l ncpus=128
. . .
Trinity.pl --CPU 16 other-command-line-options
. . .
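The arithmetic behind the core request can be sketched in a few lines of bash. This is only an illustration of the rule of thumb above (1 GB per million reads, 8 GB per core on Blacklight); `cores_for_reads` is a hypothetical helper, not part of Trinity or PBS, and your data may need more memory than the rule suggests.

```shell
#!/bin/bash
# Estimate how many cores to request so the job gets enough memory.
# cores_for_reads is a hypothetical helper written for this example.
cores_for_reads() {
    local reads_millions=$1        # number of reads, in millions
    local gb_per_core=8            # Blacklight: 8 GB of memory per core
    local mem_gb=$reads_millions   # rule of thumb: 1 GB per million reads
    # Integer division, rounded up to a whole number of cores
    echo $(( (mem_gb + gb_per_core - 1) / gb_per_core ))
}

cores_for_reads 1024   # prints 128
cores_for_reads 100    # prints 13
```

The result is what you would put after ncpus= in the #PBS directive; the --CPU value on the Trinity command line is chosen separately.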
- 2. My job failed. Do I have to start over?
- No, you don’t. Just resubmit the same script and Trinity will pick up where it left off rather than starting from scratch. After Trinity finishes each stage of computation, it writes out a file recording that the stage is finished. When the job is rerun, Trinity resumes from the stage following the last “checkpoint” file that was written, so it does not need to start over from the beginning. Granted, some stages can take several days, so you may still lose some time.
- 3. What should I do when my job fails with an error like this:
PBS: job killed: stdout file size 96800KB exceeded limit 20480KB
You must redirect the output from Trinity to a file on the Trinity command line, like this:
Trinity.pl command-line-options >& my-trinity-output.out
If you don’t do this, Trinity writes its output to stdout by default. This can cause trouble on Blacklight, because the stdout and stderr files are limited to 20 MB each. If either file exceeds this limit, the job will be killed. The Trinity package writes a lot of information to stdout and stderr and often exceeds these limits.
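To see what the redirection does, here is a small bash sketch. `noisy_tool` is a hypothetical stand-in for the Trinity command line, and the `> file 2>&1` form is used as an equivalent of `>& file` that also works in shells other than bash.

```shell
#!/bin/bash
# noisy_tool is a hypothetical stand-in for the Trinity command line.
noisy_tool() {
    echo "progress message"       # written to stdout
    echo "warning message" >&2    # written to stderr
}

log=$(mktemp)   # temporary file standing in for my-trinity-output.out

# Send stdout to the file, then point stderr at the same place,
# so neither stream ends up in the size-limited PBS output files.
noisy_tool > "$log" 2>&1

wc -l < "$log"   # both messages were captured in the file
```

Because both streams go to your own file, neither the PBS stdout file nor the stderr file grows.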
Another error you may see is:
PBS: job killed: pid 26111814 (java) thread count 2573 exceeded limit 2500
When you run Butterfly, you should limit the number of Java garbage collection threads by using this command line option:
--bflyGCThreads number-of-threads
These threads are only for Java’s internal memory management, so they should not really affect your performance. Anything between 2 and 16 threads is reasonable. If you do not include the --bflyGCThreads option, Java will launch as many threads as there are cores, and you will certainly exceed the limit.
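Putting the settings from this FAQ together, a job script might be laid out roughly as follows. This is a sketch under stated assumptions: the value 4 is an arbitrary pick from the 2 to 16 range, your own Trinity arguments still need to be appended, and the command is printed rather than executed so the fragment can be tried anywhere.

```shell
#!/bin/bash
#PBS -l ncpus=128
# Sketch only: assemble the Trinity command line from the settings discussed above.
cpu=16          # cores Trinity itself should use (--CPU)
gc_threads=4    # Java GC threads for Butterfly; assumed value within the 2 to 16 range
trinity_cmd=(Trinity.pl --CPU "$cpu" --bflyGCThreads "$gc_threads")

# Print the command for inspection. In a real job you would append your other
# options and run it with output redirected to a file, for example:
#   "${trinity_cmd[@]}" other-command-line-options >& my-trinity-output.out
printf '%s ' "${trinity_cmd[@]}"; echo
```

Keeping the command in an array makes it easy to check the options before committing to a long run.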