ALLPATHS-LG is a whole-genome shotgun assembler that can generate high-quality genome assemblies using short reads (~100bp) such as those produced by the new generation of sequencers. The significant difference between ALLPATHS and traditional assemblers such as Arachne is that ALLPATHS assemblies are not necessarily linear, but instead are presented in the form of a graph. This graph representation retains ambiguities, such as those arising from polymorphism, uncorrected read errors, and unresolved repeats, thereby providing information that has been absent from previous genome assemblies.
ALLPATHS-LG was developed at the Broad Institute.
ALLPATHS-LG is installed on Blacklight.
Running large assemblies on Blacklight efficiently requires special steps that may vary depending on your particular assembly problem. If you expect your assembly to take more than a few hours, please contact PSC User Services beforehand to discuss your assembly with PSC staff.
Sequence data conversion: Before ALLPATHS-LG can use your sequencing data, you must convert the data and add metadata to it. Please see this document on how to convert data for instructions.
Limits on stdout and stderr files: Blacklight limits stdout and stderr files to 20 Mbytes each. If either limit is exceeded, the job is killed. ALLPATHS-LG writes large amounts of data to stdout and stderr, often more than 20 Mbytes. To prevent your job from being killed by the system, redirect stdout and stderr to a $SCRATCH file on the ALLPATHS-LG command line.
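The redirection described above can be sketched as follows. This is a minimal illustration, not a site-provided recipe: noisy_command is a hypothetical stand-in for the ALLPATHS-LG command line, the log file name is arbitrary, and the fallback to /tmp when $SCRATCH is unset is only there so the sketch also runs outside Blacklight.

```shell
#!/bin/bash
# Hypothetical stand-in for an ALLPATHS-LG command that writes to both streams.
noisy_command() {
    echo "progress message on stdout"
    echo "diagnostic message on stderr" >&2
}

# Assumption: $SCRATCH is set on Blacklight; fall back to /tmp elsewhere.
LOG_DIR="${SCRATCH:-/tmp}"
LOG_FILE="$LOG_DIR/allpaths_redirect_demo.log"

# Send stdout to the log file, then point stderr at the same place.
# The order matters: "2>&1" must come after "> file".
noisy_command > "$LOG_FILE" 2>&1

# Only this short summary reaches the job's own (size-limited) stdout.
echo "log size: $(wc -c < "$LOG_FILE") bytes"
```

With both streams captured in one file on $SCRATCH, the job's own stdout and stderr stay small no matter how much the assembler prints.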
When you are ready to run your assembly, follow these steps:
- Prepare a batch job containing commands to
  - Set up the module command. Blacklight system modules define environment variables which make your life easier when using different software packages. See documentation for the module command for more information.
  - Load the system module for ALLPATHS. First, determine which releases are available by typing
      module avail allpaths
    Then load the appropriate one for your work, for example:
      module load allpaths-lg/41370
  - Change the allowable stack size. ALLPATHS-LG requires this.
      ulimit -s 100000
  - Prepare the input data using PrepareAllPathsInputs
    The PrepareAllPathsInputs perl script (PrepareAllPathsInputs.pl) can be used to convert input data to ALLPATHS-LG format files. See the ALLPATHS-LG Manual for details on PrepareAllPathsInputs.pl. See the example job script for an example of using PrepareAllPathsInputs.pl.
  - Call the RunAllPathsLG module to control the assembly pipeline.
    ALLPATHS-LG consists of a series of modules. The module pipeline can be controlled through the RunAllPathsLG module. See the ALLPATHS-LG Manual for details on RunAllPathsLG. See the example job script for an example of using RunAllPathsLG.
- Submit the job with the qsub command.
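Put together, the steps above might look like the sketch below, which writes a job script to a file and checks its syntax without running it. It is an outline under stated assumptions, not a tested Blacklight recipe: the PBS resource values, the module init script path, the directory layout under $SCRATCH, the CSV file names, and every argument value passed to PrepareAllPathsInputs.pl and RunAllPathsLG are placeholders. Replace them using the ALLPATHS-LG Manual and the example job script.

```shell
#!/bin/bash
# Write a sketch of an ALLPATHS-LG batch job to a file (default: allpaths.job).
JOB_FILE="${JOB_FILE:-allpaths.job}"

cat > "$JOB_FILE" <<'EOF'
#!/bin/bash
# Resource requests are placeholders; size them for your assembly.
#PBS -l ncpus=16
#PBS -l walltime=2:00:00
#PBS -j oe

# Set up the module command. The init script path is an assumption;
# see the documentation for the module command.
source /usr/share/modules/init/bash

# Load the ALLPATHS-LG module (release taken from the example above).
module load allpaths-lg/41370

# ALLPATHS-LG requires a larger stack size.
ulimit -s 100000

# Hypothetical directory layout and names; adjust to your project.
PRE=$SCRATCH/assembly
REF=my_genome
DATA=$PRE/$REF/data
mkdir -p "$DATA"

# Convert the input data. The CSV names and PLOIDY value are placeholders.
PrepareAllPathsInputs.pl DATA_DIR="$DATA" PLOIDY=2 \
    IN_GROUPS_CSV=in_groups.csv IN_LIBS_CSV=in_libs.csv \
    > "$SCRATCH/prepare.log" 2>&1

# Run the pipeline, keeping its large output out of the job's stdout/stderr.
RunAllPathsLG PRE="$PRE" REFERENCE_NAME="$REF" DATA_SUBDIR=data RUN=run1 \
    > "$SCRATCH/runallpaths.log" 2>&1
EOF

# Check the generated script for shell syntax errors without executing it.
bash -n "$JOB_FILE" && echo "wrote $JOB_FILE"
```

After replacing the placeholders, the generated file would be submitted with qsub, for example: qsub allpaths.job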