GlimmerHMM is a Eukaryotic Gene-Finding System based on a Generalized Hidden Markov Model (GHMM).
To see which versions of GlimmerHMM are available, type:
module avail glimmerhmm
To see what other modules are needed, what commands are available, and how to get additional help, type:
module help glimmerhmm
To use GlimmerHMM, include a command like this in your batch script or interactive session to load the glimmerhmm module:
module load glimmerhmm
Be sure you also load any other modules needed, as listed by the
module help glimmerhmm command.
Command line usage
glimmerhmm <genome1-file> <training-dir-for-genome1> [options]
|-p file_name||If protein domain searches are available, read them from file file_name|
|-d dir_name||Training directory is specified by dir_name (introduced for compatibility with earlier versions)|
|-o file_name||Print output to file_name; if n>1 (see -n), the top predictions are written to file_name.1, file_name.2, …, file_name.n|
|-n n||Print top n best predictions|
|-g||Print output in gff format|
|-v||Don’t use SVM splice site predictions|
|-f||Don’t make partial gene predictions|
|-h||Display the options of the program|
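The options above can be combined in a single call. The sketch below is illustrative only: the file names genome.fasta, training_dir and predictions.gff are hypothetical, and the helper expected_outputs is not part of GlimmerHMM. It simply spells out the output naming rule described for -o and -n, and runs glimmerhmm only if the program and the input file are actually present.

```shell
#!/bin/sh
# Hypothetical names; substitute your own files.
genome="genome.fasta"        # multi-FASTA input (assumed name)
training_dir="training_dir"  # training directory (assumed name)
out="predictions.gff"        # value passed to -o (assumed name)
n=3                          # value passed to -n

# Print the file names that -o/-n should produce according to the table:
# a single file when n is 1, otherwise file_name.1 ... file_name.n.
expected_outputs() {
    name=$1
    count=$2
    if [ "$count" -le 1 ]; then
        echo "$name"
    else
        i=1
        while [ "$i" -le "$count" ]; do
            echo "$name.$i"
            i=$((i + 1))
        done
    fi
}

# Run the prediction only when glimmerhmm is on PATH and the input exists.
if command -v glimmerhmm >/dev/null 2>&1 && [ -f "$genome" ]; then
    glimmerhmm "$genome" "$training_dir" -g -n "$n" -o "$out"
fi

# Show where to look for the results.
expected_outputs "$out" "$n"
```

Here -g requests GFF output and -n 3 asks for the three best predictions, so the results would be expected in predictions.gff.1 through predictions.gff.3.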
To use GlimmerHMM, you must either train the program or use a precompiled training set.
Using precompiled training datasets
A number of precompiled training sets are included in the GlimmerHMM release. To see what is available, type:
module load glimmerhmm
ls $GLIMMERHMM_HOME/trained_dir
If your genome is listed (or is a close relative of a listed genome), you may use the precompiled training set by passing the -d option followed by the directory containing it. The precompiled training sets are in the directory $GLIMMERHMM_HOME/trained_dir.
For example, to use the precompiled set for the human genome on a set of sequences contained in the file fasta.file, you would use the following on the command line:
% glimmerhmm fasta.file -d $GLIMMERHMM_HOME/trained_dir/human
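Before launching a long run, it can help to confirm that the precompiled training directory actually exists. The following is a minimal sketch under stated assumptions: the helper name training_dir_for is hypothetical, and it relies only on the $GLIMMERHMM_HOME/trained_dir layout described above.

```shell
#!/bin/sh
# Print the precompiled training directory for a species name,
# or return a non-zero status if it cannot be found.
training_dir_for() {
    if [ -z "${GLIMMERHMM_HOME:-}" ]; then
        echo "GLIMMERHMM_HOME is not set; run 'module load glimmerhmm' first" >&2
        return 1
    fi
    dir="$GLIMMERHMM_HOME/trained_dir/$1"
    if [ -d "$dir" ]; then
        echo "$dir"
    else
        echo "no precompiled training set at $dir" >&2
        return 1
    fi
}

# Example use with the species and input file named in the text.
# glimmerhmm is only invoked when it is on PATH and fasta.file exists.
if train=$(training_dir_for human) \
   && command -v glimmerhmm >/dev/null 2>&1 \
   && [ -f fasta.file ]; then
    glimmerhmm fasta.file -d "$train"
fi
```

If the directory is missing, the helper prints a message to standard error and the prediction step is skipped instead of failing partway through.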
Compiling your own training dataset
Use the trainGlimmerHMM module.
To train, use the command trainGlimmerHMM with the parameters specified below.
trainGlimmerHMM <mfasta_file> <exon_file> [optional_parameters]
<mfasta_file> is a multifasta file containing the sequences for training with the usual format:
>seq1
AGTCGTCGCTAGCTAGCTAGCATCGAGTCTTTTCGATCGAGGACTAGACTT
CTAGCTAGCTAGCATAGCATACGAGCATATCGGTCATGAGACTGATTGGGC
>seq2
TTTAGCTAGCTAGCATAGCATACGAGCATATCGGTAGACTGATTGGGTTTA
TGCGTTA
<exon_file> is a file with the exon coordinates relative to the sequences contained in the <mfasta_file>. Different genes are separated by a blank line. The expected format is shown below:
seq1 5 15
seq1 20 34

seq1 50 48
seq1 45 36

seq2 17 20
In this example, seq1 has two genes: one on the direct strand and another on the complementary strand.
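A quick way to check an exon file before training is to summarize it gene by gene. The sketch below is not part of GlimmerHMM: the function summarize_exons is hypothetical, and it assumes that a start coordinate larger than the end coordinate marks the complementary strand, which is inferred from the example above and not stated as a rule. It rebuilds the example exon file with one exon per line and blank lines between genes.

```shell
#!/bin/sh
# Recreate the example exon file: one exon per line, blank line between genes.
exon_file=$(mktemp)
cat > "$exon_file" <<'EOF'
seq1 5 15
seq1 20 34

seq1 50 48
seq1 45 36

seq2 17 20
EOF

# For each gene print: gene number, sequence name, exon count, inferred strand.
# Assumption: start <= end means direct strand (+), otherwise complementary (-).
summarize_exons() {
    awk '
        function emit() {
            gene++
            printf "gene%d %s %d %s\n", gene, name, n, strand
            n = 0
        }
        NF == 0 { if (n) emit(); next }      # a blank line closes the current gene
        {
            if (n == 0) {                    # first exon of a new gene
                name = $1
                strand = ($2 + 0 <= $3 + 0) ? "+" : "-"
            }
            n++
        }
        END { if (n) emit() }                # close the last gene
    ' "$1"
}

summarize_exons "$exon_file"
```

For the example file this reports three genes: two on seq1 (one on each strand, two exons each) and a single-exon gene on seq2.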
|-i i1,i2,…,in||Isochores to be considered (e.g. if two isochores are desired, between 0-40% GC content and 40-100%, then the option should be -i 0,40,100; default is -i 0,100)|
|-f val||val = average value of upstream UTR region if known|
|-l val||val = average value of downstream UTR region if known|
|-n val||val = average value of intergenic region if known|
After running trainGlimmerHMM, a directory will be created in the directory from which you ran the training procedure. This directory will be called TrainGlimmM[date][time], where [date] and [time] specify the date and time when the directory was created. It contains the training parameters that GlimmerHMM needs to run. A log file named after the directory will also be created, specifying some of the default parameters set for GlimmerHMM. Once your training is complete, run GlimmerHMM with your training set.
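Because the training directory name includes a timestamp, it is convenient to locate the most recent one automatically. The following sketch assumes only the TrainGlimmM prefix described above. The helper latest_training_dir and the file names genome.fasta and genome.gff are hypothetical.

```shell
#!/bin/sh
# Print the most recently modified TrainGlimmM* directory under $1
# (prints nothing if there is none).
latest_training_dir() {
    newest=$(ls -dt "$1"/TrainGlimmM*/ 2>/dev/null | head -n 1)
    # Drop the trailing slash that the glob pattern adds.
    printf '%s\n' "${newest%/}"
}

# Use the newest training directory in the current directory, if any.
train=$(latest_training_dir .)
if [ -n "$train" ] \
   && command -v glimmerhmm >/dev/null 2>&1 \
   && [ -f genome.fasta ]; then
    glimmerhmm genome.fasta "$train" -g -o genome.gff
fi
```

Selecting by modification time avoids having to parse the timestamp in the directory name, whose exact layout is not specified here.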