Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing, communications and informatics.

GlimmerHMM

GlimmerHMM is a Eukaryotic Gene-Finding System based on a Generalized Hidden Markov Model (GHMM).

Installed on blacklight and biou.

Other resources that may be helpful include:

The GlimmerHMM web site (http://ccb.jhu.edu/software/glimmerhmm/)

Running GlimmerHMM

1) Make GlimmerHMM commands available for use
a) blacklight:
The GlimmerHMM programs are made available through the module command. To load the GlimmerHMM module, enter:

module load glimmerhmm

b) biou:

The GlimmerHMM programs are available through the Galaxy instance on biou. To make the GlimmerHMM programs available on the command line, csh users should enter the following command:

% source /packages/bin/SETUP_BIO_SOFTWARE

To make the GlimmerHMM programs available on the command line, bash users should enter the following command:

% source /packages/bin/SETUP_BIO_SOFTWARE
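After loading the module or sourcing the setup script, it can be useful to confirm that the programs are actually on your PATH. The following is a minimal sketch; the helper name check_prog is hypothetical, and it assumes that both glimmerhmm and trainGlimmerHMM are installed as commands of those names.

```shell
# check_prog NAME: report whether NAME can be found on the current PATH.
check_prog() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Check the two commands used in this document.
for prog in glimmerhmm trainGlimmerHMM; do
    check_prog "$prog"
done
```

If either program is reported as not found, repeat the setup step for your system and shell before continuing.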

2) GlimmerHMM Command line usage:

glimmerhmm <genome1-file> <training-dir-for-genome1> [options]

Options:

-p file_name   If protein domain searches are available, read them from file file_name
-d dir_name    Training directory is specified by dir_name (introduced for compatibility with earlier versions)
-o file_name   Print output in file_name; if n>1 for top best predictions, output is in file_name.1, file_name.2, ..., file_name.n
-n n           Print top n best predictions
-g             Print output in GFF format
-v             Don't use SVM splice site predictions
-f             Don't make partial gene predictions
-h             Display the options of the program
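As an illustration of how the options combine, the sketch below asks for the top n predictions in GFF format and writes them to numbered output files. The function name predict_top_n and the file names genome.fa and predictions are hypothetical. When glimmerhmm is not on the PATH, the function only prints the command it would have run.

```shell
# predict_top_n GENOME TRAIN_DIR N OUT
# Run glimmerhmm with GFF output (-g), the top N predictions (-n N) and
# output file prefix OUT (-o OUT). Per the option list above, with N > 1
# the output goes to OUT.1, OUT.2, ..., OUT.N.
predict_top_n() {
    genome="$1"; train_dir="$2"; n="$3"; out="$4"
    set -- glimmerhmm "$genome" "$train_dir" -g -n "$n" -o "$out"
    if command -v glimmerhmm >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical input file; the training directory is passed positionally.
predict_top_n genome.fa "$GLIMMERHMM_HOME/trained_dir/human" 3 predictions
```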

3) Training datasets

To use GlimmerHMM, you must either train the program or use a precompiled training set.

a) Using precompiled training datasets

A number of precompiled training sets are included in the GlimmerHMM release. The precompiled sets include:

arabidopsis
Celegans
human
rice
zebrafish

If your genome is listed above (or is a close relative of a genome listed above), you may use the precompiled training sets by giving the -d option followed by the directory containing the precompiled training set. The precompiled training sets can be found in the directory:

$GLIMMERHMM_HOME/trained_dir

For example, to use the precompiled set for the human genome on a set of sequences contained in the file fasta.file, you would enter the following on the command line:

% glimmerhmm fasta.file -d $GLIMMERHMM_HOME/trained_dir/human
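To see which precompiled training sets are present on your system, you can list the subdirectories of the training directory. This is a sketch only: the helper name list_training_sets is hypothetical, and it assumes that each training set is stored as one subdirectory of trained_dir, as the human example above suggests.

```shell
# list_training_sets DIR: print the name of every subdirectory of DIR,
# one per line. Plain files in DIR are ignored.
list_training_sets() {
    for d in "$1"/*/; do
        if [ -d "$d" ]; then
            basename "$d"
        fi
    done
}

list_training_sets "$GLIMMERHMM_HOME/trained_dir"
```

Any name printed here can be appended to $GLIMMERHMM_HOME/trained_dir/ and passed to glimmerhmm with the -d option.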

b) Compiling your own training dataset

To train GlimmerHMM, use the trainGlimmerHMM command with the parameters specified below.

trainGlimmerHMM <mfasta_file> <exon_file> [optional_parameters]

<mfasta_file> is a multifasta file containing the sequences for training with the usual format:

        >seq1
        AGTCGTCGCTAGCTAGCTAGCATCGAGTCTTTTCGATCGAGGACTAGACTT
        CTAGCTAGCTAGCATAGCATACGAGCATATCGGTCATGAGACTGATTGGGC
        >seq2
        TTTAGCTAGCTAGCATAGCATACGAGCATATCGGTAGACTGATTGGGTTTA
        TGCGTTA

<exon_file> is a file with the exon coordinates relative to the sequences contained in <mfasta_file>. Each line gives a sequence name followed by the start and end coordinates of one exon, and different genes are separated by a blank line. The expected format looks like this:

        seq1 5 15
        seq1 20 34

        seq1 50 48
        seq1 45 36

        seq2 17 20

In this example seq1 has two genes: one on the direct strand (start before end) and another on the complementary strand (start after end).
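Before training, it can help to check that the exon file is grouped into genes the way you intended. The sketch below is not part of GlimmerHMM; the function name summarize_exons is hypothetical. It assumes the three-column, blank-line-separated format described above and takes the strand of a gene from whether the start coordinate of its last exon is smaller than the end coordinate.

```shell
# summarize_exons EXON_FILE: print one line per gene with its sequence
# name, number of exons and strand. Lines that do not have exactly three
# fields are reported and make the function return a non-zero status.
summarize_exons() {
    awk '
        function emit_gene() {
            if (n > 0) {
                g++
                printf "gene %d: %s, %d exon(s), %s strand\n", g, seq, n, strand
            }
            n = 0
        }
        NF == 0 { emit_gene(); next }     # a blank line ends the current gene
        NF != 3 { printf "line %d: expected 3 fields\n", NR; bad = 1; next }
        {
            seq = $1
            strand = ($2 + 0 <= $3 + 0) ? "direct" : "complementary"
            n++
        }
        END { emit_gene(); exit bad }
    ' "$1"
}

# Hypothetical file name; only runs if the file exists.
if [ -f exons.txt ]; then
    summarize_exons exons.txt
fi
```

For an exon file in which a blank line separates each gene (seq1 5 15 and seq1 20 34, then seq1 50 48 and seq1 45 36, then seq2 17 20), this prints three genes: two on seq1 (direct, then complementary) and one on seq2 (direct).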

[optional_parameters]

-i i1,i2,...,in   isochores to be considered (e.g. if two isochores are desired, between 0-40% GC content and 40-100%, then the option should be: -i 0,40,100; default is -i 0,100)
-f val            val = average value of upstream UTR region if known
-l val            val = average value of downstream UTR region if known
-n val            val = average value of intergenic region if known

After running trainGlimmerHMM, a directory is created in the directory from which you ran the training procedure. This directory is called TrainGlimmM[date][time], where [date] and [time] specify the date and the time when the directory was created. It contains the training parameters that GlimmerHMM needs to run. A log file named after the directory is also created, listing some of the default parameters set for GlimmerHMM. Once your training is complete, run GlimmerHMM with your training set.
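Because the training directory name includes a timestamp, a small helper can pick up the newest one automatically. The following is a sketch under stated assumptions: the function name latest_training_dir and the file names train.mfa, exons.txt, genome.fa and predictions.gff are hypothetical, and the training directory is assumed to start with TrainGlimmM as described above. The training and prediction commands only run when trainGlimmerHMM is on the PATH.

```shell
# latest_training_dir [DIR]: print the most recently modified
# TrainGlimmM* directory under DIR (default: current directory).
# Prints nothing if no such directory exists.
latest_training_dir() {
    ls -td "${1:-.}"/TrainGlimmM*/ 2>/dev/null | head -n 1
}

if command -v trainGlimmerHMM >/dev/null 2>&1; then
    # Train with two isochores (0-40% and 40-100% GC content).
    trainGlimmerHMM train.mfa exons.txt -i 0,40,100

    # Predict genes with the newly created training set, GFF output.
    glimmerhmm genome.fa "$(latest_training_dir)" -g -o predictions.gff
fi
```

Selecting the directory by modification time avoids typing the timestamp, but if several training runs share one working directory it is safer to pass the directory name explicitly.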

4) PBS example files are available for running GlimmerHMM on blacklight
