Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

Glimmer3

Glimmer3 is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses.

Installed on blacklight, biou.

Other resources that may be helpful include:

Running Glimmer3

On blacklight

The Glimmer3 programs are made available for use through the module command. To load the Glimmer3 module enter:

module load glimmer

On biou

The Glimmer3 programs are available through the Galaxy instance on biou.

To make the Glimmer3 programs available through the command line, csh users should enter the following command:

source /packages/bin/SETUP_BIO_SOFTWARE

To make the Glimmer3 programs available through the command line, bash users should enter the following command:

source /packages/bin/SETUP_BIO_SOFTWARE

Command line usage

glimmer3 [options] <sequence-file> <icm-file> <tag>

Read DNA sequences in and predict genes in them using the Interpolated Context Model in <icm-file>. Output details go to file <tag>.detail and predictions go to file <tag>.predict

Options:

-A
--start_codons <codon-list>
Use comma-separated list of codons as start codons Sample format: -A atg,gtg Use -P option to specify relative proportions of use. If -P not used, then proportions will be equal
-b <filename>
--rbs_pwm <filename>
Read a position weight matrix (PWM) from <filename> to identify the ribosome binding site to help choose start sites
-C <p>
--gc_percent <p>
Use <p> as GC percentage of independent model Note: <p> should be a percentage, e.g., -C 45.2
-E <filename>
--entropy <filename>
Read entropy profiles from <filename>. Format is one header line, then 20 lines of 3 columns each. Columns are amino acid, positive entropy, negative entropy. Rows must be in order by amino acid code letter
-f
--first_codon
Use first codon in orf as start codon
-g <n>
--gene_len <n>
Set minimum gene length to <n>
-h
--help
Print this message
-i <filename>
--ignore <filename>
<filename> specifies regions of bases that are off limits, so that no bases within that area will be examined
-l
--linear
Assume linear rather than circular genome, i.e., no wraparound
-L <filename>
--orf_coords <filename>
Use <filename> to specify a list of orfs that should be scored separately, with no overlap rules
-M
--separate_genes
<sequence-file> is a multifasta file of separate genes to be scored separately, with no overlap rules
-o <n>
--max_olap <n>
Set maximum overlap length to <n>. Overlaps this short or shorter are ignored.
-P <number-list>
--start_probs <number-list>
Specify probability of different start codons (same number & order as in -A option). If no -A option, then 3 values for atg, gtg and ttg in that order. Sample format: -P 0.6,0.35,0.05 If -A is specified without -P, then starts are equally likely.
-q <n>
--ignore_score_len <n>
Do not use the initial score filter on any gene <n> or more base long
-r
--no_indep
Don't use independent probability score column
-t <n>
--threshold <n>
Set threshold score for calling as gene to n. If the in-frame score >= <n>, then the region is given a number and considered a potential gene.
-X
--extend
Allow orfs extending off ends of sequence to be scored
-z <n>
--trans_table <n>
Use Genbank translation table number <n> for stop codons
-Z <codon-list>
--stop_codons <codon-list>
Use comma-separated list of codons as stop codons Sample format: -Z tag,tga,taa