MEME Usage

MEME is copyright 1994, The Regents of the University of California, Author: Timothy L. Bailey

Usage:

meme dataset [optional switches]

Required Inputs:

dataset
The set of sequences in Pearson/FASTA format. If the name "stdin" is given, MEME reads from standard input. Sequences may be in uppercase, lowercase, or a mix of both.

Optional Inputs:

(Note: n is an integer, a is a decimal number, string is a string of characters)

Alphabet:

The default alphabet is the IUPAC protein alphabet.
-protein
Use the standard IUPAC protein alphabet:
ACDEFGHIKLMNPQRSTVWY

Conversions:

    X --> C (unknown to cysteine)
    Z --> E (Glu, Gln to Glu)
    B --> D (Asp, Asn to Asp)
-dna
Use the standard DNA alphabet
ACGT

Conversions:

    X --> C (unknown to cytosine)
    N --> C (unknown to cytosine)
-alph string
Use string as the alphabet. string should contain all the characters used in the sequences in dataset. MEME does NOT understand ambiguity characters, that is, characters that stand for several different characters in the sequence alphabet.
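
The conversions listed above can be sketched in Python as follows. This is only an illustration, not MEME's own code: the helper name convert_sequence is hypothetical, and both the upper-casing step and the error raised for characters outside the alphabet are assumptions made for the example.

```python
# Alphabets and conversion tables copied from the usage text above.
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
DNA_ALPHABET = "ACGT"
PROTEIN_CONVERSIONS = {"X": "C", "Z": "E", "B": "D"}
DNA_CONVERSIONS = {"X": "C", "N": "C"}


def convert_sequence(seq, alphabet, conversions):
    """Hypothetical helper: apply the conversions, then check the alphabet."""
    converted = []
    # Assumption: fold case first, since mixed-case input is accepted.
    for ch in seq.upper():
        ch = conversions.get(ch, ch)
        if ch not in alphabet:
            # Assumption: how unknown characters are handled is not
            # stated in the usage text; raising is our choice here.
            raise ValueError(f"character {ch!r} is not in the alphabet")
        converted.append(ch)
    return "".join(converted)


print(convert_sequence("acgtnx", DNA_ALPHABET, DNA_CONVERSIONS))         # ACGTCC
print(convert_sequence("MKZBX", PROTEIN_ALPHABET, PROTEIN_CONVERSIONS))  # MKEDC
```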

Sequence Model:

The default sequence model type is oops. The default type of prior distribution on the parameters of the model depends on the type of sequence model chosen. The default strength of the prior depends on the type of prior chosen.
-mod string
The type of sequence model to use:
oops
One Occurrence Per Sequence
MEME assumes that each sequence in the dataset contains exactly one occurrence of each motif. This is the fastest model type.
zoops
Zero or One Occurrence Per Sequence
MEME assumes that each sequence contains at most one occurrence of each motif. Slower than oops. Somewhat prone to convex combinations.
tcm
Two-Component Mixture
MEME assumes each sequence contains any number of non-overlapping occurrences of each motif. Much slower than oops. Very prone to convex combinations.
-prior string
The prior distribution on the model parameters:
dirichlet
simple Dirichlet prior
This is the default for -dna and -alph.
dmix
mixture of Dirichlets prior
This is the default for -protein with -mod oops.
mega
extremely low variance dmix;
variance is scaled inversely with the size of the dataset. This is the default for -protein with -mod tcm.
megap
megadmix for all but last iteration of EM;
dmix on last iteration. This is the default for -protein with -mod zoops.
addone
+1 to each observed count
-b a
The strength of the prior on model parameters:
a = 0 means use intrinsic strength of prior for prior = dmix.
Defaults:
1 if prior = dirichlet
0 if prior = dmix
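
As a rough illustration of the simplest prior, addone, the sketch below adds 1 to each observed count in one motif column and turns the result into frequencies. The function name addone_frequencies is hypothetical, and the normalization step is an assumption; the usage text only says that 1 is added to each observed count.

```python
def addone_frequencies(counts):
    """Add 1 to each observed count, then normalize (normalizing is assumed)."""
    adjusted = {letter: count + 1 for letter, count in counts.items()}
    total = sum(adjusted.values())
    return {letter: count / total for letter, count in adjusted.items()}


# Made-up counts for one column of a DNA motif.
column_counts = {"A": 7, "C": 0, "G": 2, "T": 1}
frequencies = addone_frequencies(column_counts)

# The unobserved letter C ends up with a small non-zero frequency.
print(frequencies)
```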

Motif:

-w n
-minw n
-maxw n
-noshorten
The width of the motif(s) to search for. If -w is given, only that width is tried. Otherwise, widths between -minw and -maxw are tried and the most statistically significant one is chosen for the motif. MEME tries to shorten motifs unless -noshorten is given.
Default: -minw 8, -maxw MAXSITE (defined in user.h)

Note: If -maxw n is equal to or greater than the length of the shortest sequence in the dataset, n is reset by MEME to that value minus 1.
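
The reset rule in the note above amounts to the following small check. The function name effective_maxw is hypothetical; only the rule itself comes from the text.

```python
def effective_maxw(maxw, sequence_lengths):
    """Return the -maxw value after the reset rule described above."""
    shortest = min(sequence_lengths)
    # If maxw is equal to or greater than the shortest sequence length,
    # it is reset to that length minus 1.
    if maxw >= shortest:
        return shortest - 1
    return maxw


print(effective_maxw(50, [120, 30, 75]))  # 29
print(effective_maxw(20, [120, 30, 75]))  # 20
```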

-nsites n
-minsites n
-maxsites n
The (expected) number of occurrences of each motif. If -nsites is given, only that number of occurrences is tried. Otherwise, numbers of occurrences between -minsites and -maxsites are tried as initial guesses for the number of motif occurrences. These switches are ignored if mod = oops.
Defaults:
    -minsites: sqrt(number of sequences)
    -maxsites:
        zoops: 1/(L-2+1)
        tcm: (size of dataset)/(2w)
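
For a feel of the magnitudes involved, here is a minimal sketch of the -minsites default and the tcm default for -maxsites. The function names are hypothetical. It assumes that "size of dataset" means the total number of characters in all sequences and that no rounding is applied; neither point is stated in the usage text.

```python
import math


def default_minsites(num_sequences):
    """Default -minsites: square root of the number of sequences."""
    return math.sqrt(num_sequences)


def default_maxsites_tcm(dataset_size, w):
    """Default -maxsites for tcm: (size of dataset) / (2w).

    Assumption: dataset_size is the total number of characters.
    """
    return dataset_size / (2 * w)


print(default_minsites(25))            # 5.0
print(default_maxsites_tcm(4000, 10))  # 200.0
```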
-pal
Look for palindromes in DNA datasets. If -pal is specified, MEME automatically decides if a motif appears to be a DNA palindrome or not. If -pal is not given, MEME does not search for DNA palindromes.

Multiple Motifs:

-nmotifs n
The number of *different* motifs to search for. MEME will search for and output n motifs.
Default: 1
-chi a
Quit looking for motifs if the objective function falls below a.
Default: 1
(so MEME never quits before -nmotifs n have been found.)

EM Algorithm:

-maxiter n
The number of iterations of EM to run from any starting point. EM is run for n iterations or until convergence (see -distance, below) from each starting point.
Default: 50
-distance a
The convergence criterion. MEME stops iterating EM when the change in the motif frequency matrix is less than a. (Change is the Euclidean distance between two successive frequency matrices.)
Default: 0.001
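
The interplay of -maxiter and -distance can be sketched as a generic stopping loop. This is not MEME's EM code: run_until_converged and the step argument are hypothetical, and the toy update used in the demonstration merely moves halfway toward a fixed matrix so that the loop has something to converge on.

```python
import math


def euclidean_change(prev, curr):
    """Euclidean distance between two frequency matrices (lists of rows)."""
    return math.sqrt(sum((p - c) ** 2
                         for prev_row, curr_row in zip(prev, curr)
                         for p, c in zip(prev_row, curr_row)))


def run_until_converged(step, start, maxiter=50, distance=0.001):
    """Iterate step() for at most maxiter rounds or until the change < distance."""
    current = start
    for iteration in range(1, maxiter + 1):
        updated = step(current)
        if euclidean_change(current, updated) < distance:
            return updated, iteration
        current = updated
    return current, maxiter


# Toy stand-in for one EM iteration (NOT real EM): move halfway to a target.
TARGET = [[0.7, 0.1, 0.1, 0.1]]


def toy_step(matrix):
    return [[(value + goal) / 2 for value, goal in zip(row, goal_row)]
            for row, goal_row in zip(matrix, TARGET)]


final, iterations = run_until_converged(toy_step, [[0.25, 0.25, 0.25, 0.25]])
print(iterations, final)
```

With the defaults shown (50 iterations, distance 0.001), the toy update stops well before the iteration limit because successive matrices quickly become nearly identical.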
-adj string
The type of adjustment made to the LRT-based objective function used by MEME.
none
use significance level of LRT
bon
use Bonferroni-like adjustment
root
use n-th root of LRT significance level (default)
Adjustments are listed in order of favoring *shorter* motif widths. The root adjustment favors the shortest widths.

Selecting Starts For EM:

The default is for MEME to search the dataset for good starts for EM. How the starting points are derived from the dataset is specified by the following switches. The default type of mapping MEME uses is:
    -spmap uni for -dna and -alph string
    -spmap pam for -protein
-spfuzz a
The fuzziness of the mapping. Possible values are greater than 0. The meaning depends on -spmap; see below.
-spmap string
The type of mapping function to use.
uni
Use add-a prior when converting a substring to an estimate of theta.
Default -spfuzz a: 0.5
pam
Use columns of PAM a matrix when converting a substring to an estimate of theta.
Default -spfuzz a: 120 (PAM 120)
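
One possible reading of the uni mapping is sketched below: each position of the substring contributes a count of 1 for the observed letter, a is added to every letter's count, and the column is normalized. This interpretation of "add-a prior" is an assumption, not something the usage text spells out, and the function name substring_to_theta is hypothetical.

```python
def substring_to_theta(substring, alphabet, a=0.5):
    """Turn a substring into per-position letter frequencies (assumed add-a rule)."""
    denominator = 1 + a * len(alphabet)
    theta = []
    for observed in substring:
        # Observed letter: count 1 plus a; every other letter: a alone.
        column = {letter: ((1 if letter == observed else 0) + a) / denominator
                  for letter in alphabet}
        theta.append(column)
    return theta


theta = substring_to_theta("ACG", "ACGT", a=0.5)
print(theta[0])
```

Under this reading, a larger a spreads more probability onto the letters that were not observed, which matches the description of -spfuzz as the fuzziness of the mapping.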

Other types of starting points can be specified using the following switches.

-cons string
Override the sampling of starting points and just use a starting point derived from string. This is useful when an actual occurrence of a motif is known and can be used as the starting point for finding the motif.
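
To show how the switches above combine on one command line, here is a small sketch that assembles an argument list without running anything. The helper build_meme_command, the file name sites.fasta, and the site string TGACGTCA are hypothetical; the switches are the ones documented above.

```python
import shlex


def build_meme_command(dataset, switches):
    """Hypothetical helper: 'meme dataset [optional switches]' as a list."""
    command = ["meme", dataset]
    for name, value in switches:
        command.append(name)
        # Switches such as -dna take no value; pass None for those.
        if value is not None:
            command.append(str(value))
    return command


command = build_meme_command(
    "sites.fasta",                # hypothetical FASTA file
    [("-dna", None),
     ("-mod", "zoops"),
     ("-w", 8),
     ("-cons", "TGACGTCA")],      # hypothetical known motif occurrence
)
print(shlex.join(command))
# meme sites.fasta -dna -mod zoops -w 8 -cons TGACGTCA
```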