MEME Usage
MEME is copyright 1994, The Regents of the University of California, Author: Timothy L. Bailey
Usage:
meme dataset [optional switches]
Required Inputs:
- dataset
- The set of sequences in Pearson/FASTA format. If the name "stdin" is given, MEME reads from standard input. Sequences may be in uppercase, lowercase, or both.
Optional Inputs:
(Note: n is an integer, a is a decimal number, string is a string of characters)
Alphabet:
The default alphabet is the IUPAC protein alphabet.
- -protein
- Use the standard IUPAC protein alphabet:
ACDEFGHIKLMNPQRSTVWY
Conversions:
X --> C (unknown to cysteine)
Z --> E (Glu, Gln to Glu)
B --> D (Asp, Asn to Asp)
- -dna
- Use the standard DNA alphabet:
ACGT
Conversions:
X --> C (unknown to cytosine)
N --> C (unknown to cytosine)
- -alph string
- Use string as the alphabet. string should contain all the characters used in the sequences in dataset. MEME does NOT understand characters that stand for several different characters in the sequence alphabet.
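The conversions listed for -protein and -dna can be illustrated with a short Python sketch. It is not MEME's own input code: the helper name normalize, the uppercasing step, and the error raised for unknown characters are assumptions made for illustration. Only the alphabets and the conversion tables are taken from the text above.

```python
# Alphabets and conversion tables copied from the usage text above.
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
PROTEIN_CONVERSIONS = {"X": "C", "Z": "E", "B": "D"}
DNA_ALPHABET = "ACGT"
DNA_CONVERSIONS = {"X": "C", "N": "C"}


def normalize(sequence, alphabet, conversions):
    """Uppercase a sequence, apply the conversions, reject unknown letters.

    Hypothetical helper: uppercasing and raising ValueError are assumptions,
    not documented MEME behavior.
    """
    result = []
    for ch in sequence.upper():
        ch = conversions.get(ch, ch)
        if ch not in alphabet:
            raise ValueError(f"character {ch!r} is not in alphabet {alphabet!r}")
        result.append(ch)
    return "".join(result)


print(normalize("acgtnx", DNA_ALPHABET, DNA_CONVERSIONS))
print(normalize("MkzbX", PROTEIN_ALPHABET, PROTEIN_CONVERSIONS))
```

With -alph string no conversions are listed, so the same helper would be called with an empty conversion table.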
Sequence Model:
The default sequence model type is oops. The default type of prior distribution on the parameters of the model depends on the type of sequence model chosen. The default strength of the prior depends on the type of prior chosen.
- -mod string
- The type of sequence model to use:
- oops
- One Occurrence Per Sequence
MEME assumes that each sequence in the dataset contains exactly one occurrence of each motif. This is the fastest.
- zoops
- Zero or One Occurrence Per Sequence
MEME assumes that each sequence contains at most one occurrence of each motif. Slower than oops. Somewhat prone to convex combinations.
- tcm
- Two-Component Mixture
MEME assumes each sequence contains any number of non-overlapping occurrences of each motif. Much slower than oops. Very prone to convex combinations.
- -prior string
- The prior distribution on the model parameters:
- dirichlet
- simple Dirichlet prior
This is the default for -dna and -alph.
- dmix
- mixture of Dirichlets prior
This is the default for -protein with -mod oops.
- mega
- extremely low variance dmix;
variance is scaled inversely with the size of the dataset.
This is the default for -protein with -mod tcm.
- megap
- megadmix for all but last iteration of EM;
dmix on last iteration.
This is the default for -protein with -mod zoops.
- +1 to each observed count
- -b a
- The strength of the prior on model parameters:
a = 0 means use intrinsic strength of prior for prior = dmix.
Defaults:
- 1 if prior = dirichlet
- 0 if prior = dmix
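As a rough illustration of the addone prior ("+1 to each observed count"), the Python sketch below adds one pseudo-count per letter and then normalizes the counts to frequencies. The normalization step, the function name addone_frequencies, and the example counts are assumptions; how the -b strength interacts with the other prior types is not modeled here.

```python
def addone_frequencies(counts):
    """Add 1 to each observed count, then normalize to frequencies.

    Assumption: frequencies are obtained by dividing by the adjusted total.
    """
    total = sum(counts.values()) + len(counts)
    return {letter: (n + 1) / total for letter, n in counts.items()}


# Hypothetical counts for one motif column over the DNA alphabet.
column_counts = {"A": 7, "C": 0, "G": 2, "T": 1}
freqs = addone_frequencies(column_counts)
print(freqs)
```

A letter that was never observed (C in this example) still receives a nonzero frequency, which is the point of adding the pseudo-count.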
Motif:
- -w n
- -minw n
- -maxw n
- -noshorten
- The width of the motif(s) to search for.
If -w is given, only that width is tried.
Otherwise, widths between -minw and
-maxw are tried and the most statistically
significant one is chosen for the motif.
MEME tries to shorten motifs unless -noshorten is given.
Default: -minw 8, -maxw MAXSITE (defined in user.h)
Note: If -maxw n is equal to or greater than the length of the shortest sequence in the dataset, n is reset by MEME to that value minus 1.
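The note on -maxw amounts to a simple clamp. The Python sketch below restates that rule; the function name effective_maxw and the example sequence lengths are hypothetical, and the code is not taken from MEME itself.

```python
def effective_maxw(maxw, sequence_lengths):
    """Return the -maxw value after the reset described in the note.

    If maxw is equal to or greater than the length of the shortest
    sequence, it is reset to that length minus 1.
    """
    shortest = min(sequence_lengths)
    if maxw >= shortest:
        return shortest - 1
    return maxw


lengths = [30, 120, 45]  # hypothetical sequence lengths
print(effective_maxw(50, lengths))
print(effective_maxw(20, lengths))
```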
- -nsites n
- -minsites n
- -maxsites n
- The (expected) number of occurrences of each motif.
If -nsites is given, only that number of occurrences
is tried. Otherwise, numbers of occurrences between
-minsites and -maxsites are tried as initial guesses
for the number of motif occurrences. These
switches are ignored if mod = oops.
Defaults:
- -minsites
- sqrt(number of sequences)
- -maxsites
- zoops
- 1/(L-2+1)
- tcm
- (size of dataset)/(2w)
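Two of the default formulas above are easy to evaluate by hand. The Python sketch below computes the -minsites default and the -maxsites default for tcm. It assumes that "size of dataset" means the total number of characters in all sequences and that w is the motif width; any rounding MEME applies is not stated in the text, so the values are returned unrounded. The function names are hypothetical.

```python
import math


def default_minsites(num_sequences):
    """Default -minsites: sqrt(number of sequences)."""
    return math.sqrt(num_sequences)


def default_maxsites_tcm(dataset_size, w):
    """Default -maxsites for -mod tcm: (size of dataset)/(2w).

    Assumption: dataset_size is the total number of characters.
    """
    return dataset_size / (2 * w)


print(default_minsites(25))
print(default_maxsites_tcm(4000, 10))
```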
- -pal
- Look for palindromes in DNA datasets. If -pal is specified, MEME automatically decides if a motif appears to be a DNA palindrome or not. If -pal is not given, MEME does not search for DNA palindromes.
Multiple Motifs:
- -nmotifs n
- The number of *different* motifs to search
for. MEME will search for and output n motifs.
Default: 1
- -chi a
- Quit looking for motifs if the objective function
falls below a.
Default: 1
(so MEME never quits before -nmotifs n have been found.)
EM Algorithm:
- -maxiter n
- The number of iterations of EM to run from
any starting point.
EM is run for n iterations or until convergence
(see -distance, below) from each starting point.
Default: 50
- -distance a
- The convergence criterion. MEME stops
iterating EM when the change in the
motif frequency matrix is less than a.
(Change is the Euclidean distance between
two successive frequency matrices.)
Default: 0.001
- -adj string
- The type of adjustment made to the
LRT-based objective function used by MEME.
- none
- use significance level of LRT
- bon
- use Bonferroni-like adjustment
- root
- use n-th root of LRT sig. level (default)
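The interplay of -maxiter and -distance can be sketched in Python as follows. This is an illustration of the stopping rule only, not MEME's EM code: run_em, euclidean_distance, the toy update toy_step, and the TARGET matrix are all hypothetical names and data. The defaults 50 and 0.001 come from the text above.

```python
import math


def euclidean_distance(old, new):
    """Euclidean distance between two equally shaped frequency matrices."""
    return math.sqrt(
        sum(
            (a - b) ** 2
            for old_row, new_row in zip(old, new)
            for a, b in zip(old_row, new_row)
        )
    )


def run_em(theta, em_step, maxiter=50, distance=0.001):
    """Iterate em_step until the change drops below distance or maxiter is hit."""
    iterations = 0
    for _ in range(maxiter):
        new_theta = em_step(theta)
        change = euclidean_distance(theta, new_theta)
        theta = new_theta
        iterations += 1
        if change < distance:
            break
    return theta, iterations


# Toy stand-in for one EM iteration: move halfway toward a fixed matrix.
TARGET = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]


def toy_step(theta):
    return [
        [(a + t) / 2 for a, t in zip(row, target_row)]
        for row, target_row in zip(theta, TARGET)
    ]


start = [[0.25] * 4, [0.25] * 4]
theta, iterations = run_em(start, toy_step)
print(iterations, theta)
```

Lowering distance makes each start run longer; lowering maxiter caps the work per starting point even if the matrix is still changing.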
Selecting Starts For EM:
The default is for MEME to search the dataset for good starts for EM. How the starting points are derived from the dataset is specified by the following switches. The default type of mapping MEME uses is:
-spmap uni for -dna and -alph string
-spmap pam for -protein
- -spfuzz a
- The fuzziness of the mapping. Possible values are greater than 0. Meaning depends on -spmap, see below.
- -spmap string
- The type of mapping function to use.
- uni
- Use add-a prior when converting a substring
to an estimate of theta.
Default -spfuzz a: 0.5
- pam
- Use columns of PAM a matrix when converting
a substring to an estimate of theta.
Default -spfuzz a: 120 (PAM 120)
Other types of starting points can be specified using the following switches.
- -cons string
- Override the sampling of starting points and just use a starting point derived from string. This is useful when an actual occurrence of a motif is known and can be used as the starting point for finding the motif.
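One plausible reading of the uni mapping ("add-a prior when converting a substring to an estimate of theta") is sketched below in Python, applied to a string such as one given with -cons. The exact formula is an assumption: each position gets a count of 1 for the observed letter, a is added to every letter, and the column is normalized. The function name uni_start and the example string are hypothetical; the default a = 0.5 is the -spfuzz default stated for uni.

```python
def uni_start(substring, alphabet="ACGT", fuzz=0.5):
    """Turn a substring into a per-position frequency estimate (theta).

    Assumed formula: (indicator + fuzz) / (1 + fuzz * alphabet size).
    """
    denom = 1 + fuzz * len(alphabet)
    theta = []
    for ch in substring:
        column = {
            letter: ((1 if letter == ch else 0) + fuzz) / denom
            for letter in alphabet
        }
        theta.append(column)
    return theta


theta = uni_start("TATA")
for position, column in enumerate(theta):
    print(position, column)
```

Under this reading, a larger fuzz value pulls every column closer to the uniform distribution, so the starting point commits less strongly to the given string.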