Next: Optional annotation
Up: SELEX alignment format
Previous: Example of a
-
Any line whose first two characters are #= is a machine
comment. #= comments are reserved for parsed data
about the alignment. Usually these features are maintained by software
such as the koala editor, not by hand. The format of #=
lines is usually quite specific, since they must be parsed by the
file-reading software.
-
All other lines whose first character is % or # are
user comments. User comments are ignored by all
software. Anything may appear on these lines. Any number of comments
may be included in a SELEX file, and at any point.
-
Each line of data consists of a name followed by a sequence. The total
length of the line must be less than 1024 characters.
-
Names must be a single word of fewer than 32 characters. Any
non-whitespace characters are accepted, but no spaces are tolerated
in names: names MUST be a single word.
-
In the sequence, any of the characters '-', '_', '.', or a space is
recognized as a gap. Gaps will be converted to a '.'. Any other
characters are interpreted as sequence. Sequence is
case-sensitive. My software commonly assumes that
upper-case symbols are used for consensus (match) positions and
lower-case symbols are used for inserts. This language of ``match''
versus ``insert'' comes from the hidden Markov model formalism
[10]. Almost all of my software ignores this distinction and
converts the sequence to all upper-case immediately after it's
read.
-
Multiple sequences are grouped in a block of data lines.
Blocks are separated by blank lines. No blank lines are tolerated
between the sequence lines in a block. In a multi-block file of a
long alignment, every block must list its sequences in the same
order. The names are checked to verify that this is the case; if
not, only a warning is generated. (In manually constructed files, some
users may wish to use shorthand names in subsequent blocks after an
initial block with full names, but this isn't recommended.)
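
The rules above can be sketched in a few lines of Python. This is only
an illustration, not the parser used by any of the software mentioned
here. The names classify_line, split_data_line and parse_selex are
hypothetical. The sketch also makes two assumptions that the text does
not state: the whole run of whitespace after the name is treated as a
separator, and trailing whitespace on a data line is dropped rather
than counted as gaps. Machine comments are collected but left unparsed.

```python
import warnings

GAP_CHARS = "-_. "     # characters the text lists as gaps
MAX_LINE_LEN = 1024    # a data line must be shorter than this
MAX_NAME_LEN = 32      # a name must be shorter than this


def classify_line(line):
    """Return 'blank', 'machine', 'user' or 'data' for one line."""
    if not line.strip():
        return "blank"
    if line.startswith("#="):
        return "machine"
    if line[0] in "%#":
        return "user"
    return "data"


def split_data_line(line, upper=False):
    """Split a data line into (name, sequence), converting gaps to '.'."""
    raw = line.rstrip("\n")
    if len(raw) >= MAX_LINE_LEN:
        raise ValueError("data line must be shorter than 1024 characters")
    parts = raw.split(None, 1)
    name = parts[0]
    if len(name) >= MAX_NAME_LEN:
        raise ValueError("name must be shorter than 32 characters: " + name)
    # Assumption: trailing whitespace is not significant.
    seq = parts[1].rstrip() if len(parts) > 1 else ""
    seq = "".join("." if c in GAP_CHARS else c for c in seq)
    return name, (seq.upper() if upper else seq)


def parse_selex(text, upper=False):
    """Return ([(name, full_sequence), ...], [machine comment lines])."""
    names, seqs, machine = [], [], []
    row = 0                # position of the next data line in its block
    first_block = True
    for line in text.splitlines():
        kind = classify_line(line)
        if kind == "blank":
            if row > 0:    # a blank line after data closes the block
                first_block = False
            row = 0
        elif kind == "machine":
            machine.append(line)
        elif kind == "data":
            name, seq = split_data_line(line, upper)
            if first_block:
                names.append(name)
                seqs.append(seq)
            elif row >= len(names):
                raise ValueError("block has more sequences than the first block")
            else:
                if name != names[row]:
                    # Order mismatch only produces a warning, as in the text.
                    warnings.warn(f"expected name {names[row]!r}, got {name!r}")
                seqs[row] += seq
            row += 1
        # user comments are ignored wherever they appear
    return list(zip(names, seqs)), machine
```

Sequences in later blocks are appended by position, so a shorthand
name in a later block triggers the warning but still extends the right
sequence. Passing upper=True mimics software that converts everything
to upper-case after reading.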
Sean Eddy
Mon Apr 17 09:54:19 CDT 1995