
Detailed specification of a SELEX file

  1. Any line whose first two characters are #= is a machine comment. #= comments are reserved for parsed data about the alignment. These lines are usually maintained by software such as the koala editor, not by hand. Their format is usually quite specific, since the file-reading software must parse them.

  2. All other lines with % or # as the first character are user comments. User comments are ignored by all software, and anything may appear on them. Any number of user comments may be included, at any point in a SELEX file.

  3. Lines of data consist of a name followed by a sequence. The total length of the line must be less than 1024 characters.

  4. Names MUST be a single word. Any non-whitespace characters are accepted, but no spaces are tolerated. Names must be less than 32 characters long.

  5. In the sequence, any of the characters -_. or a space are recognized as gaps. Gaps will be converted to a '.'. Any other characters are interpreted as sequence. Sequence is case-sensitive. My software commonly assumes that upper-case symbols are used for consensus (match) positions and lower-case symbols are used for inserts. This language of "match" versus "insert" comes from the hidden Markov model formalism [10]. Almost all of my software does not depend on this distinction, and converts the sequence to all upper-case immediately after reading it.

  6. Multiple different sequences are grouped in a block of data lines. Blocks are separated by blank lines. No blank lines are tolerated between the sequence lines in a block. In a multi-block file of a long alignment, every block must list its sequences in the same order. The names are checked to verify that this is the case; if they do not match, only a warning is generated. (In manually constructed files, some users may wish to use shorthand names in subsequent blocks after an initial block with full names, but this isn't recommended.)
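
The rules above can be sketched as a small reader. This is only an illustration in Python, not the parser used by any existing software: the names classify, split_data_line and parse_selex are hypothetical, and several details are assumptions rather than part of the specification. In particular, the whitespace separating a name from its sequence is discarded rather than counted as gaps, trailing whitespace is dropped, user comments inside a block do not end the block, and a later block with more lines than the first is treated as an error.

```python
import warnings

GAP_CHARS = "-_. "   # rule 5: all of these are read as gaps
MAX_LINE = 1024      # rule 3: a data line must be shorter than this
MAX_NAME = 32        # rule 4: a name must be shorter than this


def classify(line):
    """Return 'machine', 'user', 'blank' or 'data' for one line (rules 1-3)."""
    if line.startswith("#="):
        return "machine"
    if line.startswith(("#", "%")):
        return "user"
    if not line.strip():
        return "blank"
    return "data"


def split_data_line(line):
    """Split a data line into (name, sequence), normalizing gaps to '.'."""
    if len(line) >= MAX_LINE:
        raise ValueError("data line must be shorter than 1024 characters")
    parts = line.split(None, 1)          # the name is the first word
    name = parts[0]
    if len(name) >= MAX_NAME:
        raise ValueError("name must be shorter than 32 characters: " + name)
    # Assumption: separator and trailing whitespace are not part of the sequence.
    seq = parts[1].rstrip() if len(parts) > 1 else ""
    seq = "".join("." if c in GAP_CHARS else c for c in seq)
    return name, seq


def parse_selex(text):
    """Return (names, sequences, machine_comments) for a SELEX-style text."""
    names, seqs, machine = [], [], []
    block = 0   # index of the current block
    row = 0     # index of the next data line within the current block
    for line in text.splitlines():
        kind = classify(line)
        if kind == "blank":
            if row > 0:                  # a blank line closes a non-empty block
                block += 1
                row = 0
        elif kind == "machine":
            machine.append(line)         # kept verbatim, not interpreted here
        elif kind == "data":
            name, seq = split_data_line(line)
            if block == 0:
                names.append(name)
                seqs.append(seq)
            else:
                if row >= len(names):
                    raise ValueError("block has more lines than the first block")
                if name != names[row]:   # rule 6: mismatch is only a warning
                    warnings.warn("expected name %r, found %r" % (names[row], name))
                seqs[row] += seq
            row += 1
        # user comments are ignored entirely (rule 2)
    return names, seqs, machine
```

Calling parse_selex on the text of a two-block file returns the names from the first block, one concatenated sequence per name with every gap written as '.', and the #= lines untouched. Case is preserved here; a caller that does not care about match versus insert positions could apply upper() to each sequence afterwards.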



Sean Eddy
Mon Apr 17 09:54:19 CDT 1995