    # Example selex file

    seq1  ACGACGACGACG.
    seq2  ..GGGAAAGG.GA
    seq3  UUU..AAAUUU.A

    seq1  ..ACG
    seq2  AAGGG
    seq3  AA...UUU
SELEX is an interleaved multiple alignment format that arose as an intuitive format, easy to write and manipulate manually with a text editor. It is usually easy to convert other alignment formats into SELEX format (though it can be harder to go the other way, since SELEX is more free-format than other alignment formats). For instance, GCG's MSF format and the output of the CLUSTALV multiple alignment program are both interleaved formats similar to SELEX, so they convert readily. Because SELEX evolved to accommodate different user input styles, it is very tolerant of inconsistencies such as different gap symbols, varying line lengths, etc.
Each line contains a name followed by the aligned sequence. A space,
dash, underscore, or period denotes a gap. If the alignment is too
long to fit on one line, it is split into multiple blocks separated
by blank lines. The number of sequences, their order, and their
names must be the same in every block (even if a sequence has no
residues in a given block). Additional blank lines are ignored. You
can add comments to the file on lines starting with a #.
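The rules above are simple enough to sketch as a small reader. The function below, `parse_selex`, is a hypothetical illustration, not code from the cove or hmm packages. It assumes the name is the first whitespace-delimited field and everything after the following whitespace is sequence, so a gap written as spaces immediately after the name cannot be told apart from the separator. It maps every gap symbol to `.`, and, as our own assumption about ragged lines, it pads shorter lines within a block with gaps so columns stay aligned from one block to the next.

```python
GAP_CHARS = " -_."  # space, dash, underscore, and period all denote a gap


def parse_selex(text):
    """Parse SELEX text into an insertion-ordered dict: name -> aligned sequence.

    Sketch assumptions: the name is the first whitespace-delimited field and
    the rest of the line is sequence; ragged lines in a block are padded with
    gaps to that block's longest line.
    """
    blocks, current = [], []
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("#"):
            continue  # comment line: skipped, and it does not end a block
        if not line.strip():
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue  # extra blank lines are ignored
        parts = line.split(None, 1)
        # A name with nothing after it means no residues in this block.
        current.append((parts[0], parts[1] if len(parts) > 1 else ""))
    if current:
        blocks.append(current)
    if not blocks:
        return {}

    names = [name for name, _ in blocks[0]]
    if len(set(names)) != len(names):
        raise ValueError("duplicate sequence name in block 1")
    aligned = {name: "" for name in names}
    for i, block in enumerate(blocks, start=1):
        # Every block must list the same names in the same order.
        if [name for name, _ in block] != names:
            raise ValueError(f"block {i}: names or order differ from block 1")
        width = max(len(seq) for _, seq in block)
        for name, seq in block:
            normalized = "".join("." if c in GAP_CHARS else c for c in seq)
            aligned[name] += normalized.ljust(width, ".")
    return aligned
```

Applied to the example file at the top of this section, this returns 21-column sequences for seq1, seq2, and seq3, because the longest line of the second block (seq3) has 8 columns and the shorter seq1 and seq2 lines are padded to match.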
SELEX, by the way, stands for "Systematic Evolution of Ligands by Exponential Enrichment": it refers to the Tuerk and Gold technology for evolving families of small RNAs for particular functions [18]. SELEX files were what we used to keep track of alignments of these small RNA families.
As the format evolved, more features were added. To maintain compatibility with past alignment files, the new features are added using a reserved comment style. You don't have to worry about adding these extra information lines; they are generally the province of automated SELEX-generating software, such as my koala sequence alignment editor or the cove and hmm sequence analysis packages. This extra information includes consensus and individual RNA or protein secondary structure, sequence weights, a reference coordinate system for the columns, and database source information including name, accession number, and coordinates (for subsequences extracted from a longer source sequence). See below for details.
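The compatibility trick works because reserved lines still begin with #, so an older reader discards them as ordinary comments. The sketch below shows how a newer reader might tell the two apart. It assumes reserved lines start with `#=` followed by a tag and a value; the helper `classify_line` is hypothetical, and tag names such as `CS` in the usage are only illustrative, not a definitive list of SELEX markup.

```python
def classify_line(line):
    """Classify one SELEX line (sketch; the '#=' reserved-comment prefix is assumed).

    Returns one of:
      ("blank", None, "")
      ("markup", tag, value)    reserved comment carrying extra information
      ("comment", None, text)   ordinary free-text comment
      ("sequence", name, residues)
    """
    stripped = line.rstrip()
    if not stripped:
        return ("blank", None, "")
    if stripped.startswith("#="):
        # Reserved comment: a tag right after "#=", then whitespace, then its value.
        parts = stripped[2:].split(None, 1)
        tag = parts[0] if parts else ""
        value = parts[1] if len(parts) > 1 else ""
        return ("markup", tag, value)
    if stripped.startswith("#"):
        return ("comment", None, stripped[1:].strip())
    parts = stripped.split(None, 1)
    return ("sequence", parts[0], parts[1] if len(parts) > 1 else "")
```

A reader that simply drops every line beginning with # sees exactly the same alignment as one that understands the markup, which is the backward-compatibility property described above.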