GenBank File Format

File Header
  • The first line in the file must have "GENETIC SEQUENCE DATA BANK" in spaces 20 through 46.

  • The next 8 lines may contain arbitrary text. They are ignored but are required to maintain the GenBank format.
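The header rules above can be sketched in Python as a builder and a checker. The sketch assumes that "spaces" means 1-based column positions. The names make_header and is_valid_header are hypothetical and not part of any library. The marker string is 26 characters long, so if it starts in column 20 it ends in column 45. For that reason the check only requires that the string appear somewhere within columns 20 through 46. The placeholder text on the 8 filler lines is arbitrary.

```python
HEADER_TEXT = "GENETIC SEQUENCE DATA BANK"


def make_header() -> list[str]:
    """Build a 9-line file header: the marker line plus 8 filler lines."""
    # 19 leading blanks put the marker's first character in column 20 (1-based).
    first = " " * 19 + HEADER_TEXT
    # The next 8 lines are ignored; this placeholder text is arbitrary.
    filler = [f"header line {i}" for i in range(2, 10)]
    return [first] + filler


def is_valid_header(lines: list[str]) -> bool:
    """Check that at least 9 header lines exist and the marker is in place."""
    if len(lines) < 9:
        return False
    # 1-based columns 20 through 46 correspond to the Python slice [19:46].
    return HEADER_TEXT in lines[0][19:46]
```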

Sequence Data Entries

  1. Each sequence entry in the file should have the following format:
    first line
    Must have LOCUS in the first 5 spaces. The genetic locus name or identifier must be in spaces 13 through 22. The length of the sequence is right-justified in spaces 23 through 29.
    second line
    Must have DEFINITION in the first 10 spaces. Spaces 13 through 80 hold free-form text identifying the sequence.
    third line
    Must have ACCESSION in the first 9 spaces. Spaces 13 through 18 must hold the primary accession number.
    fourth line
    Must have ORIGIN in the first 6 spaces. Nothing else is required on this line; it indicates that the nucleic acid sequence begins on the next line.
    fifth and following lines
    Hold the nucleotide sequence. It begins on the fifth line and continues over as many lines as needed. The first 9 spaces of each sequence line may be either blank or contain the position in the sequence of the first nucleotide on that line. The next 66 spaces hold the nucleotides in six blocks of ten. Each block consists of one blank space followed by ten nucleotides. Thus the first nucleotide is in space 11 of the line (9 position spaces plus one blank), and the last is in space 75 (9 + 66).
    last line
    Must have // in the first 2 spaces to indicate termination of the sequence.

  2. NOTE: A file may contain multiple sequences. To begin another sequence, start again with a new LOCUS line as described in item 1.
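The entry layout above can be sketched in Python. The sketch assumes that the space numbers are 1-based column positions. The names format_entry and format_entries are hypothetical.

Some choices in the sketch are assumptions rather than part of the format description:

- The position field in spaces 1 through 9 is always filled in and right-justified, even though the format also allows it to be blank.
- A DEFINITION longer than the 68 available spaces is truncated.
- An over-long locus name or accession number raises an error.

The sketch writes entries only. The 9-line file header must still precede them.

```python
def format_entry(locus: str, definition: str, accession: str, seq: str) -> list[str]:
    """Return the lines of one sequence entry, laid out by 1-based column."""
    # Assumed policy: reject fields that cannot fit their columns.
    if len(locus) > 10:
        raise ValueError("locus name must fit in spaces 13-22")
    if len(accession) > 6:
        raise ValueError("accession number must fit in spaces 13-18")
    lines = [
        # LOCUS in spaces 1-5, name in 13-22, length right-justified in 23-29.
        f"{'LOCUS':<12}{locus:<10}{len(seq):>7}",
        # DEFINITION in spaces 1-10; free text in 13-80 (truncated to 68 chars).
        f"{'DEFINITION':<12}{definition[:68]}",
        # ACCESSION in spaces 1-9; primary accession number in 13-18.
        f"{'ACCESSION':<12}{accession}",
        # ORIGIN: the sequence starts on the next line.
        "ORIGIN",
    ]
    for start in range(0, len(seq), 60):
        chunk = seq[start:start + 60]
        # Spaces 1-9: position of the first nucleotide on this line.
        line = f"{start + 1:>9}"
        # Up to six blocks, each one blank followed by ten nucleotides.
        for i in range(0, len(chunk), 10):
            line += " " + chunk[i:i + 10]
        lines.append(line)
    lines.append("//")  # terminates the entry
    return lines


def format_entries(records) -> list[str]:
    """Concatenate entries; each new one starts again with its own LOCUS line."""
    out: list[str] = []
    for locus, definition, accession, seq in records:
        out.extend(format_entry(locus, definition, accession, seq))
    return out
```

For example, format_entries([("SEQ1", "Example sequence", "X00001", "ACGT" * 30)]) produces the following lines in order:

- a LOCUS line
- a DEFINITION line
- an ACCESSION line
- an ORIGIN line
- two full sequence lines of 60 nucleotides each
- a closing //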