Sequence file format

Various MEME Suite programs take as input a file of protein or DNA sequences. These input files must be in FASTA format. Each entry consists of a sequence identifier (ID), an optional comment (COMMENT), and the sequence itself (SEQUENCE), laid out like this:

        >ID COMMENT
        SEQUENCE
      

The special character ">" marks the beginning of a new sequence. The ">" character is followed immediately by the sequence identifier. The rest of that line is occupied by the optional comment. Subsequent lines contain the sequence itself.
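
The layout described above can be read with a few lines of code. The following is a minimal sketch in Python, not the parser used by the MEME Suite. The names `FastaRecord` and `parse_fasta` are hypothetical, and the handling of blank lines and of text before the first ">" is an assumption made for illustration.

```python
from typing import Iterator, NamedTuple


class FastaRecord(NamedTuple):
    """One FASTA entry (hypothetical helper type for this sketch)."""
    seq_id: str
    comment: str
    sequence: str


def parse_fasta(text: str) -> Iterator[FastaRecord]:
    """Yield one FastaRecord per ">" header line found in text."""
    seq_id = None
    comment = ""
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Assumption: blank lines between entries carry no meaning.
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if seq_id is not None:
                yield FastaRecord(seq_id, comment, "".join(chunks))
            # The ID runs up to the first whitespace; the rest of the
            # header line is the optional comment.
            header = line[1:].split(None, 1)
            seq_id = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif seq_id is not None:
            # Sequence lines are joined into a single string.
            chunks.append(line)
        # Assumption: text before the first ">" is ignored.
    if seq_id is not None:
        yield FastaRecord(seq_id, comment, "".join(chunks))
```

Calling `list(parse_fasta(open("sequences.fasta").read()))` on a file (the file name here is only a placeholder) would return one record per entry, with the sequence lines of each entry concatenated.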

Some rules about representing sequences:

Here is an example of three sequences in FASTA format:

      >ICYA_MANSE 
      GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
      DGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVN
      LVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVD
      NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH

      >LACB_BOVIN 
      MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
      QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
      DALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALE
      KFDKALKALPMHIRLSFNPTQLEEQCHI

      >BBP_PIEBR 
      NVYHDGACPEVKPVDNFDWSNYHGKWWEVAKYPNSVEKYGKCGWAEYTPE
      GKSVKVSNYHVIHGKEYFIEGTAYPVGDSKIGKIYHKLTYGGVTKENVFN
      VLSTDNKNYIIGYYCKYDEDKKGHQDFVWVLSRSKVLTGEAKTAVENYLI
      GSPVVDSQKLVYSDFSEAACKVN
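
Entries like the ones above can also be produced programmatically. The sketch below is one possible way to do it in Python. The function name `format_fasta` is hypothetical, and the default line width of 50 characters is chosen only because it matches the example above, not because the format is stated to require it.

```python
def format_fasta(seq_id: str, sequence: str, comment: str = "", width: int = 50) -> str:
    """Return one FASTA entry as text, wrapping the sequence at `width` characters."""
    # Header line: ">" immediately followed by the ID, then the optional comment.
    header = f">{seq_id} {comment}".rstrip()
    # Split the sequence into fixed-width lines (the last one may be shorter).
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


# Usage: write two made-up entries separated by a blank line.
entries = [
    format_fasta("SEQ_ONE", "ACDEFGHIKLMNPQRSTVWY" * 4, comment="illustrative data"),
    format_fasta("SEQ_TWO", "MKCLLLALALTCGAQA"),
]
fasta_text = "\n".join(entries)
```

The IDs and sequences in the usage lines are invented for illustration and do not correspond to the entries shown above.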