GenBank Database
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the National Center for Biotechnology Information. These three organizations exchange data daily.
Each GenBank entry includes a concise description of the sequence, the scientific name and taxonomy of the source organism, and a table of features that identifies coding regions and other sites of biological significance, such as transcription units, sites of mutations or modifications, and repeats. Protein translations for coding regions are included in the feature table. Bibliographic references are included along with a link to the Medline unique identifier for all published sequences.
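As a rough illustration of how the annotation described above is laid out in a flat file entry, the following Python sketch collects the keyword lines that precede the feature table. It assumes the usual flat file layout, in which keywords occupy the first 12 columns and continuation lines leave those columns blank. The function name parse_entry_header and the sample record are invented for this example and are not part of any PSC or NCBI software.

```python
def parse_entry_header(entry_text):
    """Collect the annotation fields that precede the feature table.

    Returns a dict mapping each keyword (LOCUS, DEFINITION, ORGANISM, ...)
    to a list of values, one per occurrence of that keyword.
    """
    fields = {}
    current = None  # keyword that continuation lines belong to
    for line in entry_text.splitlines():
        # Stop at the feature table or at the end-of-entry marker.
        if line.startswith("FEATURES") or line.startswith("//"):
            break
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword:
            # A new keyword, or an indented subkeyword such as ORGANISM.
            current = keyword
            fields.setdefault(current, []).append(value)
        elif current is not None and value:
            # Continuation line: columns 1-12 are blank.
            fields[current][-1] += " " + value
    return fields


# Hypothetical record, written only to show the layout (not a real entry).
SAMPLE = """\
LOCUS       EXAMPLE1                  12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical record used only to illustrate the layout,
            not a real GenBank entry.
ACCESSION   EXAMPLE1
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//
"""

if __name__ == "__main__":
    for key, values in parse_entry_header(SAMPLE).items():
        print(key, values)
```

Note that in this sketch the taxonomy lines that follow ORGANISM are simply appended to the organism name; a fuller parser would keep them separate.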
Most sequence analysis programs on PSC supercomputers can read GenBank data in the GenBank flat file format. The location of the flat file data is built into the MAKSEQ program. However, if you find it necessary to view the GenBank files in the flat file format, they can be found in the directory /biomed/db/genbank.
Accessible from: the Opteron cluster.
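If you do need to read the flat files directly, one possible approach is to stream a file one entry at a time rather than loading it whole. The Python sketch below assumes that each entry starts with a LOCUS line and ends with a line beginning with "//", and that any text before the first LOCUS line is file header material. The function name iter_entries and the file name in the usage note are hypothetical.

```python
def iter_entries(lines):
    """Yield each entry as one string, from its LOCUS line through '//'.

    `lines` can be any iterable of text lines, such as an open file.
    Text before the first LOCUS line (the file header) is skipped.
    """
    entry = []
    in_entry = False
    for line in lines:
        if line.startswith("LOCUS"):
            in_entry = True
            entry = []
        if in_entry:
            entry.append(line.rstrip("\n"))
            if line.startswith("//"):
                yield "\n".join(entry)
                in_entry = False
```

For example, with a hypothetical file name (the actual file names in /biomed/db/genbank are not listed here):

```python
with open("/biomed/db/genbank/example.seq") as handle:
    for entry in iter_entries(handle):
        print(entry.splitlines()[0])  # the LOCUS line of each entry
```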
Using the GenBank Database
- Searching the database with a query sequence
- There are a number of programs that can be used to search the GenBank database with a query sequence. See the list of database searching software.
- Retrieving a database entry
- More information in the file /biomed/db/genbank/README.genbank.
- Release notes in /biomed/db/genbank/gbrel.txt.
- NCBI Home page
- GenBank supplier
Version
The current release numbers can be found at the top of every GenBank flat file.
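A minimal Python sketch for reading the release number from the top of a flat file follows. It assumes the header contains a line of the form "NCBI-GenBank Flat File Release N.N", as in the flat files distributed by NCBI; if the local files use different wording, the pattern would need to be adjusted. The function name find_release and the sample header text are invented for illustration.

```python
import re

# Assumed header wording; adjust if the local files differ.
RELEASE_RE = re.compile(r"NCBI-GenBank Flat File Release\s+(\d+(?:\.\d+)?)")


def find_release(lines, max_lines=20):
    """Return the release number found in the file header, or None.

    Only the first `max_lines` lines are examined, and scanning stops
    at the first LOCUS line, where the header ends.
    """
    for i, line in enumerate(lines):
        if i >= max_lines or line.startswith("LOCUS"):
            break
        match = RELEASE_RE.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical header lines, shaped like the start of a flat file.
SAMPLE_HEADER = [
    "EXAMPLE.SEQ          Genetic Sequence Data Bank\n",
    "\n",
    "                NCBI-GenBank Flat File Release 999.0\n",
    "\n",
    "LOCUS       EXAMPLE1\n",
]

if __name__ == "__main__":
    print(find_release(SAMPLE_HEADER))
```

Because find_release accepts any iterable of lines, an open file handle can be passed in place of the sample list.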