CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. Originally developed by Dr. Weizhong Li in Dr. Adam Godzik’s lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute), CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the structure of a dataset and correcting bias within it.



Usage on Bridges


To see what versions of CD-HIT are available, type:

module avail cd-hit

To see what other modules are needed, what commands are available, and how to get additional help, type:

module help cd-hit

To use CD-HIT, include a command like this in your batch script or interactive session to load the CD-HIT module:

module load cd-hit

Be sure you also load any other modules needed, as listed by the module help cd-hit command.
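
As an illustration of how the module load step might fit into a batch script, here is a minimal sketch. It assumes a Slurm batch environment; the node count, time limit, and the file names proteins.fasta and proteins_nr90.fasta are hypothetical placeholders, not values prescribed for Bridges. The options -i (input), -o (output), -c (sequence identity threshold), and -n (word length) are standard CD-HIT options; the values shown are only an example and should be adapted to your data.

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 00:30:00
# The SBATCH values above are placeholders; adjust them for your job.

# Load the CD-HIT module when the module command is available in this shell.
# Also load any other modules listed by "module help cd-hit".
if command -v module >/dev/null 2>&1; then
    module load cd-hit
fi

# Hypothetical file names; replace with your own.
input=proteins.fasta
output=proteins_nr90.fasta

# -c 0.9 clusters at a 90% identity threshold; -n 5 is the word length
# commonly paired with thresholds in that range.
cmd=(cd-hit -i "$input" -o "$output" -c 0.9 -n 5)
echo "Running: ${cmd[*]}"

# Run CD-HIT only if it is on the PATH; otherwise report the problem.
if command -v cd-hit >/dev/null 2>&1; then
    "${cmd[@]}"
else
    echo "cd-hit not found on PATH; check the module load step" >&2
fi
```

If the run succeeds, CD-HIT writes the representative sequences to the output file and the cluster membership to a companion file with the .clstr suffix.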