CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. Originally developed by Dr. Weizhong Li in Dr. Adam Godzik’s lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute), CD-HIT is very fast and can handle extremely large databases. CD-HIT significantly reduces the computational and manual effort in many sequence analysis tasks, and it aids in understanding the structure of a dataset and correcting bias within it.
Usage on Bridges
To see what versions of CD-HIT are available, type:
module avail cd-hit
To see what other modules are needed, what commands are available, and how to get additional help, type:
module help cd-hit
To use CD-HIT, include a command like this in your batch script or interactive session to load the CD-HIT module:
module load cd-hit
Be sure you also load any other modules needed, as listed by the module help cd-hit command.
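As an illustration, a batch script that loads the module and runs a clustering job might look like the sketch below. The file names, resource requests, and identity threshold are assumptions made for this example, and the #SBATCH lines assume a Slurm scheduler; adjust all of them to your own data and allocation. The cd-hit options shown (-i, -o, -c, -n, -M, -T) are standard CD-HIT options.

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH --ntasks-per-node=4
#SBATCH -t 01:00:00

# Hypothetical file names; replace with your own.
INPUT=proteins.fasta
OUTPUT=proteins_nr90

# Load the CD-HIT module when the module command is available in this shell.
if command -v module >/dev/null 2>&1; then
    module load cd-hit
fi

# -c 0.9   cluster at 90% sequence identity
# -n 5     word size (suitable for thresholds of 0.7 and above)
# -M 16000 memory limit in MB (assumed value)
# -T 4     number of threads, matching the tasks requested above
CMD=(cd-hit -i "$INPUT" -o "$OUTPUT" -c 0.9 -n 5 -M 16000 -T 4)
echo "Running: ${CMD[*]}"

if command -v cd-hit >/dev/null 2>&1; then
    "${CMD[@]}"
else
    echo "cd-hit not found; check 'module help cd-hit' for required modules" >&2
fi
```

With these settings, CD-HIT writes the representative sequences to proteins_nr90 and the cluster membership to proteins_nr90.clstr.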