Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

Bowtie

Bowtie is an ultrafast, memory-efficient short read aligner.

Installed on blacklight, biou

Setting up Bowtie

On blacklight:

Bowtie is made available through the module command. To load the Bowtie module, enter:

module load bowtie

On biou:

The bowtie programs are available through the Galaxy instance on biou.

To make the bowtie programs available from the command line, csh users should enter the following command:

source /packages/bin/SETUP_BIO_SOFTWARE

To make the bowtie programs available from the command line, bash users should enter the following command:

source /packages/bin/SETUP_BIO_SOFTWARE
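
After loading the module (blacklight) or sourcing the setup script (biou), a quick check that both programs are on PATH can save a failed batch job. The sketch below assumes a bash shell; check_bowtie is a hypothetical helper that only uses the shell's command -v builtin.

```shell
#!/bin/bash
# Confirm that the Bowtie executables are on PATH after running
# "module load bowtie" (blacklight) or sourcing SETUP_BIO_SOFTWARE (biou).
# Returns 0 if both programs are found, 1 otherwise.
check_bowtie() {
  local prog status=0
  for prog in bowtie bowtie-build; do
    if command -v "$prog" >/dev/null 2>&1; then
      echo "found: $(command -v "$prog")"
    else
      echo "missing: $prog" >&2
      status=1
    fi
  done
  return "$status"
}

check_bowtie || echo "Bowtie is not on PATH; load the module or source the setup script first." >&2
```

Running it at the top of a job script makes a missing setup step show up immediately in the job's output.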

General Usage

Running Bowtie is generally a two-step process. First, build a Bowtie index of the reference sequences with the bowtie-build command. Then, align (map) the reads to that index with the bowtie command.
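
The two steps above can be sketched as a small bash function. The file names (genome.fa, reads.fq, genome_idx, hits.txt) and the helper names run and bowtie_two_step are hypothetical; with DRY_RUN=1 the commands are only printed, so the sketch can be checked without Bowtie installed.

```shell
#!/bin/bash
# Sketch of the two-step Bowtie workflow.
# Step 1 indexes the reference; step 2 aligns reads against that index.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

bowtie_two_step() {
  local ref=$1 reads=$2 index_base=$3 hits=$4
  # Step 1: build the index files (index_base.*.ebwt); stop if it fails
  run bowtie-build "$ref" "$index_base" || return
  # Step 2: map the reads to the index and write the hits file
  run bowtie "$index_base" "$reads" "$hits"
}

# Placeholder file names; drop DRY_RUN=1 to actually run Bowtie.
DRY_RUN=1 bowtie_two_step genome.fa reads.fq genome_idx hits.txt
```

The index only has to be built once per reference; later alignment runs can go straight to the second step.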

Running bowtie-build

bowtie-build [options]* <reference_in> <ebwt_outfile_base>

where:

reference_in comma-separated list of files with ref sequences
ebwt_outfile_base write Ebwt data to files with this dir/basename
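
Before a long alignment job it can help to confirm that bowtie-build produced every index file. For Bowtie 1 (the bowtie/1.0.0 module used in the example script), an index with basename BASE consists of BASE.1.ebwt through BASE.4.ebwt plus BASE.rev.1.ebwt and BASE.rev.2.ebwt. index_complete is a hypothetical helper; newer Bowtie releases may also write large-index files with an .ebwtl extension, which it does not check.

```shell
#!/bin/bash
# Check that a Bowtie 1 index is complete before aligning against it.
# bowtie-build writes six files per basename: BASE.1.ebwt to BASE.4.ebwt,
# BASE.rev.1.ebwt and BASE.rev.2.ebwt. (An index built with -r/--noref has
# no .3/.4.ebwt files and will be reported as incomplete here.)
index_complete() {
  local base=$1 part status=0
  for part in 1 2 3 4 rev.1 rev.2; do
    if [ ! -s "$base.$part.ebwt" ]; then
      echo "missing or empty: $base.$part.ebwt" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage, with placeholder names: `index_complete chr_idx || bowtie-build chr1.fa,chr2.fa chr_idx` builds a single index from two FASTA files (passed as a comma-separated <reference_in> list) only when some index file is missing.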

Bowtie-build options
-f                      reference files are Fasta (default)
-c                      reference sequences given on cmd line (as <seq_in>)
-C/--color              build a colorspace index
-a/--noauto             disable automatic memory-fitting for these options: -p, --bmax, --dcv
-p/--packed             use packed strings internally; slower, uses less mem
--bmax <int>            max bucket sz for blockwise suffix-array builder
--bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
--dcv <int>             diff-cover period for blockwise (default: 1024)
--nodc                  disable diff-cover (algorithm becomes quadratic)
-r/--noref              don't build .3/.4.ebwt (packed reference) portion
-3/--justref            just build .3/.4.ebwt (packed reference) portion
-o/--offrate <int>      SA is sampled every 2^offRate BWT chars (default: 5)
-t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
--ntoa                  convert Ns in reference to As
--seed <int>            seed for random number generator
-q/--quiet              print only error messages (bowtie-build is verbose by default)
-h/--help               print detailed description of tool and its options
--usage                 print this usage message
--version               print version information and quit
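
As a worked example of -o/--offrate: the index keeps one suffix-array sample every 2^offrate BWT characters, so the number of samples is roughly the reference length divided by 2^offrate. A lower offrate stores more samples, which makes looking up alignment positions faster but needs more memory. sa_samples is a hypothetical helper, and the 3,000,000,000-base length is an illustrative, roughly human-sized value.

```shell
#!/bin/bash
# Worked arithmetic for -o/--offrate: one suffix-array (SA) sample is kept
# every 2^offrate BWT characters, so a reference of ref_len bases yields
# about ref_len / 2^offrate samples.
sa_samples() {
  local ref_len=$1 offrate=${2:-5}   # 5 is the bowtie-build default
  echo $(( ref_len >> offrate ))     # integer division by 2^offrate
}

sa_samples 3000000000      # default offrate 5: one sample every 32 characters
sa_samples 3000000000 4    # offrate 4: one sample every 16 characters
```

This gives 93,750,000 samples at the default offrate of 5 and 187,500,000 at offrate 4 (e.g. `bowtie-build --offrate 4 genome.fa genome_idx`, names hypothetical). At alignment time, bowtie's own -o/--offrate can only raise the offrate (it must be >= the index's), which discards some samples to save memory at the cost of speed.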

Running bowtie

bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]

where:

<m1> Comma-separated list of files containing upstream mates (or the sequences themselves, if -c is set) paired with mates in <m2>
<m2> Comma-separated list of files containing downstream mates (or the sequences themselves if -c is set) paired with mates in <m1>
<r> Comma-separated list of files containing Crossbow-style reads. Can be a mixture of paired and unpaired. Specify "-" for stdin.
<s> Comma-separated list of files containing unpaired reads, or the sequences themselves, if -c is set. Specify "-" for stdin.
<hit> File to write hits to (default: stdout)

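A sketch of how a job script might choose between the paired-end (-1 <m1> -2 <m2>) and unpaired (<s>) forms, using a bash array so file names reach bowtie unchanged. build_bowtie_cmd and BOWTIE_CMD are hypothetical names, and the index and read file names are placeholders.

```shell
#!/bin/bash
# Assemble a bowtie command line: paired-end (-1/-2) when a second reads
# file is given, otherwise unpaired. The optional 4th argument is <hit>.
# The result is stored in the global array BOWTIE_CMD.
build_bowtie_cmd() {
  local index=$1 reads1=$2 reads2=${3:-} hits=${4:-}
  BOWTIE_CMD=(bowtie "$index")
  if [ -n "$reads2" ]; then
    BOWTIE_CMD+=(-1 "$reads1" -2 "$reads2")
  else
    BOWTIE_CMD+=("$reads1")
  fi
  if [ -n "$hits" ]; then
    BOWTIE_CMD+=("$hits")
  fi
}

build_bowtie_cmd genome_idx reads_1.fq reads_2.fq hits.txt
printf '%s\n' "${BOWTIE_CMD[*]}"   # bowtie genome_idx -1 reads_1.fq -2 reads_2.fq hits.txt
# To run it for real: "${BOWTIE_CMD[@]}"
```

To stream reads through stdin, pass - as the reads file, for example `gzip -dc reads.fq.gz | bowtie genome_idx - hits.txt` (file names hypothetical).
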
Bowtie options
Input options
-c                      query sequences given on cmd line (as <mates>, <singles>)
-C                      reads and index are in colorspace
-f                      query input files are (multi-)FASTA .fa/.mfa
-q                      query input files are FASTQ .fq/.fastq (default)
-Q/--quals <file>       QV file(s) corresponding to CSFASTA inputs; use with -f -C
--Q1/--Q2 <file>        same as -Q, but for mate files 1 and 2 respectively
-r                      query input files are raw one-sequence-per-line
-s/--skip <int>         skip the first <int> reads/pairs in the input
-u/--qupto <int>        stop after first <int> reads/pairs (excl. skipped reads)
-5/--trim5 <int>        trim <int> bases from 5' (left) end of reads
-3/--trim3 <int>        trim <int> bases from 3' (right) end of reads
--phred33-quals         input quals are Phred+33 (default)
--phred64-quals         input quals are Phred+64 (same as --solexa1.3-quals)
--solexa-quals          input quals are from GA Pipeline ver. < 1.3
--solexa1.3-quals       input quals are from GA Pipeline ver. >= 1.3
--integer-quals         qualities are given as space-separated integers (not ASCII)
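
Choosing between --phred33-quals and --phred64-quals requires knowing how the FASTQ qualities are encoded. The function below is a common rule of thumb, not part of Bowtie: it scans quality lines (assuming 4-line FASTQ records) for the lowest character. Characters below ';' (ASCII 59) appear only in Phred+33 data; if nothing falls below '@' (ASCII 64) the file is probably Phred+64; the range in between may indicate older Solexa-scaled qualities (--solexa-quals) and is reported as ambiguous. guess_qual_flag is a hypothetical name.

```shell
#!/bin/bash
# Guess the quality encoding of a FASTQ file (4-line records assumed) and
# print the matching bowtie flag. Heuristic only: it looks at the lowest
# quality character in the first N records (default 10000).
guess_qual_flag() {
  local fastq=$1 nrec=${2:-10000}
  awk -v max="$((nrec * 4))" '
    BEGIN {
      # map each printable ASCII character to its code
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
      min = 999; n = 0
    }
    NR > max { exit }
    NR % 4 == 0 {                      # every 4th line is a quality string
      for (j = 1; j <= length($0); j++) {
        c = substr($0, j, 1)
        if ((c in ord) && ord[c] < min) min = ord[c]
        n++
      }
    }
    END {
      if (n == 0)         print "unknown"
      else if (min < 59)  print "--phred33-quals"   # below ";" means Phred+33
      else if (min >= 64) print "--phred64-quals"   # nothing below "@"
      else                print "ambiguous"         # 59-63: possibly Solexa
    }' "$fastq"
}
```

Usage: `guess_qual_flag reads.fq` (placeholder name) prints --phred33-quals, --phred64-quals, ambiguous, or unknown. Treat the result as a hint and confirm against the sequencing provider's documentation.
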
Alignment options
-v <int>                report end-to-end hits w/ <=v mismatches; ignore qualities
  or
-n/--seedmms <int>      max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int>       max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int>      seed length for -n (default: 28)
--nomaqround            disable Maq-like quality rounding for -n (nearest 10 <= 30)
-I/--minins <int>       minimum insert size for paired-end alignment (default: 0)
-X/--maxins <int>       maximum insert size for paired-end alignment (default: 250)
--fr/--rf/--ff          -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
--nofw/--norc           do not align to forward/reverse-complement reference strand
--maxbts <int>          max # backtracks for -n 2/3 (default: 125, 800 for --best)
--pairtries <int>       max # attempts to find mate for anchor hit (default: 100)
-y/--tryhard            try hard to find valid alignments, at the expense of speed
--chunkmbs <int>        max megabytes of RAM for best-first search frames (def: 64)
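
The -v and -n policies above are alternatives: -v counts mismatches over the whole alignment and ignores base qualities, while -n limits mismatches in the seed (the first -l bases) and uses -e to cap the summed qualities at mismatched positions. alignment_flags is a hypothetical helper that builds one policy's flags and rejects mismatch counts outside 0 to 3; the defaults it fills in (-l 28, -e 70) come from the table.

```shell
#!/bin/bash
# Build the alignment-policy flags for bowtie. The two policies are
# mutually exclusive:
#   -v <0-3>  end-to-end mismatch limit; base qualities are ignored
#   -n <0-3>  mismatch limit within the seed (first -l bases), with -e
#             capping the sum of qualities at mismatched positions
alignment_flags() {
  local mode=$1 mms=$2 seedlen=${3:-28} maqerr=${4:-70}   # bowtie defaults
  case "$mms" in
    0|1|2|3) ;;
    *) echo "mismatch count must be 0-3, got '$mms'" >&2; return 1 ;;
  esac
  case "$mode" in
    v) printf '%s\n' "-v $mms" ;;
    n) printf '%s\n' "-n $mms -l $seedlen -e $maqerr" ;;
    *) echo "mode must be v or n, got '$mode'" >&2; return 1 ;;
  esac
}

alignment_flags v 2      # prints: -v 2
alignment_flags n 1 32   # prints: -n 1 -l 32 -e 70
```

-v is the simpler choice when qualities are unreliable; -n (the default policy) lets low-quality mismatches outside the seed count for less.
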
Reporting options:
-k <int>                report up to <int> good alignments per read (default: 1)
-a/--all                report all alignments per read (much slower than low -k)
-m <int>                suppress all alignments if > <int> exist (def: no limit)
-M <int>                like -m, but reports 1 random hit (MAPQ=0); requires --best
--best                  hits guaranteed best stratum; ties broken by quality
--strata                hits in sub-optimal strata aren't reported; requires --best
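
Because --strata and -M both require --best, a small pre-flight check can catch a missing --best before a long batch job starts. check_reporting_opts is a hypothetical helper that only inspects the option words it is given; bowtie performs its own validation when it runs. The combination in the example, -m 1 --best --strata, is often used to keep only reads with a single alignment in the best stratum.

```shell
#!/bin/bash
# Pre-flight check for bowtie reporting options: per the option table,
# --strata and -M both require --best.
check_reporting_opts() {
  local arg has_best=0 needs_best=""
  for arg in "$@"; do
    case "$arg" in
      --best) has_best=1 ;;
      --strata|-M) needs_best=$arg ;;
    esac
  done
  if [ -n "$needs_best" ] && [ "$has_best" -eq 0 ]; then
    printf '%s requires --best\n' "$needs_best" >&2
    return 1
  fi
  return 0
}

# Keep only reads with a single alignment in the best stratum.
check_reporting_opts -m 1 --best --strata && echo "ok: -m 1 --best --strata"
```
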
Output options:
-t/--time               print wall-clock time taken by search phases
-B/--offbase <int>      leftmost ref offset = <int> in bowtie output (default: 0)
--quiet                 print nothing but the alignments
--refout                write alignments to files refXXXXX.map, 1 map per reference
--refidx                refer to ref. seqs by 0-based index rather than name
--al <fname>            write aligned reads/pairs to file(s) <fname>
--un <fname>            write unaligned reads/pairs to file(s) <fname>
--max <fname>           write reads/pairs over -m limit to file(s) <fname>
--suppress <cols>       suppresses given columns (comma-delim'ed) in default output
--fullref               write entire ref name (default: only up to 1st space)
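
For an unpaired FASTQ run, --un writes the reads that could not be aligned back out as FASTQ, which makes it easy to estimate the unaligned fraction. count_fastq_reads and unaligned_percent are hypothetical helpers that assume 4-line FASTQ records and files ending in a newline; file names are placeholders.

```shell
#!/bin/bash
# Count reads in a 4-line-per-record FASTQ file.
count_fastq_reads() {
  local lines
  lines=$(wc -l < "$1") || return
  echo $(( lines / 4 ))
}

# Percentage (rounded down) of reads in $1 that appear in the --un file $2.
unaligned_percent() {
  local total unaligned
  total=$(count_fastq_reads "$1") || return
  unaligned=$(count_fastq_reads "$2") || return
  if [ "$total" -eq 0 ]; then
    echo "no reads in $1" >&2
    return 1
  fi
  echo $(( 100 * unaligned / total ))
}
```

For example, after `bowtie --un unaligned.fq genome_idx reads.fq hits.txt`, running `unaligned_percent reads.fq unaligned.fq` prints the integer percentage of reads that did not align.
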
Colorspace options:
--snpphred <int>        Phred penalty for SNP when decoding colorspace (def: 30)
--snpfrac <dec>         approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
--col-cseq              print aligned colorspace seqs as colors, not decoded bases
--col-cqual             print original colorspace quals, not decoded quals
--col-keepends          keep nucleotides at extreme ends of decoded alignment
SAM options:
-S/--sam                write hits in SAM format
--mapq <int>            default mapping quality (MAPQ) to print for SAM alignments
--sam-nohead            suppress header lines (starting with @) for SAM output
--sam-nosq              suppress @SQ header lines for SAM output
--sam-RG <text>         add <text> (usually "lab=value") to @RG line of SAM header
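
With -S/--sam, a quick sanity check of the output only needs the SAM FLAG field: bit 0x4 marks an unmapped record. sam_summary is a hypothetical helper that counts header lines and mapped/unmapped records using nothing Bowtie-specific. It counts SAM records, not reads: with paired-end input each mate is a separate record, and -k or -a can produce several records per read.

```shell
#!/bin/bash
# Summarise a SAM file: header lines (starting with @) and alignment
# records whose FLAG has / lacks the unmapped bit 0x4.
sam_summary() {
  awk -F '\t' '
    /^@/ { header++; next }
    { if (int($2 / 4) % 2 == 1) unmapped++; else mapped++ }
    END { printf "header=%d mapped=%d unmapped=%d\n", header, mapped, unmapped }
  ' "$1"
}
```

Usage: `sam_summary Bowtie.sam` after a run such as the one in the example script below.
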
Performance options:
-o/--offrate <int>      override offrate of index; must be >= index's offrate
-p/--threads <int>      number of alignment threads to launch (default: 1)
--mm                    use memory-mapped I/O for index; many 'bowtie's can share
--shmem                 use shared mem for index; many 'bowtie's can share
Other options:
--seed <int>            seed for random number generator
--verbose               verbose output (for debugging)
--version               print version information and quit
-h/--help               print this usage message

Example PBS script (blacklight):

#!/bin/bash
#PBS -q batch
#PBS -j oe
#PBS -l ncpus=16
#PBS -l walltime=24:00:00
#PBS -N Bowtie
#
# ---------------
# Bowtie Setup
# ---------------
source /usr/share/modules/init/bash
module load bowtie/1.0.0
module load samtools/0.1.18
THREADS=16
#
set -x
cd "$SCRATCH" || exit 1
#---------------------------------------------------------
# WFILE1 and WFILE2 should point to your fastq read files
# REFFILE should point to the reference file to be indexed
#---------------------------------------------------------
WFILE1=SRR189815_1.fastq
WFILE2=SRR189815_2.fastq
REFFILE=human_g1k_v37.fasta
# ---------------
# Build Bowtie Index
# ---------------
bowtie-build -f $REFFILE Bowtieidx
#
# ---------------
# RUN Bowtie
# ---------------
bowtie -p $THREADS -X 1000 -q --phred33-quals --fr --chunkmbs 1024 \
    --best -S -t Bowtieidx -1 $WFILE1 -2 $WFILE2 Bowtie.sam
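
The script loads samtools/0.1.18 but never calls it. A possible follow-up step, sketched here as an assumption rather than part of the original script, converts Bowtie.sam into a sorted, indexed BAM. It assumes the old 0.1.x syntax, where samtools sort takes an output prefix and appends .bam itself; run and sam_to_sorted_bam are hypothetical names, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Convert a SAM file into a sorted, indexed BAM with samtools 0.1.x.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

sam_to_sorted_bam() {
  local sam=$1
  local base=${sam%.sam}
  run samtools view -bS -o "$base.bam" "$sam" || return   # SAM -> BAM
  run samtools sort "$base.bam" "$base.sorted" || return  # writes base.sorted.bam
  run samtools index "$base.sorted.bam"                   # writes base.sorted.bam.bai
}
```

Called as `sam_to_sorted_bam Bowtie.sam` after the bowtie step, it would leave Bowtie.sorted.bam in $SCRATCH. Assuming the job script is saved as bowtie.pbs (a hypothetical name), it is submitted with `qsub bowtie.pbs`.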