

Bowtie is an ultrafast, memory-efficient aligner for mapping large sets of sequencing reads to a reference genome. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small. Bowtie can also use multiple processors to achieve greater alignment speed.

The first step in running Bowtie is to build an index from the reference genome with the bowtie-build command. Its basic usage is:

General Bowtie-Build Usage
bowtie-build input_reference.fasta index_prefix

where input_reference.fasta is the reference genome in fasta format, and index_prefix is the prefix of the generated index files.
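
As a rough illustration of what "index files" means here: for indexes that are not large, bowtie-build writes six files that share the prefix and end in .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt (large indexes use the .ebwtl extension instead, which this sketch does not handle). The function name index_is_complete is a hypothetical helper, not part of Bowtie; it only checks that all six files exist before an alignment job is submitted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bowtie): check that a small (.ebwt)
# Bowtie index is complete. Usage: index_is_complete index_prefix
index_is_complete() {
    prefix=$1
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -f "${prefix}.${suffix}.ebwt" ]; then
            # Report the first missing file and signal failure
            echo "missing: ${prefix}.${suffix}.ebwt" >&2
            return 1
        fi
    done
    return 0
}

# Example: report success only if every index file is present
if index_is_complete index_prefix; then
    echo "index_prefix is ready for alignment"
fi
```

A check like this can catch an interrupted bowtie-build run early, before a long alignment job fails on a partial index.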

After the index of the reference genome is generated, the next step is to align the reads. The basic usage of bowtie is:

General Bowtie Usage
bowtie [-q|-f|-r|-c] index_prefix [-1 input_reads_pair_1.[fasta|fastq] -2 input_reads_pair_2.[fasta|fastq] | input_reads.[fasta|fastq]] [options]

where index_prefix is the prefix of the index generated with the bowtie-build command, and options are optional parameters that are described in the Bowtie manual. Bowtie supports both single-end (input_reads.[fasta|fastq]) and paired-end (input_reads_pair_1.[fasta|fastq] and input_reads_pair_2.[fasta|fastq]) files in fasta or fastq format. The format of the input files must also be specified with one of the following flags: -q (fastq files), -f (fasta files), -r (raw, one sequence per line), or -c (sequences given on the command line).
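
The choice of format flag can be sketched as a small shell function. The helper name format_flag and the extension list are assumptions made for illustration; Bowtie itself does not inspect file extensions, so the flag must still match the actual file contents.

```shell
#!/bin/sh
# Hypothetical helper: map a read file name to Bowtie's input-format flag.
# The extension list is an assumption; extend it to match local naming.
format_flag() {
    case "$1" in
        *.fastq|*.fq) printf '%s\n' "-q" ;;
        *.fasta|*.fa) printf '%s\n' "-f" ;;
        *) echo "unknown read format: $1" >&2; return 1 ;;
    esac
}

# Example: build (and print) a single-end command line
reads=input_reads.fastq
flag=$(format_flag "$reads") || exit 1
echo "bowtie $flag index_prefix $reads"
```

For the file name above, the sketch prints the command with -q; raw (-r) and command-line (-c) input have no conventional extension and are left to be chosen by hand.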

An example of how to run a Bowtie alignment on Tusker with a single-end fastq file and 8 CPUs is shown below:


#!/bin/sh
#SBATCH --job-name=Bowtie
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=50gb
#SBATCH --output=Bowtie.%J.out
#SBATCH --error=Bowtie.%J.err


module load bowtie/1.1

bowtie -q -S index_prefix input_reads.fastq -p $SLURM_NTASKS_PER_NODE > bowtie_alignments.sam


Bowtie Output

By default, Bowtie writes alignments in its own tab-delimited format, with one alignment per line; SAM output is produced only when the -S (--sam) option is given. Each line of the default format is a collection of 8 fields separated by tabs. The fields are: name of the aligned read, reference strand aligned to (+ or -), name of the reference sequence where the alignment occurs, 0-based offset into the forward reference strand where the leftmost character of the alignment occurs, read sequence, read qualities, the number of other instances where the same sequence aligned against the same reference characters, and a comma-separated list of mismatch descriptors.
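
The 8-field layout can be illustrated with a short awk sketch. The three sample alignments are made up for illustration, and the function names sample_alignments and summarize_alignments are hypothetical. The sketch prints the read name, reference name, strand, and the start position converted from Bowtie's 0-based offset to a 1-based coordinate.

```shell
#!/bin/sh
# Made-up sample alignments in Bowtie's default 8-field format
# (fields: name, strand, reference, 0-based offset, sequence,
#  qualities, other instances, mismatch descriptors).
sample_alignments() {
    printf 'read1\t+\tchr1\t99\tACGT\tIIII\t0\t\n'
    printf 'read2\t-\tchr1\t250\tTTGA\tIIII\t0\t2:A>G\n'
    printf 'read3\t+\tchr2\t0\tGGCC\tIIII\t1\t\n'
}

# Print read name, reference, strand, and 1-based start position
summarize_alignments() {
    awk -F '\t' -v OFS='\t' '{ print $1, $3, $2, $4 + 1 }'
}

sample_alignments | summarize_alignments
```

On a real alignment file, the same function would read from the file instead, for example summarize_alignments < alignments.txt, where alignments.txt is a placeholder name. This applies only to output written without the -S option, since SAM uses a different column layout.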
