Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. Although Bowtie and Bowtie2 are both fast read aligners, there are a few main differences between them:
- Bowtie2 supports gapped alignment with affine gap penalties, without restrictions on the number of gaps and gap lengths.
- For reads longer than about 50 bp, Bowtie2 is generally faster, more sensitive, and uses less memory than Bowtie.
- Bowtie supports only end-to-end alignments, while Bowtie2 supports both end-to-end and local alignment.
- Bowtie has an upper limit on read length of around 1,000 bp, while Bowtie2 does not have any.
- Bowtie2's paired-end alignment is more flexible than Bowtie's.
- Bowtie2 does not align colorspace reads.
- Bowtie and Bowtie2 indices are not compatible.
As with Bowtie, the first step in running Bowtie2 is to build a Bowtie2 index from a reference genome sequence. The basic usage of the command bowtie2-build is:
where input_reference.fasta is an input file of reference sequences in fasta format, and index_prefix is the prefix of the generated index files. Besides the option -f, which is used when the reference input is a fasta file, the option -c can be used when the reference sequences are given on the command line.
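The index-building step described above can be sketched as a short shell snippet. This is a minimal sketch, assuming bowtie2-build is on the PATH and input_reference.fasta exists in the working directory. The helper run_or_print is a hypothetical convenience (not part of Bowtie2) that prints the command instead of running it when the tool is not installed; the prefix cmdline_index and the two short sequences passed with -c are made-up placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bowtie2): run the command if the tool is
# installed, otherwise print what would have been run.
run_or_print() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "Would run: $*"
    fi
}

reference="input_reference.fasta"   # reference genome in fasta format
index_prefix="index_prefix"         # prefix for the generated index files

# Build the index from a fasta file (-f).
run_or_print bowtie2-build -f "$reference" "$index_prefix"

# Alternative: give the reference sequences on the command line (-c) as a
# comma-separated list. The sequences and the prefix are placeholders.
run_or_print bowtie2-build -c GGTCATCCT,ACGGGTCGT cmdline_index
```

The index files are written to the working directory, and index_prefix is the value later passed to the aligner.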
The command bowtie2 takes a Bowtie2 index and a set of sequencing read files and outputs a set of alignments in SAM format. The general bowtie2 usage is:
where index_prefix is the prefix of the index generated with the bowtie2-build command, and options are optional parameters that can be found in the Bowtie2 manual: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml. Bowtie2 supports both single-end (input_reads.[fasta|fastq]) and paired-end (input_reads_pair_1.[fasta|fastq], input_reads_pair_2.[fasta|fastq]) files in fasta or fastq format. The format of the input files also needs to be specified by using one of the following flags: -q (fastq files), --qseq (Illumina's qseq format), -f (fasta files), -r (raw, one sequence per line), or -c (sequences given on the command line).
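The single-end and paired-end invocations described above can be sketched as follows. This is an illustration under assumptions, not the manual's synopsis: the function format_flag is a hypothetical helper that picks -q or -f from the file extension (the .fq and .fa spellings are added here as common abbreviations), and the output names single_end.sam and paired_end.sam are placeholders. The snippet only prints the command lines; removing the leading echo would execute them, provided bowtie2 is installed and the index and read files exist.

```shell
#!/bin/sh
# Hypothetical helper: choose the Bowtie2 input-format flag from a read
# file's extension. Only fastq (-q) and fasta (-f) are handled here.
format_flag() {
    case "$1" in
        *.fastq|*.fq) echo "-q" ;;
        *.fasta|*.fa) echo "-f" ;;
        *) echo "unknown read format: $1" >&2; return 1 ;;
    esac
}

index_prefix="index_prefix"   # prefix used when the index was built

# Single-end reads are passed with -U; -S names the SAM output file.
single="input_reads.fastq"
echo bowtie2 -x "$index_prefix" "$(format_flag "$single")" \
    -U "$single" -S single_end.sam

# Paired-end reads are passed with -1 and -2.
mate1="input_reads_pair_1.fasta"
mate2="input_reads_pair_2.fasta"
echo bowtie2 -x "$index_prefix" "$(format_flag "$mate1")" \
    -1 "$mate1" -2 "$mate2" -S paired_end.sam
```

Both mates of a pair are expected to be in the same format, so the flag is derived from the first mate only.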
An example of how to run Bowtie2 local alignment on Tusker with paired-end fasta files and 8 CPUs is shown below:
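A possible job script for such a run is sketched here. It assumes that the cluster uses the SLURM scheduler and that Bowtie2 is made available through an environment module; the job name, time and memory requests, log file names, and the output name bowtie2_alignments.sam are all placeholders to adapt, and the module line is left commented out because the exact module name is not known here.

```shell
#!/bin/sh
#SBATCH --job-name=Bowtie2_local
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=24:00:00
#SBATCH --mem=10gb
#SBATCH --output=Bowtie2.%J.out
#SBATCH --error=Bowtie2.%J.err

# Assumption: Bowtie2 is provided as an environment module. Replace the
# placeholder with the name reported by `module avail` and uncomment.
# module load <bowtie2-module>

# Use the CPU count granted by the scheduler, falling back to 8.
threads="${SLURM_NTASKS_PER_NODE:-8}"

if command -v bowtie2 >/dev/null 2>&1; then
    # --local selects local alignment, -p sets the number of threads,
    # and -f marks the paired-end read files as fasta.
    bowtie2 --local -p "$threads" -x index_prefix -f \
        -1 input_reads_pair_1.fasta -2 input_reads_pair_2.fasta \
        -S bowtie2_alignments.sam
else
    echo "bowtie2 not found; load the Bowtie2 module first" >&2
fi
```

Taking the thread count from the scheduler keeps the -p value in step with the --ntasks-per-node request, so only one number needs to change when more or fewer CPUs are requested.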
Bowtie2 outputs alignments in SAM format that can be further manipulated with different tools, like SAMtools and GATK. After the header lines, each line in the file describes an alignment (or a read that failed to align) and is a collection of at least 12 fields separated by tabs. Detailed information about Bowtie2 output fields can be found in the Bowtie2 manual.
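As a small illustration of the tab-separated layout, the sketch below counts aligned and unaligned reads directly from a SAM file. The two-read file it writes is made up for the example and is not real Bowtie2 output; the check relies on the SAM convention that bit 0x4 of the FLAG field (column 2) marks a read as unaligned.

```shell
#!/bin/sh
# Made-up two-read SAM file for illustration (not real Bowtie2 output).
sam_file="example_alignments.sam"
{
    printf '@HD\tVN:1.0\tSO:unsorted\n'
    printf 'read1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\tAS:i:10\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\tYT:Z:UU\n'
} > "$sam_file"

# Skip header lines (starting with @), then test bit 0x4 of the FLAG field.
summary=$(awk -F '\t' '
    /^@/ { next }
    int($2 / 4) % 2 == 1 { unaligned++; next }
    { aligned++ }
    END { printf "aligned=%d unaligned=%d\n", aligned, unaligned }
' "$sam_file")
echo "$summary"
```

For real data sets, SAMtools offers the same kind of summary and is the better choice; the awk version is only meant to show how the fields are laid out.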