
BWA Index:

The first step in using BWA is to build an index of the reference genome, which must be in FASTA format. The basic usage of bwa index is:

General BWA Index Usage
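bwa index [-a bwtsw|is] -p index_prefix input_reference.fasta

where input_reference.fasta is the reference genome in FASTA format, and index_prefix, given with -p, is the prefix of the generated index files (if -p is omitted, the name of the FASTA file is used as the prefix). The option -a selects the algorithm used to construct the index and accepts two values: bwtsw, which does not work for references shorter than about 10 MB, and is, which does not work for references longer than about 2 GB. Choose the value according to the length of the genome; if -a is omitted, BWA 0.7 selects an algorithm automatically.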
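Because the value of -a depends on the total length of the reference, the choice can be scripted. The sketch below is a minimal example, not part of BWA: the function names fasta_total_bases and choose_bwa_index_algo are hypothetical, and the cut-off is an assumption based on the limits given in the BWA manual (is does not work above about 2 GB of sequence, bwtsw does not work below about 10 MB).

```shell
#!/bin/sh
# Hypothetical helpers (not part of BWA) for picking the bwa index -a value.

# Print the total number of bases in a FASTA file,
# i.e. the summed length of all lines that do not start with '>'.
fasta_total_bases() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# Print "is" or "bwtsw" for a reference of the given length in bases.
# Assumed limits: 'is' fails above ~2 GB and 'bwtsw' fails below ~10 MB,
# so 'is' is used up to 2,000,000,000 bases and 'bwtsw' beyond that.
choose_bwa_index_algo() {
    if [ "$1" -gt 2000000000 ]; then
        echo bwtsw
    else
        echo is
    fi
}
```

For example: bwa index -a "$(choose_bwa_index_algo "$(fasta_total_bases input_reference.fasta)")" -p index_prefix input_reference.fasta. Between roughly 10 MB and 2 GB both algorithms work; this sketch picks is in that range because the BWA manual describes it as somewhat faster for smaller references.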

BWA Mem:

The bwa mem algorithm is one of the three alignment algorithms provided by BWA. It performs local alignment and can produce alignments for different parts of the same query sequence.

General BWA Mem Usage
bwa mem [options] index_prefix [input_reads.fastq|input_reads_pair_1.fastq input_reads_pair_2.fastq]

where index_prefix is the reference genome index generated by bwa index, input_reads.fastq is the input file for single-end sequencing data, and input_reads_pair_1.fastq and input_reads_pair_2.fastq are the two input files for paired-end sequencing data. Additional options for bwa mem can be found in the BWA manual.
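Before submitting a bwa mem job, it can help to confirm that index_prefix points at a complete index. The sketch below assumes the five files that bwa index (0.7) writes next to the prefix, namely .amb, .ann, .bwt, .pac and .sa; the function name check_bwa_index is hypothetical and not part of BWA.

```shell
#!/bin/sh
# Hypothetical helper (not part of BWA): check that all index files for a prefix exist.
# Assumes the five files written by bwa index 0.7: .amb .ann .bwt .pac .sa
check_bwa_index() {
    prefix=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        # -s: the file must exist and be non-empty
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            missing=1
        fi
    done
    # Exit status 0 only if every index file was found
    return "$missing"
}
```

In a SLURM script, a line such as check_bwa_index index_prefix || exit 1 placed before the bwa mem command stops the job early instead of letting bwa mem fail on an incomplete index.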

A simple SLURM script for running bwa mem on Tusker with paired-end FASTQ input data, index_prefix as the reference genome index, a SAM output file and 8 CPUs is shown below:


#!/bin/sh
#SBATCH --job-name=Bwa_Mem
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=168:00:00
#SBATCH --mem=50gb
#SBATCH --output=BwaMem.%J.out
#SBATCH --error=BwaMem.%J.err

module load bwa/0.7

# -t sets the number of threads to the 8 tasks requested above
bwa mem -t $SLURM_NTASKS_PER_NODE index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq > bwa_mem_alignments.sam


BWA Bwasw:

The bwa bwasw algorithm is another algorithm provided by BWA. For input files with single-end reads, it aligns the query sequences. For input files with paired-end reads, it performs paired-end alignment, which only works for Illumina reads. An example of bwa bwasw with the single-end input file input_reads.fasta in FASTA format and the output file bwa_bwasw_alignments.sam, where the alignments are stored, is shown below:

General BWA Bwasw Usage
bwa bwasw -t $SLURM_NTASKS_PER_NODE index_prefix input_reads.fasta > bwa_bwasw_alignments.sam

BWA Aln:

The third BWA algorithm, bwa aln, aligns the input file of sequence data to the reference genome. An example of running bwa aln with the single-end input file input_reads.fasta and 8 CPUs is shown below:

General BWA Aln Usage
bwa aln -0 -t $SLURM_NTASKS_PER_NODE index_prefix input_reads.fasta > bwa_aln_alignments.sai

The command bwa samse uses the bwa_aln_alignments.sai output from bwa aln to generate a SAM file from the alignments of single-end reads.

General BWA Samse Usage
bwa samse -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments.sai input_reads.fasta

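The command bwa sampe generates a SAM file from the alignments of paired-end reads. It requires two .sai files, bwa_aln_alignments_pair_1.sai and bwa_aln_alignments_pair_2.sai, produced by running bwa aln separately on each of the two input read files.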

General BWA Sampe Usage
bwa sampe -f bwa_aln_alignments.sam index_prefix bwa_aln_alignments_pair_1.sai bwa_aln_alignments_pair_2.sai input_reads_pair_1.fasta input_reads_pair_2.fasta
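For paired-end data, bwa sampe needs one .sai file per read file, so bwa aln is run once on each of the two files before bwa sampe combines them. The shell function below is a minimal sketch of that sequence, not an official BWA script: the name bwa_aln_sampe and the naming of the intermediate .sai files are hypothetical, and it assumes the bwa module is already loaded (for example with module load bwa/0.7 inside a SLURM script).

```shell
#!/bin/sh
# Hypothetical helper (not part of BWA): bwa aln on both read files, then bwa sampe.
# Usage: bwa_aln_sampe INDEX_PREFIX READS_1 READS_2 OUTPUT_SAM [THREADS]
bwa_aln_sampe() {
    prefix=$1
    reads1=$2
    reads2=$3
    out_sam=$4
    threads=${5:-1}
    # Intermediate .sai names are derived from the output SAM name (an assumption)
    sai1="${out_sam%.sam}_pair_1.sai"
    sai2="${out_sam%.sam}_pair_2.sai"

    # Step 1: align each read file separately; bwa aln writes .sai data to stdout
    bwa aln -t "$threads" "$prefix" "$reads1" > "$sai1" || return 1
    bwa aln -t "$threads" "$prefix" "$reads2" > "$sai2" || return 1

    # Step 2: combine both .sai files with the original reads into one SAM file
    bwa sampe -f "$out_sam" "$prefix" "$sai1" "$sai2" "$reads1" "$reads2"
}
```

Called as bwa_aln_sampe index_prefix input_reads_pair_1.fasta input_reads_pair_2.fasta bwa_aln_alignments.sam $SLURM_NTASKS_PER_NODE, it writes bwa_aln_alignments_pair_1.sai and bwa_aln_alignments_pair_2.sai and then runs the same bwa sampe command as the usage line above.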

BWA Fastmap:

The command bwa fastmap identifies and outputs super-maximal exact matches (SMEMs).

General BWA Fastmap Usage
bwa fastmap index_prefix input_reads.fasta > bwa_fastmap.matches

BWA Pemerge:

The command bwa pemerge merges overlapping paired ends and can print either only the merged reads or the unmerged ones. An example of bwa pemerge of input_reads_pair_1.fastq and input_reads_pair_2.fastq with 8 CPUs and output file output_reads_merged.fastq that contains only the merged reads is shown below:

General BWA Pemerge Usage
bwa pemerge -m -t $SLURM_NTASKS_PER_NODE input_reads_pair_1.fastq input_reads_pair_2.fastq > output_reads_merged.fastq


BWA Fa2pac:

The command bwa fa2pac converts fasta to pac files.

General BWA Fa2pac Usage
bwa fa2pac input_reads.fasta pac_prefix


BWA Pac2bwt and BWA Pac2bwtgen:

The commands bwa pac2bwt and bwa pac2bwtgen convert pac to bwt files.

General BWA Pac2bwt Usage
bwa pac2bwt input_reads.pac output_reads.bwt
General BWA Pac2bwtgen Usage
bwa pac2bwtgen input_reads.pac output_reads.bwt


BWA Bwtupdate:

The command bwa bwtupdate updates bwt files to the new format.

General BWA Bwtupdate Usage
bwa bwtupdate input_reads.bwt

BWA Bwt2sa:

The command bwa bwt2sa generates sa files from bwt and Occ files.

General BWA Bwt2sa Usage
bwa bwt2sa input_reads.bwt output_reads.sa


Useful Information

To test the scalability of BWA (bwa/0.7) on Crane, we used one pair of paired-end input FASTQ files, large_1.fastq and large_2.fastq, and one single-end input FASTA file, large.fasta. Statistics about the input files, together with the running time and memory that bwa mem required with 4, 8 and 16 CPUs, are shown in the table below:

| Input file(s) | Total # of sequences | Total size | 4 CPUs: running time / required memory | 8 CPUs: running time / required memory | 16 CPUs: running time / required memory |
|---|---|---|---|---|---|
| large_1.fastq + large_2.fastq (paired-end) | 10,174,715 per file | 3,376 MB per file | ~ 35 minutes / ~ 12 GB | ~ 18.5 minutes / ~ 18 GB | ~ 10 minutes / ~ 19 GB |
| large.fasta (single-end) | 592,593 | 836 MB | ~ 5.5 minutes / ~ 3 GB | ~ 3 minutes / ~ 4 GB | ~ 2 minutes / ~ 6.2 GB |
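The timings can also be read as speedup (baseline running time divided by running time) and parallel efficiency (speedup divided by the increase in CPUs). The small helper below is a hypothetical sketch, not part of BWA, that computes both relative to a baseline run such as the 4-CPU column:

```shell
#!/bin/sh
# Hypothetical helper (not part of BWA): speedup and parallel efficiency of a run
# relative to a baseline run.
# Usage: scaling_stats BASE_CPUS BASE_MINUTES CPUS MINUTES
scaling_stats() {
    awk -v bc="$1" -v bt="$2" -v c="$3" -v t="$4" 'BEGIN {
        speedup = bt / t                 # times faster than the baseline run
        efficiency = speedup / (c / bc)  # 1.00 means perfectly linear scaling
        printf "speedup=%.2f efficiency=%.2f\n", speedup, efficiency
    }'
}
```

For example, scaling_stats 4 35 8 18.5 prints speedup=1.89 efficiency=0.95 for the paired-end data, while scaling_stats 4 5.5 16 2 prints speedup=2.75 efficiency=0.69 for large.fasta, so the larger paired-end run benefits more from additional CPUs than the smaller single-end run.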