In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package.
However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently.
The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals.
Results: We implemented the Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with the Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines.
Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy.
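To make the idea of backward search with the BWT concrete, the following is a minimal Python sketch of exact matching on a toy index. It is not BWA's implementation: the function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting (impractical for a genome), and the mismatch and gap handling described above is omitted entirely.

```python
def build_fm_index(text):
    """Build a toy index: suffix array, C table and Occ table of text + '$'."""
    text += "$"  # sentinel, sorts before A, C, G and T
    # Naive suffix array construction; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c]: number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, ch in enumerate(bwt):
        for a in alphabet:
            occ[a][i + 1] = occ[a][i] + (1 if a == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open interval [lo, hi) of suffix array rows
    # Extend the match one character at a time, from the last to the first.
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, c_table, occ = build_fm_index("ACGTACGTTACG")
print(backward_search("ACG", sa, c_table, occ))
```

Each step of the loop narrows the suffix array interval using only the C and Occ tables, so the cost of the search depends on the read length rather than on the size of the reference.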
These programs can be easily parallelized with multi-threading, but they usually require large memory to build an index for the human genome.
The Illumina/Solexa sequencing technology typically produces 50–200 million 32–100 bp reads on a single run of the machine.
Mapping this large volume of short reads to a genome as large as human poses a great challenge to the existing sequence alignment programs.
Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs.
A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature-rich and fast enough to align short reads from a single individual.
To meet the requirement of efficient and accurate short read mapping, many new alignment programs have been developed.
Some of these, such as Eland (Cox, 2007, unpublished material), RMAP (Smith et al., 2008), SeqMap (Jiang and Wong, 2008), CloudBurst (Schatz, 2009) and SHRiMP, work by hashing the read sequences and scanning through the reference sequence.
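The general strategy of hashing the reads and scanning the reference can be illustrated with a short Python sketch. It is a simplified assumption about how such tools operate, not the algorithm of any program named above: the helper names `index_reads` and `scan_reference`, the use of the first `k` bases as a seed, and the `max_mismatches` parameter are all hypothetical, and mismatches inside the seed are not tolerated here.

```python
from collections import defaultdict


def index_reads(reads, k):
    """Hash the first k bases (the seed) of every read to its read IDs."""
    table = defaultdict(list)
    for read_id, read in enumerate(reads):
        if len(read) >= k:
            table[read[:k]].append(read_id)
    return table


def scan_reference(reference, reads, k, max_mismatches=0):
    """Scan the reference once, returning (read_id, position, mismatches)."""
    table = index_reads(reads, k)
    hits = []
    for pos in range(len(reference) - k + 1):
        # Look up every read whose seed equals the k-mer at this position.
        for read_id in table.get(reference[pos:pos + k], ()):
            read = reads[read_id]
            window = reference[pos:pos + len(read)]
            if len(window) < len(read):
                continue  # read would run past the end of the reference
            # Verify the full read against the reference (no gaps allowed).
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.append((read_id, pos, mismatches))
    return hits


reads = ["GTAC", "TTAC", "GTTC"]
print(scan_reference("ACGTACGTTACG", reads, k=2, max_mismatches=1))
```

In this sketch the hash table grows with the number of reads, while the reference is traversed in a single pass regardless of how many reads are indexed.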