A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table

Yoko SOGABE  Tsutomu MARUYAMA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E100-D   No.5   pp.1016-1025
Publication Date: 2017/05/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2016EDP7262
Type of Manuscript: PAPER
Category: Computer System
Keyword: 
field programmable gate array,  genome sequence alignment,  accelerated implementation,  bioinformatics,  

Full Text: PDF(1.1MB)>>
Buy this Article




Summary: 
The purpose of DNA sequencing is to determine the order of nucleotides within a DNA molecule of target. The target DNA molecules are fragmented into short reads, which are short fixed-length subsequences composed of ‘A’, ‘C’, ‘G’ ‘T’, by next generation sequencing (NGS) machine. To reconstruct the target DNA from the short reads using a reference genome, which is a representative example of a species that was constructed in advance, it is necessary to determine their locations in the target DNA from where they have been extracted by aligning them onto the reference genome. This process is called short read mapping, and it is important to improve the performance of the short read mapping to realize fast DNA sequencing. We propose three types of FPGA acceleration methods based on hash table; (1) sorting and parallel comparison, (2) matching that allows one mutation to reduce the number of the candidates, (3) optimized hash function using variable masks. The first one reduces the number of accesses to off-chip memory to avoid the bottleneck by access latency. The second one enables to reduce the number of the candidates without degrading mapping sensitivity by allowing one mutation in the comparison. The last one reduces hash collisions using a table that was calculated from the reference genome in advance. We implemented the three methods on Xilinx Virtex-7 and evaluated them to show their effectiveness of them. In our experiments, our system achieves 20 fold of processing speed compared with BWA, which is one of the most popular mapping tools. Furthermore, we shows that the our system outperforms one of the fastest FPGA short read mapping systems.