A Fast Parallel Algorithm for Indexing Human Genome Sequences

Woong-Kee LOH  Kyoung-Soo HAN  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E97-D   No.5   pp.1345-1348
Publication Date: 2014/05/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E97.D.1345
Type of Manuscript: LETTER
Category: Data Engineering, Web Information Systems
Keyword: 
human genome sequences,  suffix tree,  parallel algorithm,  suffix array,  disk-based index,  

Full Text: PDF>>
Buy this Article




Summary: 
A suffix tree is widely adopted for indexing genome sequences. While supporting highly efficient search, the suffix tree has a few shortcomings such as very large size and very long construction time. In this paper, we propose a very fast parallel algorithm to construct a disk-based suffix tree for human genome sequences. Our algorithm constructs a suffix array for part of the suffixes in the human genome sequence and then converts it into a suffix tree very quickly. It outperformed the previous algorithms by Loh et al. and Barsky et al. by up to 2.09 and 3.04 times, respectively.