A Fast Parallel Algorithm for Indexing Human Genome Sequences
Woong-Kee LOH, Kyoung-Soo HAN
IEICE TRANSACTIONS on Information and Systems
Publication Date: 2014/05/01
Online ISSN: 1745-1361
Type of Manuscript: LETTER
Category: Data Engineering, Web Information Systems
Keywords: human genome sequences, suffix tree, parallel algorithm, suffix array, disk-based index
A suffix tree is widely adopted for indexing genome sequences. While supporting highly efficient search, the suffix tree has a few shortcomings such as very large size and very long construction time. In this paper, we propose a very fast parallel algorithm to construct a disk-based suffix tree for human genome sequences. Our algorithm constructs a suffix array for part of the suffixes in the human genome sequence and then converts it into a suffix tree very quickly. It outperformed the previous algorithms by Loh et al. and Barsky et al. by up to 2.09 and 3.04 times, respectively.
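The abstract does not spell out the algorithm's steps, so the following is only a minimal in-memory sketch of the general idea it names: build a suffix array for part of the suffixes, then convert that array into a suffix (sub)tree. Several things here are assumptions for illustration and are not taken from the paper: the suffixes are partitioned by a common prefix, the sort is naive, the conversion uses the standard stack-based pass over longest-common-prefix values, and the names `partial_suffix_array`, `subtree_from_suffix_array`, and `leaves` are hypothetical. The input is assumed to end with a unique terminator such as `$`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    depth: int        # string depth: length of the path label from the root
    suffix: int = -1  # start position for a leaf, -1 for an internal node
    children: list = field(default_factory=list)


def partial_suffix_array(text, prefix):
    """Sorted start positions of the suffixes that begin with `prefix`.

    Assumption: partitions are defined by a shared prefix. The sort is
    naive and only suitable for toy inputs.
    """
    starts = [i for i in range(len(text)) if text.startswith(prefix, i)]
    return sorted(starts, key=lambda i: text[i:])


def lcp(text, i, j):
    """Length of the longest common prefix of the suffixes at i and j."""
    n = 0
    while i + n < len(text) and j + n < len(text) and text[i + n] == text[j + n]:
        n += 1
    return n


def subtree_from_suffix_array(text, sa):
    """Convert a sorted suffix list into a tree, one suffix at a time.

    The stack holds the rightmost path of the tree built so far.
    Assumes `text` ends with a unique terminator, so no suffix is a
    prefix of another.
    """
    root = Node(depth=0)
    stack = [root]
    for k, start in enumerate(sa):
        h = lcp(text, sa[k - 1], start) if k > 0 else 0
        last = None
        # Climb until the top of the stack is no deeper than the shared prefix.
        while stack[-1].depth > h:
            last = stack.pop()
        if stack[-1].depth < h:
            # Split the edge: a new internal node at depth h adopts `last`,
            # which is currently the rightmost child of the stack top.
            inner = Node(depth=h)
            stack[-1].children.pop()
            inner.children.append(last)
            stack[-1].children.append(inner)
            stack.append(inner)
        leaf = Node(depth=len(text) - start, suffix=start)
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root


def leaves(node):
    """Leaf suffix positions in left-to-right (lexicographic) order."""
    if not node.children:
        return [node.suffix]
    return [s for child in node.children for s in leaves(child)]


if __name__ == "__main__":
    text = "GATTACA$"
    sa = partial_suffix_array(text, "A")
    tree = subtree_from_suffix_array(text, sa)
    print(sa, leaves(tree))
```

Because each prefix partition depends only on the input text, partitions like the one for `"A"` above could in principle be built by separate workers and written out independently. How the paper actually divides, parallelizes, and stores the work on disk is not stated in the abstract.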