Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

Takuya TAKAGI  Shunsuke INENAGA  Kunihiko SADAKANE  Hiroki ARIMURA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E100-A   No.9   pp.1785-1793
Publication Date: 2017/09/01
Online ISSN: 1745-1337
Type of Manuscript: Special Section PAPER (Special Section on Discrete Mathematics and Its Applications)
online text indexing,  packed string matching,  sparse suffix trees,  dynamic predecessor dictionaries,  

Full Text: PDF(1.2MB)
>>Buy this Article

We present a new data structure called the packed compact trie (packed c-trie) which stores a set S of k strings of total length n in nlog σ+O(klog n) bits of space and supports fast pattern matching queries and updates, where σ is the alphabet size. Assume that α=logσn letters are packed in a single machine word on the standard word RAM model, and let f(k,n) denote the query and update times of the dynamic predecessor/successor data structure of our choice which stores k integers from universe [1,n] in O(klog n) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in $O( rac{m}{alpha} f(k,n))$ worst-case time and in $O( rac{m}{alpha} + f(k,n))$ expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. We also discuss applications of our packed c-tries.