Fast String Searching in a Character Lattice

Shuji SENDA  Michihiko MINOH  Katsuo IKEDA  

IEICE TRANSACTIONS on Information and Systems   Vol.E77-D   No.7   pp.846-851
Publication Date: 1994/07/25
Online ISSN: 
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Document Analysis and Recognition)
character recognition,  character lattice,  string searching algorithm,  

Full Text: PDF>>
Buy this Article

This paper presents an algorithm for string searching in a character lattice. A character lattice, which is obtained through a character recognition process, is a general and flexible data structure that represents many hypothesized strings in a document image. In this paper, the authors propose a simple and efficient algorithm; it consists of a single loop of some set-operations and scans the character lattice only once. The authors also describe two actual implementations of the algorithm; one uses Bit-Arrays and the other a Trie. Owing to its bir parallelism, the Bit-Array approach is able to search for a single pattern faster than the Trie approach, and is easily extended to complex matchings such as an approximate one. It is suited for document retrieval systems that need to search for a keyword as fast as possible. A hashed compact version of the character lattice is also useful to increase the speed of the search for a single pattern. In contrast, the Trie approach is able to search for a large number of patterns simultaneously, and is suited for document understanding systems that need to extract words from the character lattice. The experimental results have shown that both approaches achieve high performance.