For Full-Text PDF, please login, if you are a member of IEICE,|
or go to Pay Per View on menu list, if you are a nonmember of IEICE.
Fast String Searching in a Character Lattice
Shuji SENDA Michihiko MINOH Katsuo IKEDA
IEICE TRANSACTIONS on Information and Systems
Publication Date: 1994/07/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Document Analysis and Recognition)
character recognition, character lattice, string searching algorithm,
Full Text: PDF>>
This paper presents an algorithm for string searching in a character lattice. A character lattice, which is obtained through a character recognition process, is a general and flexible data structure that represents many hypothesized strings in a document image. In this paper, the authors propose a simple and efficient algorithm; it consists of a single loop of some set-operations and scans the character lattice only once. The authors also describe two actual implementations of the algorithm; one uses Bit-Arrays and the other a Trie. Owing to its bir parallelism, the Bit-Array approach is able to search for a single pattern faster than the Trie approach, and is easily extended to complex matchings such as an approximate one. It is suited for document retrieval systems that need to search for a keyword as fast as possible. A hashed compact version of the character lattice is also useful to increase the speed of the search for a single pattern. In contrast, the Trie approach is able to search for a large number of patterns simultaneously, and is suited for document understanding systems that need to extract words from the character lattice. The experimental results have shown that both approaches achieve high performance.