Minimum Covering Run Expression of Document Images Based on Matching of Bipartite Graph

Supoj CHINVEERAPHAN  Ken'ichi DOUNIWA  Makoto SATO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E76-D   No.4   pp.462-469
Publication Date: 1993/04/25
Online ISSN: 
DOI: 
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Image Processing and Understanding)
Category: 
Keyword: 
Minimum Covering Run (MCR) expression,  run,  bipartite graph,  adjacency matrix,  minimum vertex-covering,  maximum matching,  Hopcroft-Karp's algorithm,  

Full Text: PDF(570.3KB)>>
Buy this Article




Summary: 
An efficient technique for expressing document image is required as part of a unified approach to document image processing. This paper presents a new method, Minimum Covering Run (MCR), for expressing binary images. The name being adapted from horizontal or vertical run representation. The proposed technique uses some horizontal and vertical runs together to represent binary images in which the total number of representative runs is minimized. Considering the characteristic of above run types precisely, it is shown that horizontal and vertical runs of any binary image could be thought of as partite sets of a bipartite graph. Consequently, the MCR expression that corresponds to the construction of one of the most interesting problems in graphs; i.e., maximum matching, is analogously found by using an algorithm which solves this problem in a corresponding graph. The most efficient algorithm takes at most O(n5/2) computations for solving the problem where n is the sum of cardinalities of both partite sets. However, some patterns in images like tables or line drowings, generally, have a large number of runs representing them which results in a long processing time. Therefore, we provide the Rectangular Segment Analysis (RSA) as a pre-processing to define runs representing such patterns beforehand. We also show that horizontal and vertical covering parts of the proposed expression are able to represent stroke components of characters in document images. As an implementation, an efficient algorithm including arrangement for run data structure of the MCR expression is presented. The experimental results show the possibility of stroke extraction of characters in document images. As an application, some patterns such as tables can be extracted from document images.