Table-Form Structure Analysis Based on Box-Driven Reasoning

Osamu HORI, David S. DOERMANN

IEICE TRANSACTIONS on Information and Systems, Vol.E79-D, No.5, pp.542-547
Publication Date: 1996/05/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Character Recognition and Document Understanding)
Category: Document Recognition and Analysis
Keywords: document understanding, table-form, form, structure analysis


Table-form document structure analysis is an important problem in document processing. This paper presents a new method, Box-Driven Reasoning (BDR), for robustly analyzing the structure of table-form documents that contain touching characters and broken lines. Real documents are copied repeatedly and overlaid with printed data, so characters come to touch cell borders and ruling lines become broken. Most previous methods take a line-oriented approach, and touching characters and broken lines cause them to fail at an early stage. In contrast, BDR deals with regions directly, and it introduces a reduced-resolution image to supplement information degraded by noise. Experimental tests show that BDR reliably recognizes cells and strings in document images with touching characters and broken lines.
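The abstract does not give the details of BDR. As a rough, hypothetical illustration of two ideas it names (treating cells as regions rather than tracing ruling lines, and using a reduced-resolution image to compensate for broken lines), the sketch below OR-downsamples a binary image and then labels enclosed white regions as candidate cells. The names `reduce_or`, `find_boxes` and `table_with_broken_divider`, the OR-pooling reduction, and the 4-connected flood fill are all assumptions made for illustration, not the authors' method.

```python
from collections import deque


def reduce_or(img, k):
    """OR-pool a binary image (list of lists, 1 = black) by a factor k.

    Assumption: a reduced pixel is black if any pixel in its k x k block is
    black, which tends to close small gaps in broken ruling lines.
    """
    h, w = len(img), len(img[0])
    rh, rw = (h + k - 1) // k, (w + k - 1) // k
    out = [[0] * rw for _ in range(rh)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                out[y // k][x // k] = 1
    return out


def find_boxes(img):
    """Return (top, left, bottom, right) of every 4-connected white region
    that does not touch the image border; each is a candidate cell."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill over white pixels.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            top, left, bottom, right = sy, sx, sy, sx
            touches_border = False
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions open to the border are background, not enclosed cells.
            if not touches_border:
                boxes.append((top, left, bottom, right))
    return boxes


def table_with_broken_divider():
    """Hypothetical 8 x 12 two-cell table whose vertical divider has a gap."""
    h, w = 8, 12
    img = [[0] * w for _ in range(h)]
    for x in range(w):
        img[0][x] = img[h - 1][x] = 1  # top and bottom frame
    for y in range(h):
        img[y][0] = img[y][w - 1] = 1  # left and right frame
        img[y][6] = 1                  # vertical divider
    img[3][6] = 0  # simulate a broken ruling line
    return img


img = table_with_broken_divider()
print(find_boxes(img))                # [(1, 1, 6, 10)]: the gap merges both cells
print(find_boxes(reduce_or(img, 2)))  # [(1, 1, 2, 2), (1, 4, 2, 4)]: two cells
```

At full resolution the one-pixel gap lets the two cells leak into a single region, while the reduced image closes the gap and recovers both cells (with coordinates in reduced-image units). A real system would also have to map boxes back to full resolution and cope with characters touching the borders, which this sketch does not attempt.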