Recognition of Degraded Machine-Printed Characters Using a Complementary Similarity Measure and Error-Correction Learning

Minako SAWAKI  Norihiro HAGITA  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E79-D   No.5   pp.491-497
Publication Date: 1996/05/25
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Issue on Character Recognition and Document Understanding)
Category: Classification Methods
Keyword: character recognition, error-correction learning, similarity measure, noise model, fontstyle recognition

Summary: 
Most conventional character recognition methods extract geometrical features, such as stroke direction and connectivity, and compare them with reference patterns in a stored dictionary. Unfortunately, geometrical features are easily degraded by blurs and stains, and by graphical designs such as those used in Japanese newspaper headlines. Such noise must be removed before recognition begins, but no preprocessing method is perfectly accurate. This paper proposes a method for recognizing degraded characters as well as characters printed on graphical designs. The method extracts features from binary images and uses a new similarity measure, the complementary similarity measure, as a discriminant function; the measure compares the similarity and dissimilarity of binary patterns with reference dictionary patterns. Experiments are conducted using the standard character database ETL-2, which consists of machine-printed Kanji, Hiragana, Katakana, alphanumeric, and special characters. The results show that our method is much more robust against noise than the conventional geometrical-feature method. It also achieves high recognition rates of over 97% for characters with textured foregrounds, over 99% for characters with textured backgrounds, over 98% for outline fonts, and over 99% for reverse contrast characters. Experiments on recognizing both fontstyle and character category show that the method also maintains high recognition rates in the presence of noise.
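
The summary does not state the formula for the complementary similarity measure. As a rough illustration only, the sketch below uses a form commonly attributed to the measure in later literature: for a binary input pattern F and a binary reference pattern T of length n, Sc(F, T) = (a*e - b*c) / sqrt(|T| * (n - |T|)), where a counts positions that are 1 in both, e counts positions that are 0 in both, b counts positions that are 1 only in T, c counts positions that are 1 only in F, and |T| is the number of 1s in T. This formula, the function name, and the toy patterns are assumptions for illustration, not taken from the summary.

```python
import math


def complementary_similarity(f, t):
    """Score a binary input pattern f against a binary reference pattern t.

    Assumed form (not given in the summary):
        Sc = (a*e - b*c) / sqrt(|T| * (n - |T|))
    Both arguments are equal-length sequences of 0/1 values.
    """
    if len(f) != len(t):
        raise ValueError("patterns must have the same length")

    a = b = c = e = 0
    for fi, ti in zip(f, t):
        if fi == 1 and ti == 1:
            a += 1  # set in both patterns (agreement on foreground)
        elif fi == 0 and ti == 1:
            b += 1  # set only in the reference
        elif fi == 1 and ti == 0:
            c += 1  # set only in the input
        else:
            e += 1  # clear in both patterns (agreement on background)

    n = len(t)
    ones_in_t = a + b
    denom = ones_in_t * (n - ones_in_t)
    if denom == 0:
        raise ValueError("reference pattern must contain both 0s and 1s")
    return (a * e - b * c) / math.sqrt(denom)


# Hypothetical 4-pixel patterns, used only to exercise the function.
reference = [1, 1, 0, 0]
same = complementary_similarity(reference, reference)
inverted = complementary_similarity([0, 0, 1, 1], reference)
partial = complementary_similarity([1, 0, 0, 0], reference)
```

Under this assumed form, inverting every pixel of the input negates the score, so the magnitude of the score stays large for a reverse contrast copy of the reference.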