Statistical Language Models for On-Line Handwriting Recognition

Freddy PERRAUD  Christian VIARD-GAUDIN  Emmanuel MORIN  Pierre-Michel LALLICAN  

IEICE TRANSACTIONS on Information and Systems   Vol.E88-D   No.8   pp.1807-1814
Publication Date: 2005/08/01
DOI: 10.1093/ietisy/e88-d.8.1807
Print ISSN: 0916-8532
Type of Manuscript: Special Section PAPER (Special Section on Document Image Understanding and Digital Documents)
Category: On-line Word Recognition
Keywords: handwriting recognition, language modeling, n-gram, n-class, perplexity


This paper incorporates statistical language models into an on-line handwriting recognition system designed for devices with limited memory and computational resources. The objective is to minimize the recognition error rate by exploiting sentence context to disambiguate poorly written text. Probabilistic word n-grams are investigated first; then, to counter the curse of dimensionality inherent in that approach and to significantly reduce the size of the language model, the models are extended to class-based n-grams, where the classes are derived from either a syntactic or a contextual criterion. Finally, a composite model combining both kinds of classes is proposed, and it outperforms the word n-gram model. We report extensive experiments on three European languages (English, French, and Italian), covering both language model evaluation via the classical perplexity measure on test text corpora and the evolution of the word error rate on test handwriting databases. These experiments show that the proposed approach significantly improves on state-of-the-art n-gram models and that its integration into an on-line handwriting recognition system yields a substantial performance improvement.
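To make the two modeling ideas in the abstract concrete, the sketch below (an illustrative toy implementation, not the authors' system) shows an add-one-smoothed word bigram model, a class-based factorization P(w_i | w_{i-1}) ≈ P(w_i | c_i) · P(c_i | c_{i-1}) that shrinks the parameter count from |V|² toward |C|² + |V|, and the perplexity measure used for evaluation. All function names and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter

def train_counts(sentences):
    """Count unigrams and bigrams over <s>-padded token sequences."""
    uni, bi = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent
        for a, b in zip(toks, toks[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
    return uni, bi

def word_bigram_prob(uni, bi, vsize, a, b):
    """Add-one (Laplace) smoothed word-level P(b | a)."""
    return (bi[(a, b)] + 1) / (uni[a] + vsize)

def class_bigram_prob(uni, cuni, cbi, word2class, ncls, a, b):
    """Class-based factorization P(b | a) ~= P(b | c_b) * P(c_b | c_a).

    Only the |C| x |C| class-transition table and per-class word
    frequencies are stored, instead of a full |V| x |V| word table.
    """
    ca, cb = word2class[a], word2class[b]
    p_word_given_class = uni[b] / cuni[cb]                 # emission term
    p_class_given_class = (cbi[(ca, cb)] + 1) / (cuni[ca] + ncls)
    return p_word_given_class * p_class_given_class

def perplexity(prob_fn, sentences):
    """Perplexity = exp(-average log-probability per predicted token)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + sent
        for a, b in zip(toks, toks[1:]):
            log_sum += math.log(prob_fn(a, b))
            n += 1
    return math.exp(-log_sum / n)
```

A composite model in the spirit of the paper would interpolate or combine the syntactic and contextual class estimates; here the word-class map (e.g. part-of-speech-like classes) stands in for either criterion.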