Annotating handwritten characters with minimal human involvement in a semi-supervised learning strategy

Jan Richarz, Szilard Vajda and Gernot A. Fink
Proc. Int. Conf. on Frontiers in Handwriting Recognition, Bari, Italy, 2012.

Abstract

One obstacle in the automatic analysis of handwritten documents is the huge amount of labeled data typically needed for classifier training. This is especially true when the document scans are of bad quality and different writers and writing styles have to be covered. Consequently, the considerable human effort required in the process currently prohibits the automatic transcription of large document collections. In this paper, two semi-supervised multiview learning approaches are presented, reducing the manual burden by robustly deriving a large number of labels from relatively few manual annotations. The first is based on cluster-level annotation followed by a majority decision, whereas the second casts the labeling process as a retrieval task and derives labels by voting among ranked lists. Both methods are thoroughly evaluated in a handwritten character recognition scenario using realistic document data.
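
The first approach, cluster-level annotation followed by a majority decision, can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the feature views, or the exact voting rule, so everything below is an assumption for illustration: the function name `propagate_labels`, the `min_votes` threshold, and the rule of leaving a sample unlabeled on a tie are all hypothetical. Each view is assumed to have been clustered already, and a human is assumed to have assigned one label per cluster.

```python
from collections import Counter


def propagate_labels(view_clusters, cluster_labels, min_votes=2):
    """Derive per-sample labels from cluster-level annotations (illustrative only).

    view_clusters:  one list per view, giving the cluster id of every sample.
    cluster_labels: one dict per view, mapping cluster id -> human-assigned label.
    min_votes:      hypothetical threshold; a label needs at least this many views.
    Returns one label per sample, or None where the views do not agree enough.
    """
    n_samples = len(view_clusters[0])
    result = []
    for i in range(n_samples):
        # Each view votes with the label of the cluster the sample fell into.
        votes = Counter(
            labels[clusters[i]]
            for clusters, labels in zip(view_clusters, cluster_labels)
        )
        ranked = votes.most_common()
        top_label, top_count = ranked[0]
        # A tie between the two strongest labels counts as "no decision".
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if top_count >= min_votes and not tied:
            result.append(top_label)
        else:
            result.append(None)
    return result


# Three views of four samples; each view has two clusters labeled once by a human.
view_clusters = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
cluster_labels = [{0: "a", 1: "b"}] * 3
derived = propagate_labels(view_clusters, cluster_labels)
```

In this toy setting only six cluster annotations (two per view) are needed to label all four samples, which is the kind of saving the approach aims for at scale. Samples on which the views disagree stay unlabeled rather than receiving a possibly wrong label.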