Towards semi-supervised transcription of handwritten historical weather reports

Jan Richarz, Szilard Vajda and Gernot A. Fink
Proc. IAPR Int. Workshop on Document Analysis Systems, 2012.

Gold Coast, Queensland, Australia

Abstract

This paper addresses the automatic transcription of handwritten documents with a regular tabular structure. A method for extracting delineated tables from images is proposed, using very little prior knowledge about the document layout. The detected structure serves as a query for retrieving and fitting a template, which is then used to extract text fields. A semi-supervised learning approach is applied to the extracted fields, with the aim of minimizing the human labeling effort needed for recognizer training. The effectiveness of the proposed approach is demonstrated experimentally on a set of historical weather reports. Labeling only a small fraction of the data yields recognition performance competitive with that obtained using all labels, keeping the human effort involved in the process very low.
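
The abstract does not specify how the semi-supervised step works, so the following is only a generic illustration of the underlying idea: a few hand-labeled samples are used to assign labels to the remaining, unlabeled ones. It is a minimal sketch under stated assumptions, not the authors' method. The function name `propagate_labels`, the 2-D feature vectors, and the nearest-neighbor rule are all hypothetical choices made for the example.

```python
import math


def propagate_labels(features, labeled):
    """Give every sample the label of its nearest hand-labeled sample.

    features: list of equal-length numeric tuples (hypothetical field descriptors).
    labeled:  dict mapping sample index -> label for the few annotated samples.
    This nearest-neighbor rule is an assumption for illustration only.
    """
    labels = []
    for i, x in enumerate(features):
        if i in labeled:
            # Keep the human-provided label unchanged.
            labels.append(labeled[i])
            continue
        # Index of the closest annotated sample by Euclidean distance.
        nearest = min(labeled, key=lambda j: math.dist(x, features[j]))
        labels.append(labeled[nearest])
    return labels


# Toy data: six extracted fields, of which only two were labeled by hand.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
            (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labeled = {0: "3", 3: "7"}
labels = propagate_labels(features, labeled)
```

Here two manual labels are enough to label all six toy samples. On real data the automatically assigned labels would be noisy, so a recognizer trained on them should be compared against one trained on fully labeled data, as the abstract describes.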