Grouping Historical Postcards Using Query-by-Example Word Spotting

Gernot A. Fink, Leonard Rothacker and Rene Grzeszick
Proc. Int. Conf. on Frontiers in Handwriting Recognition, 2014.

Crete, Greece


Abstract

Handwritten historical documents pose extremely challenging problems for automatic analysis. This is due to the high variability observed in handwritten script, the use of writing styles and script types unknown today, the frequent lack of orthographic standardization, and the degradation of the respective documents. Therefore, it is currently out of the question to develop general-purpose handwriting recognition systems for historical document collections. It is, however, possible to search relatively homogeneous document collections using word spotting techniques. In this paper we consider the analysis of a challenging collection of postcards from the period of World War I delivered by the German military postal service. More specifically, we consider the automatic grouping of mail pieces by spotting potentially identical addressees. As the annotation of such documents is extremely challenging even for trained experts, a manually developed ground truth annotation will, in general, not be available. Furthermore, a reliable segmentation at the word level will hardly be possible. Building on our segmentation-free query-by-example word spotting method, we investigate modifications that address better generalization to a multi-writer scenario and its application to degraded documents. Promising results were achieved in this highly challenging scenario.