Fabian Wolf, Kai Brandenbusch and Gernot A. Fink
Proc. Int. Conf. on Frontiers in Handwriting Recognition, pages 61-66, 2020.
Dortmund, Germany (virtual)
Annotation-free word spotting aims to retrieve relevant word images from a document collection without the need for a manually labeled training dataset. As annotated data is usually scarce in the application scenarios of a word spotting system, transfer learning and annotation-free methods have become increasingly popular. One way to alleviate the annotation problem is to train on synthetically generated word images. A common approach is to render word images from electronic fonts and to vary the synthesis parameters randomly. In this work, we show that an annotation-free word spotting method benefits from an adapted synthesis procedure. We investigate the influence of the choice of the underlying vocabulary and of combining synthesis with data augmentation. Furthermore, we present a method to adapt the style of the synthesized word images to the target dataset. We evaluate the proposed changes to the synthesis procedure on three benchmark datasets and considerably improve performance.
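The common baseline of rendering word images from fonts with randomly varied synthesis parameters can be sketched roughly as below. This is not the authors' adapted procedure: the font pool, the parameter names (`font_size`, `shear`, `stroke_width`) and their ranges are hypothetical. The sketch also skips the actual rasterization with a font engine; it only shows how one sampled parameter, a horizontal shear, could be applied to an already rendered binary word image.

```python
import random
from dataclasses import dataclass

# Hypothetical font pool; a real pipeline would list actual font files.
FONTS = ["font_a.ttf", "font_b.ttf", "font_c.ttf"]


@dataclass
class SynthesisParams:
    font: str          # font file passed to a (not shown) rasterizer
    font_size: int     # glyph height in pixels
    shear: float       # horizontal shear factor relative to the bottom row
    stroke_width: int  # extra stroke thickness in pixels, used by the rasterizer


def sample_params(rng: random.Random) -> SynthesisParams:
    """Draw one random set of synthesis parameters (illustrative ranges)."""
    return SynthesisParams(
        font=rng.choice(FONTS),
        font_size=rng.randint(24, 64),
        shear=rng.uniform(-0.4, 0.4),
        stroke_width=rng.randint(0, 3),
    )


def shear_bitmap(bitmap: list[list[int]], shear: float) -> list[list[int]]:
    """Shear a binary bitmap (rows of 0/1) horizontally.

    The bottom row stays in place; each row above it is shifted by
    round(shear * distance_to_bottom) pixels. The output is widened so
    that no foreground pixel is cut off.
    """
    h, w = len(bitmap), len(bitmap[0])
    offsets = [round(shear * (h - 1 - y)) for y in range(h)]
    min_off = min(offsets)
    out_w = w + max(offsets) - min_off
    out = [[0] * out_w for _ in range(h)]
    for y, row in enumerate(bitmap):
        dx = offsets[y] - min_off  # shift so the leftmost row starts at column 0
        for x, value in enumerate(row):
            if value:
                out[y][x + dx] = 1
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    params = sample_params(rng)
    word = [[1, 1], [1, 0], [1, 1]]  # toy bitmap standing in for a rendered word
    for row in shear_bitmap(word, params.shear):
        print("".join("#" if v else "." for v in row))
```

Drawing every parameter independently per word image is the simplest choice; a style-adapted synthesis would instead bias these distributions toward the target collection.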