Fabian Wolf and Gernot A. Fink
Proc. Int. Conf. on Frontiers in Handwriting Recognition, pages 300-315, 2022.
Hyderabad, India
Handwritten Text Recognition (HTR) relies on deep learning to achieve high recognition performance. Its success is substantially driven by large annotated training datasets, which result in powerful recognition models. Performance suffers considerably when a model is applied to a document collection with a distinctive style that is not well represented by the training data. Adapting a recognition model to a new collection therefore poses a tremendous annotation effort, which is often out of scope, for example for historic collections. To overcome this limitation, we propose a training scheme that combines multiple data sources. Synthetically generated samples are used to train an initial model. Self-training then offers the possibility to exploit unlabeled samples. We further investigate how a small number of manually annotated samples can be integrated to achieve maximal performance with limited annotation effort. To this end, we add labeled samples at different stages of self-training and propose two criteria, namely confidence and diversity, for selecting the samples to annotate. Our experiments show that the proposed training scheme considerably closes the gap to fully supervised training on the designated training set while requiring less than ten percent of the labeling effort.
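The abstract outlines the core mechanics of the scheme: confident model predictions on unlabeled data become pseudo-labels for the next self-training round, while a small annotation budget is spent on samples chosen by confidence and diversity. The sketch below illustrates both ideas under assumptions not stated in the abstract: the function names (`select_pseudo_labels`, `select_for_annotation`), the scalar per-sample confidence score, and the use of greedy farthest-point sampling as the diversity criterion are hypothetical stand-ins, not necessarily the paper's actual method.

```python
import numpy as np

def select_pseudo_labels(predictions, threshold=0.9):
    """Confidence criterion (sketch): keep unlabeled samples whose
    predicted transcription is confident enough to serve as a
    pseudo-label in the next self-training round.

    `predictions` is a list of (sample_id, text, confidence) triples,
    where `confidence` is a hypothetical scalar score in [0, 1].
    """
    return [(sid, text) for sid, text, conf in predictions if conf >= threshold]

def select_for_annotation(features, budget):
    """Diversity criterion (sketch): greedy farthest-point sampling over
    per-sample feature vectors, one plausible way to pick a small but
    diverse set of samples for manual annotation.

    `features` is an (N, d) array; returns `budget` row indices.
    """
    chosen = [0]  # start from an arbitrary sample
    dists = np.linalg.norm(features - features[0], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dists))  # farthest from all samples chosen so far
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return chosen
```

In a full training loop, each self-training round would retrain the model on the union of synthetic data, the confident pseudo-labels, and whatever manually annotated samples have been added so far.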