Deep Learning for Word Spotting: Foundations and Current Developments

Gernot A. Fink
Tutorial (invited) presented at the 5th TC10/TC11 Summer School on Document Analysis, Fribourg, Switzerland, 2023.


Abstract

Machine learning has enabled remarkable performance in reading systems. Nevertheless, especially with historical documents, there are situations in which the complete transcription of a document in question is no longer feasible. This is where so-called word spotting approaches come into play, which today mostly use deep neural networks to build powerful models for document retrieval.

This lecture will first give a brief introduction to the problem of word spotting and the methodological developments in the field. Afterwards, the foundations of current word spotting technology, namely deep neural networks, will be introduced. Then it will be shown how unprecedented retrieval performance can be achieved by adapting convolutional neural networks to the problem of word spotting. In particular, the lecture will cover the architecture of the so-called PHOCNet, which pioneered deep neural models in the field.

However, high retrieval performance usually comes at a price: huge amounts of annotated data are required to train high-quality models. This is a major hindrance to the practical application of word spotting. Therefore, methods will be presented that significantly reduce the required amount of annotated training data by making use of synthetic data, transfer learning, and self-training. Another challenge in document retrieval arises from the fact that word spotting normally considers similarity in a purely syntactic fashion. Therefore, methods will be presented that incorporate the semantic similarity of search terms into word spotting models.