Word Spotting in the Era of Deep Learning

Gernot A. Fink
Tutorial (invited) presented at École Pratique des Hautes Études, Paris, France, 2019.


Abstract

Research on building automatic reading systems has made considerable progress since its inception in the 1960s. Today, quite mature techniques are available for the automatic recognition of machine-printed text. However, the automatic reading of handwriting is a considerably more challenging task, especially when it comes to historical manuscripts. When current methods for handwriting recognition reach their limits, approaches for so-called word spotting come into play. These can be considered specialized versions of image retrieval techniques. The most successful approaches rely on machine learning to derive powerful query models for handwriting retrieval.

This tutorial will be organized in two parts: After an introduction to the problem of word spotting and a brief look at the methodological development in the field, the first part will cover classical approaches for learning word spotting models that build on Bag-of-Features (BoF) representations.

In the second part of the tutorial, advanced models for word spotting will be presented that apply deep learning techniques and currently define the state of the art in the field. To this end, the foundations of neural networks in general and of deep architectures in particular will be laid first. The recent success of such deep networks became possible largely because solutions to the crucial problem of vanishing gradients were proposed. By combining the idea of string embeddings with a unified framework that can be learned in an end-to-end fashion, unprecedented performance can be achieved on a number of challenging word spotting tasks, as has been demonstrated by PHOCNet.