Word Spotting in Historical Document Collections with Online-Handwritten Queries

Christian Wieprecht, Leonard Rothacker and Gernot A. Fink
Proc. IAPR Int. Workshop on Document Analysis Systems, Santorini, Greece, 2016.



Pen-based systems are becoming increasingly important due to the growing availability of touch-sensitive devices in various forms and sizes. Their interfaces allow users to interact with a system directly through natural handwriting. In contrast to other input modalities, there is no need to switch to a special mode such as a software keyboard. In this paper we propose a new method for querying digital archives of historical documents. Word images are retrieved with respect to search terms that users write by hand on a pen-based system. The captured pen trajectory serves as the query; we call this scenario query-by-online-trajectory word spotting. Attribute embeddings are computed for both the online-trajectory features and the visual features, and word images are retrieved based on their distance to the query in a common subspace. No explicit transcription of either queries or word images is required, which makes the system robust. We evaluate our approach in writer-dependent as well as writer-independent scenarios, obtaining highly accurate retrieval results in the former and compelling retrieval results in the latter. Our performance is very competitive with related methods from the literature.
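
The retrieval step described above, ranking word images by their distance to the query in a common subspace, can be illustrated with a minimal sketch. It assumes that the online-trajectory query and the word images have already been mapped to attribute embeddings of the same dimension; the cosine distance, the function names `l2_normalize` and `rank_word_images`, and the toy vectors are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit Euclidean length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_word_images(query_embedding, image_embeddings):
    """Rank word images by cosine distance to the query (hypothetical helper).

    query_embedding:  shape (d,), embedding of the online-trajectory query.
    image_embeddings: shape (n, d), embeddings of the n word images.
    Returns the image indices sorted from most to least similar, together
    with the corresponding cosine distances.
    """
    q = l2_normalize(np.asarray(query_embedding, dtype=float))
    x = l2_normalize(np.asarray(image_embeddings, dtype=float))
    distances = 1.0 - x @ q  # cosine distance; an assumed choice of metric
    order = np.argsort(distances, kind="stable")
    return order, distances[order]


# Toy data standing in for embeddings in the common subspace (assumed values).
image_embeddings = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_embedding = np.array([1.0, 0.0, 0.0])

order, dists = rank_word_images(query_embedding, image_embeddings)
print(order, dists)
```

Because both modalities live in the same space, the ranking needs only vector comparisons and no transcription of the query or of the word images.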