A Probabilistic Retrieval Model for Word Spotting Based on Direct Attribute Prediction

Eugen Rusakov, Leonard Rothacker, Hyunho Mo and Gernot A. Fink
Proc. Int. Conf. on Frontiers in Handwriting Recognition, 2018, Winner of the IAPR Best Student Paper Award.

Niagara Falls, USA


Abstract

In recent years, CNNs have taken over various fields of computer vision. Adapted to document image analysis, they have achieved state-of-the-art performance in word spotting by predicting word string embeddings. One prominent embedding, the Pyramidal Histogram of Characters (PHOC), splits a given string into temporal pyramidal regions of character occurrences. This string embedding can be interpreted as a binary attribute representation. In this work, we present a new approach for ranking retrieval lists that was originally proposed for zero-shot learning, where attribute representations play an important role. Instead of a distance-based matching of the predicted string embedding, we compute the posterior probability of the attribute representation given a word image, which is equivalent to the posterior of the query if uniform priors are used. We show that this probabilistic ranking improves word spotting performance, especially in the query-by-string scenario.
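The two ideas in the abstract, the PHOC embedding and ranking by attribute posterior instead of by distance, can be illustrated with a short sketch. This is not the authors' implementation. The alphabet, the pyramid levels (1, 2, 3), the 50% overlap rule for assigning characters to regions, the assumption that attributes are independent, and every helper name (`phoc`, `log_posterior_score`, `rank_by_posterior`, `toy_probs`) are hypothetical choices for this example. The `toy_probs` values stand in for the per-attribute probabilities a CNN would predict for a word image.

```python
import math
from typing import Dict, List, Sequence

# Assumed alphabet: lowercase Latin letters and digits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
# Assumed pyramid levels; deeper pyramids are common in practice.
LEVELS = (1, 2, 3)


def phoc(word: str, alphabet: str = ALPHABET, levels: Sequence[int] = LEVELS) -> List[int]:
    """Binary Pyramidal Histogram of Characters for `word` (illustrative sketch)."""
    word = word.lower()
    n = len(word)
    index = {c: i for i, c in enumerate(alphabet)}
    vec: List[int] = []
    for level in levels:
        for region in range(level):
            hist = [0] * len(alphabet)
            for k, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                # Positions are measured in units of 1 / (n * level) so that all
                # interval bounds are integers and the overlap test is exact.
                c_start, c_end = k * level, (k + 1) * level
                r_start, r_end = region * n, (region + 1) * n
                overlap = min(c_end, r_end) - max(c_start, r_start)
                # Assign the character to this region if at least half of its
                # interval (length `level` in these units) lies inside it.
                if 2 * overlap >= level:
                    hist[index[ch]] = 1
            vec.extend(hist)
    return vec


def log_posterior_score(query_attrs: Sequence[int], probs: Sequence[float], eps: float = 1e-7) -> float:
    """log P(query attributes | image), assuming independent attributes.

    `probs[i]` is the predicted probability that attribute i is present in the
    image. With uniform priors, ranking by this value ranks by the query posterior.
    """
    score = 0.0
    for a, p in zip(query_attrs, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        score += math.log(p) if a else math.log(1.0 - p)
    return score


def rank_by_posterior(query: str, predictions: Dict[str, Sequence[float]]) -> List[str]:
    """Sort image ids by descending attribute posterior for a query string."""
    q = phoc(query)
    return sorted(predictions, key=lambda img: log_posterior_score(q, predictions[img]), reverse=True)


def toy_probs(word: str, hi: float = 0.9, lo: float = 0.1) -> List[float]:
    """Hypothetical stand-in for CNN outputs: confident but imperfect attributes."""
    return [hi if a else lo for a in phoc(word)]


if __name__ == "__main__":
    predictions = {"img_car": toy_probs("car"), "img_cat": toy_probs("cat")}
    print(rank_by_posterior("cat", predictions))  # ['img_cat', 'img_car']
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow over hundreds of attributes, and the ranking order is unchanged because the logarithm is monotonic. A distance-based baseline would instead compare the query PHOC with the predicted vector directly, for example with the cosine distance.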