Segmentation-free Query-by-String Word Spotting with Bag-of-Features HMMs

Leonard Rothacker and Gernot A. Fink
Proc. Int. Conf. on Document Analysis and Recognition, 2015.

Nancy, France

Abstract

Word spotting makes it possible to explore document images without requiring a full transcription. In the query-by-string scenario considered in this paper, arbitrary keywords can be searched for while only limited prior information about the documents is required. We learn context-dependent character models from a training set that is small with respect to the number of models. This is possible due to the use of Bag-of-Features HMMs, which are especially suited for estimating robust models from limited training material. In contrast to most query-by-string methods, we consider a fully segmentation-free decoding framework that does not require any pre-segmentation at the word or line level. Experiments on the well-known George Washington benchmark demonstrate the high accuracy of our method.