Statistical Modeling of the Relation between Characters and Diacritics in Lampung Script

Akmal Junaidi, Rene Grzeszick, Szilárd Vajda and Gernot A. Fink
Proc. Int. Conf. on Document Analysis and Recognition, 2013.

Washington DC, USA

Abstract

Lampung script is a non-cursive script in which a rich set of diacritics is used to modify the syllable denoted by a character symbol. Consequently, analyzing the relation between characters and their associated diacritic marks plays an important role in the recognition process. As diacritics can appear in three different positions relative to a character (top, bottom, and right), associating them correctly with a character is a challenging problem. In this paper, we propose a novel approach for modeling the relations between characters and diacritics in handwritten Lampung documents. First, a document is segmented into characters and diacritic marks. Then, every character defines a normalized coordinate system into which nearby diacritics can be mapped. The relation between a diacritic mark and its associated character can then be described by a statistical model. In a writer-independent experimental evaluation, we investigate models with different degrees of specialization with respect to their capability of predicting the correct character-to-diacritic associations. We achieve significant error rate reductions with respect to a naive association model that uses a nearest-neighbor criterion.
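
The association idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the statistical model, so an axis-aligned 2D Gaussian is assumed here, and the `Box` type, the function names, and the mean and standard deviation values for a "right" diacritic are all hypothetical choices made for illustration. The sketch maps a diacritic into the coordinate frame of each candidate character (origin at the character's centre, units of character width and height) and compares a nearest-neighbor baseline with the model-based choice.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of a segmented component (hypothetical type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def to_character_frame(char, diacritic):
    """Map the diacritic centre into the character's normalized frame:
    origin at the character centre, units of character width and height."""
    cx, cy = char.center
    dx, dy = diacritic.center
    return ((dx - cx) / char.w, (dy - cy) / char.h)


def nearest_neighbor(chars, diacritic):
    """Baseline: index of the character whose centre is closest in pixels."""
    dx, dy = diacritic.center
    return min(
        range(len(chars)),
        key=lambda i: math.hypot(chars[i].center[0] - dx, chars[i].center[1] - dy),
    )


def log_gaussian(point, mean, std):
    """Log density of an axis-aligned 2D Gaussian (assumed model form)."""
    return sum(
        -0.5 * ((v - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
        for v, m, s in zip(point, mean, std)
    )


def model_based(chars, diacritic, mean, std):
    """Index of the character under which the diacritic position is most likely."""
    return max(
        range(len(chars)),
        key=lambda i: log_gaussian(to_character_frame(chars[i], diacritic), mean, std),
    )


# Toy layout: a wide character, a narrow neighbour, and a mark between them.
chars = [Box(0, 0, 40, 40), Box(55, 0, 20, 40)]
mark = Box(44, 16, 8, 8)

# Illustrative parameters for a "right" diacritic; not taken from the paper.
right_mean, right_std = (0.7, 0.0), (0.2, 0.2)

baseline_choice = nearest_neighbor(chars, mark)
model_choice = model_based(chars, mark, right_mean, right_std)
print(baseline_choice, model_choice)
```

In this toy layout the mark lies closer in pixels to the narrow second character, so the baseline selects it, while the normalized position matches the assumed "right" placement relative to the wide first character. In practice the model parameters would be estimated from annotated character and diacritic pairs rather than set by hand.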