Named Entity Linking on Handwritten Document Images

Oliver Tueselmann and Gernot A. Fink
Proc. Int. Workshop on Document Analysis Systems, pages 199-213, 2022.

La Rochelle, France


Abstract

Named Entity Linking (NEL) is an information extraction task that semantically enriches documents by recognizing mentions of entities in a text and linking each mention to its corresponding entry in a Knowledge Base (KB). This semantic information is essential for realizing semantic search. Furthermore, it serves as a feature for subsequent tasks (e.g. Question Answering) and helps improve the user experience. Current NEL approaches and datasets from the Document Image Analysis community focus mainly on machine-printed documents and do not consider handwriting, largely because annotated NEL handwriting datasets are lacking. To fill this gap, we manually annotated the well-known IAM and George Washington datasets with NEL labels and created a synthetic handwritten version of the AIDA-CoNLL dataset. Furthermore, we present an evaluation protocol as well as a baseline approach.
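To make the task definition concrete, the following is a toy sketch of what NEL produces: token spans paired with KB entry IDs. It is not the authors' baseline. The dictionary-based greedy longest-match linker, the `TOY_KB` entries, and the `KB:...` identifiers are all hypothetical illustrations; real systems use large KBs and learned disambiguation, and on handwritten images the tokens would first have to come from handwriting recognition or word spotting. The scoring function assumes the common "strict match" convention of micro precision, recall, and F1 over exact (span, entity) pairs, which may differ from the evaluation protocol proposed in the paper.

```python
from typing import Dict, List, Tuple

# A link is (start_token, end_token_exclusive, kb_entry_id).
Link = Tuple[int, int, str]

# Hypothetical toy KB mapping lowercase surface forms to made-up entry IDs.
TOY_KB: Dict[str, str] = {
    "george washington": "KB:George_Washington",
    "washington": "KB:Washington_DC",  # ambiguous surface form in practice
    "mount vernon": "KB:Mount_Vernon",
}


def link_entities(tokens: List[str], kb: Dict[str, str], max_len: int = 3) -> List[Link]:
    """Greedy longest-match mention detection plus KB lookup (toy linker)."""
    links: List[Link] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, so "George Washington"
        # wins over the shorter, ambiguous "Washington".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n]).lower()
            if surface in kb:
                match = (i, i + n, kb[surface])
                break
        if match is not None:
            links.append(match)
            i = match[1]  # skip past the linked mention
        else:
            i += 1
    return links


def micro_prf(pred: List[Link], gold: List[Link]) -> Tuple[float, float, float]:
    """Micro precision/recall/F1 counting only exact (span, entity) matches."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = "General George Washington visited Mount Vernon".split()
    links = link_entities(tokens, TOY_KB)
    print(links)  # [(1, 3, 'KB:George_Washington'), (4, 6, 'KB:Mount_Vernon')]
    print(micro_prf(links, links))  # (1.0, 1.0, 1.0)
```

Under this strict convention, a correctly detected span linked to the wrong entry counts as both a false positive and a false negative, so recognition errors on handwriting would lower both precision and recall.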