Department of Computer Science LS XII - Pattern Recognition Group

Erschließung von Massenakten mit KI? – Oder sind große multi-modale Sprachmodelle die zukünftigen Werkzeuge der Tiefenerschließung? (Deep Indexing of Bulk Files with AI? – Or Are Large Multimodal Language Models the Future Tools of Deep Indexing?)

Astrid Küntzel, Gernot A. Fink, Oliver Tüselmann and Fabian Wolf
Workshop (invited) at the Interner Archivtag 2024, Gelsenkirchen, 2024.

Abstract

In this workshop, we will explore whether the capabilities of multimodal large language models (MLLMs), i.e., relatives of ChatGPT and similar systems that have been extended to process both text and image data, make them suitable as future tools for the deep indexing of archival records.

We will begin with a brief introduction to the essential characteristics of purely text-based large language models (LLMs) in general and of the MLLMs considered here in particular. Using ChatGPT and document examples from public sources, we will then discuss the potential of MLLM technology interactively with the workshop participants. Next, we will demonstrate how current open-source multimodal language models perform on exemplary indexing tasks across a range of different archival materials. Good results can only be expected if the appearance of the processed documents is close to the "experience" of the models, i.e., if comparable document types are available on the Internet. We will then show the significant performance improvements that can be achieved when the models are adapted to the specific indexing task through fine-tuning. There will also be an opportunity to discuss relevant application scenarios. Finally, we will address the considerable resource requirements of current MLLMs and the technical prerequisites for deploying such models.