AUTHORS: Giovanni Puccetti, Laura Righi, Ilaria Sabbatini, Andrea Esuli
WORK PACKAGE: WP7 REVER
TITLE: Automatic Extraction of Regesta for Medieval Latin Text Summarization
Keywords:
Abstract
The REVERINO project has developed a dataset of over 4,500 medieval Latin text/summary pairs, extracted from two 13th-century papal collections (MGH and Auvray) through an automated pipeline based on annotation, custom training, OCR, and post-processing. The dataset was used to evaluate the summarization capabilities of LLMs (GPT-4, LLaMA), revealing both the limitations and potential of AI for automated regesta generation. This work contributes to the development of tools for historical digitization and research in the field of Digital Humanities.