Sanctuaria-Gaze: A Multimodal Egocentric Dataset for Human Attention Analysis in Religious Sites

AUTHORS: Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Marco Papasidero, Federico Ruozzi, Rita Cucchiara 

WORK PACKAGE: WP8

URL: https://umanisticadigitale.unibo.it/article/view/22160

Keywords: Latin Bibles, Latin Patristics, Intertextuality, BERT-based Sentence Embeddings, IRCDL2025

Abstract

This study presents an interdisciplinary methodology for detecting intertextual references in Latin patristic literature through a novel combination of philological rigor and Natural Language Processing (NLP) techniques. Focusing on Augustine of Hippo’s De Genesi ad litteram and its relationship to Latin biblical texts (specifically Jerome’s Vulgate and pre-Vulgate versions), this research introduces a token-based classification system enriched with semantic annotations, supported by the INCEpTION platform. The classification system accounts for exact matches, lemmatized forms, synonyms, and structural parallels, capturing a wide spectrum of textual similarity. To enhance automatic retrieval of these intertextual links, we fine-tune BERT-based language models for Latin, incorporating contrastive learning and hard negative mining. Experimental results demonstrate that fine-tuned models significantly outperform baselines across varying levels of textual similarity. This work highlights the utility of computational models in bridging explicit citations and implicit allusions, offering a scalable approach for the study of biblical intertextuality in ancient texts.
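
Illustrative sketch

The abstract names contrastive learning with hard negative mining but gives no implementation details. The following is a minimal sketch of the general technique, not the authors' training code. It assumes an InfoNCE-style loss over cosine similarities, and it uses toy 3-dimensional vectors in place of real BERT sentence embeddings. The function names (`mine_hard_negatives`, `info_nce`), the temperature value, and all vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_hard_negatives(anchor, candidates, positive_idx, k=2):
    """Return indices of the k non-positive candidates most similar to the anchor.

    These are "hard" negatives: passages that look close to the anchor
    in embedding space but are not the true intertextual match.
    """
    scored = [
        (cosine(anchor, cand), i)
        for i, cand in enumerate(candidates)
        if i != positive_idx
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE-style loss: negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy stand-ins for sentence embeddings (assumption: real ones come from a model).
anchor = [1.0, 0.2, 0.0]       # e.g. a patristic sentence
candidates = [
    [0.9, 0.3, 0.1],           # 0: the true biblical source (positive)
    [0.8, 0.0, 0.6],           # 1: similar but unrelated verse
    [0.0, 1.0, 0.0],           # 2: weakly related verse
    [-1.0, 0.0, 0.2],          # 3: clearly unrelated verse
]

hard_idx = mine_hard_negatives(anchor, candidates, positive_idx=0, k=2)

# Compare the loss against one hard negative and one easy negative.
loss_hard = info_nce(anchor, candidates[0], [candidates[hard_idx[0]]])
loss_easy = info_nce(anchor, candidates[0], [candidates[3]])
```

In this toy setting the hard negative yields a larger loss than the easy one, which is the usual motivation for mining hard negatives: they give the model a stronger training signal than negatives it already separates well.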