Exploring the uBIQUity of Biblical Texts: Tradition and Innovation in the Ancient and Digital Worlds

AUTHORS: Anna Mambelli, Marcello Costa

WORK PACKAGE: WP8

URL: https://iris.unimore.it/handle/11380/1389232?mode=full

Keywords: Biblical Studies; Computer Science; uBIQUity; ITSERR; Bible; Quran; Ancient Christian and Islamic Exegetical Works; Digital Humanities; Intertextuality; Ancient Christian Literature; Vector-semantic Embedding Models; Variant Readings; Node-based User Interface

Abstract

The uBIQUity project operates at the intersection of Biblical studies and computer science to investigate the presence and reception of sacred texts (the Bible[s] and the Qurʾān) in ancient Christian and Islamic exegetical works. Reversing the usual perspective, the project shows how the philological rigor of Biblical studies offers a distinctive contribution to the field of digital humanities. The main goal is to develop an innovative semantic search engine capable of identifying intertextual references with a high degree of accuracy. Methodologically, uBIQUity combines token-based and language-aware techniques with state-of-the-art vector-semantic embedding models. A core innovation is the rigorous classification system designed to quantify and qualify textual similarity, moving beyond the dichotomy between “quotation” and “allusion”. This system incorporates textual granularity by meticulously analyzing variant readings found in the critical apparatus of the modern editions of ancient biblical versions. The project employs user experience design principles to develop a flexible, node-based user interface (node-based UI) that aligns with the non-linear cognitive styles of humanistic scholars. uBIQUity is an example of transdisciplinary research that generates scientific innovation rooted in tradition.
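The combination of token-based signals with embedding similarity, and the graded classification beyond a quotation/allusion dichotomy, can be illustrated with a minimal sketch. The 50/50 weighting, the thresholds, and the three toy labels are assumptions for illustration only, not the project’s actual classification system:

```python
import math

def token_overlap(a, b):
    """Jaccard overlap of lowercased token sets: a crude token-based signal."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify_link(tok_sim, emb_sim, hi=0.8, lo=0.4):
    """Toy graded labels instead of a binary quotation/allusion split."""
    score = 0.5 * tok_sim + 0.5 * emb_sim
    if score >= hi:
        return "quotation-like"
    if score >= lo:
        return "allusion-like"
    return "echo-or-none"
```

In such a scheme, a candidate pair with near-total token overlap and high embedding similarity would fall in the “quotation-like” band, while semantically close paraphrases with little lexical overlap would land in the intermediate band.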




Sanctuaria-Gaze: A Multimodal Egocentric Dataset for Human Attention Analysis in Religious Sites

AUTHORS: Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Marco Papasidero, Federico Ruozzi, Rita Cucchiara 

WORK PACKAGE: WP8





The Biblical Heritage in Ancient Latin Christian Literature: Advancing Intertextual Mapping Through Sentence Embeddings

AUTHORS: Anna Mambelli, Laura Bigoni, Davide Dainese, Fabio Tutrone, Davide Caffagni, Federico Cocchi, Marco Zanella, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP8

URL: https://umanisticadigitale.unibo.it/article/view/22160

Keywords: Latin Bibles, Latin Patristics, Intertextuality, BERT-based Sentence Embeddings, IRCDL2025

Abstract

This study presents an interdisciplinary methodology for detecting intertextual references in Latin patristic literature through a novel combination of philological rigor and Natural Language Processing (NLP) techniques. Focusing on Augustine of Hippo’s De Genesi ad litteram and its relationship to Latin biblical texts (specifically Jerome’s Vulgate and pre-Vulgate versions), this research introduces a token-based classification system enriched with semantic annotations, supported by the INCEpTION platform. The classification system accounts for exact matches, lemmatized forms, synonyms, and structural parallels, capturing a wide spectrum of textual similarity. To enhance automatic retrieval of these intertextual links, we fine-tune BERT-based language models for Latin, incorporating contrastive learning and hard negative mining. Experimental results demonstrate that fine-tuned models significantly outperform baselines across varying levels of textual similarity. This work highlights the utility of computational models in bridging explicit citations and implicit allusions, offering a scalable approach for the study of biblical intertextuality in ancient texts.
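The contrastive learning objective with hard negative mining mentioned above can be sketched as an InfoNCE-style loss over one anchor sentence, its matching biblical verse (positive), and non-matching but lexically similar verses (hard negatives). This is a generic illustration of the technique, not the paper’s exact training setup; the temperature value and mining-by-similarity heuristic are assumptions:

```python
import math

def info_nce(sim_pos, sims_neg, temperature=0.05):
    """InfoNCE-style contrastive loss for one anchor: pull the positive
    (the matching verse) close, push negatives away.
    sim_pos: cosine similarity anchor<->positive
    sims_neg: cosine similarities anchor<->negatives"""
    logits = [sim_pos / temperature] + [s / temperature for s in sims_neg]
    m = max(logits)  # max-shift for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    # negative log-softmax of the positive logit
    return -((sim_pos / temperature) - m - math.log(denom))

def mine_hard_negatives(sims_all_neg, k=3):
    """Hard negatives: the k most similar non-matching candidates,
    which give the strongest training signal."""
    return sorted(sims_all_neg, reverse=True)[:k]
```

Raising the positive similarity (or lowering the negatives’) lowers the loss, which is what drives the embedding space to separate genuine intertextual links from merely similar-looking verses.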




Non-Linear Thinking Processes in Digital Humanities

AUTHORS: Marcello Costa

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/686443?mode=full

Keywords: Data Visualization, Digital Humanities, Interaction Design, Node-Based UI, Artificial Intelligence

Abstract

This paper investigates the benefits of node-based user interfaces in Digital Humanities (DH). It highlights the importance of a qualitative, rather than purely quantitative, approach to text comparisons generated by Artificial Intelligence (AI). Because the thinking processes behind text analysis are lost in the generative outputs of current AI text-based User Interfaces (UI), keeping track of them enables an interpretive approach. Within the fields of User Experience (UX) and Data Visualization Design, we are developing uBIQUity, a platform that identifies intertextual references in Christian and Islamic sacred texts. The platform uses a node-based UI to manage and display data and to compare references. It emphasizes collaborative knowledge-building and supports different cognitive learning styles. By combining philology, computer science, computational linguistics, and design, it overcomes the limitations of tools based solely on quantitative analysis, offering flexible interactions with digitized texts.




Ubiquity Platform. Data Visualization for Digital Humanities

AUTHORS: Marcello Costa

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/686443?mode=full

Keywords: LLM, Data Visualization, UX/UI, Digital Humanities

Abstract

The purpose of this paper is to demonstrate how a node-based UI can overcome the limitations of AI text-based UIs in terms of user experience and cognitive overload. It focuses on the development of Ubiquity, a digital platform that can enhance automatic text comparison in the field of religious studies. The platform’s objective is to produce new interpretations of the Scriptures and reconstruct the collective memory of religious communities over time. The study highlights the transdisciplinary team’s pivotal role in balancing the analytical approach of computer science with the interpretive approach of the humanities. Ubiquity will assist researchers in formulating new research questions and keeping track of their reasoning. Rather than replacing humans, Ubiquity will use AI as a powerful aid.




Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval

AUTHORS: Davide Caffagni, Federico Cocchi, Anna Mambelli, Fabio Tutrone, Marco Zanella, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP8

URL: https://link.springer.com/chapter/10.1007/978-3-032-05409-8_4

Keywords: Retrieval, Language Models, Synthetic Data

Abstract

Sentence similarity search is a fundamental task in information retrieval, enabling applications such as search engines, question answering, and textual analysis. However, retrieval systems often struggle when training data are scarce, as is the case for low-resource languages or specialized domains such as ancient texts. To address this challenge, we propose a novel paradigm for domain-specific sentence similarity search, where the embedding space is shaped by a combination of limited real data and a large amount of synthetic data generated by Large Language Models (LLMs). Specifically, we employ LLMs to generate domain-specific sentence pairs and fine-tune a sentence embedding model, effectively distilling knowledge from the LLM to the retrieval model. We validate our method through a case study on biblical intertextuality in Latin, demonstrating that synthetic data augmentation significantly improves retrieval effectiveness in a domain with scarce annotated resources. More broadly, our approach offers a scalable and adaptable framework for enhancing retrieval in domain-specific contexts. Source code and trained models are available at https://github.com/aimagelab/biblical-retrieval-synthesis.
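The retrieval step at the core of sentence similarity search can be sketched as cosine-similarity ranking over sentence embeddings. This is a generic illustration, assuming the vectors come from the fine-tuned sentence encoder; the LLM-based synthetic-pair generation and the fine-tuning itself are out of scope here (see the linked repository for the actual implementation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, corpus_vecs, top_k=3):
    """Rank corpus sentences by cosine similarity to the query embedding
    and return the indices of the top_k best matches."""
    scored = sorted(range(len(corpus_vecs)),
                    key=lambda i: -cosine(query_vec, corpus_vecs[i]))
    return scored[:top_k]
```

Synthetic data augmentation does not change this retrieval step; it shapes the embedding space the vectors live in, so that domain-specific near-matches (e.g., a patristic paraphrase of a Latin verse) rank higher.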




Transductive model selection under prior probability shift

AUTHORS: Lorenzo Volpi, Alejandro Moreo, and Fabrizio Sebastiani

WORK PACKAGE: WP8

URL: https://arxiv.org/abs/2512.04759

Keywords: Model selection, Hyperparameter optimisation, Classifier accuracy prediction, Dataset shift, Prior probability shift, Transductive learning

Abstract

Transductive learning is a supervised machine learning task in which, unlike in traditional inductive learning, the unlabelled data that require labelling are a finite set and are available at training time. As in inductive learning contexts, transductive learning contexts may be affected by dataset shift, i.e., the assumption that the training data and the unlabelled data are independently and identically distributed (IID) may not hold. We propose a method, tailored to transductive classification contexts, for performing model selection (i.e., hyperparameter optimisation) when the data exhibit prior probability shift, an important type of dataset shift typical of anti-causal learning problems. In our proposed method, the hyperparameters can be optimised directly on the unlabelled data to which the trained classifier must be applied; this is unlike traditional model selection methods, which are based on performing cross-validation on the labelled training data. By tailoring model selection to the actual test distribution, our approach contributes to the trustworthiness of AI systems, as it enables more reliable and robust classifier deployment under changed conditions. We provide experimental results that show the benefits brought about by our method.
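The core idea of selecting hyperparameters against the shifted test distribution can be illustrated with a toy sketch. It assumes the test-set class priors have already been estimated from the unlabelled data (e.g., by a quantification method) and that per-class validation accuracies are available for each candidate configuration; the paper’s actual procedure is more involved than this:

```python
def prior_weighted_accuracy(per_class_acc, test_priors):
    """Expected accuracy under the (estimated) test-set class priors."""
    return sum(p * a for p, a in zip(test_priors, per_class_acc))

def select_model(configs, test_priors):
    """configs: {name: per-class validation accuracies}.
    Pick the configuration whose prior-weighted accuracy is highest,
    i.e., tune against the actual (shifted) test distribution instead
    of the training-set class distribution."""
    return max(configs, key=lambda c: prior_weighted_accuracy(configs[c], test_priors))
```

Note how the winner flips with the priors: a configuration strong on the majority class under the training distribution may lose once the test priors shift toward the other class, which is precisely why cross-validation on the labelled training data alone can mislead.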