Language models for extracting and mapping the Symbolum in Arabic corpora

AUTHORS: Panzeca I., Puccetti G.

WORK PACKAGE: WP4 DamSym

URL: https://secure.onlinecongress.it

Keywords:

Abstract
The translations of the Nicene-Constantinopolitan Creed into Arabic are preserved in manuscripts spread throughout the world and remain almost entirely unpublished. Modern tools for semantic search in large-scale corpora allow the retrieval and discovery of these texts in document collections that are too large to be explored manually. Since there are currently no online corpora of digitized texts belonging to the Arab-Christian tradition, we chose Kitab (https://kitab-project.org/), the Arabic subcorpus of the Open Islamicate Texts Initiative (OpenITI), as the reference dataset; it contains more than 10,200 text files plus their corresponding metadata. To access and explore it proficiently, we developed a search engine that performs semantic similarity searches across the corpus, retrieving passages according to the semantic representation computed by a language model. This tool allows for a fast search, through the entire OpenITI corpus, of text passages that are semantically similar to the Symbolum, and allows domain experts to extract relevant references and map them to the documents where they belong.
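The retrieval pipeline described above (embed every passage, embed the query, rank by semantic similarity) can be sketched as follows. The bag-of-words embedding below is only a toy stand-in for the language model's semantic representations, and the function names are illustrative, not the tool's actual API:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a neural sentence encoder: a bag-of-words
    # count vector. The real tool computes embeddings with a
    # language model; this only illustrates the pipeline shape.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, corpus, top_k=3):
    # Rank all corpus passages by similarity to the query embedding.
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in corpus]
    return [p for s, p in sorted(scored, reverse=True)[:top_k]]
```

In the actual system the corpus embeddings would be precomputed once and indexed, so that only the query needs encoding at search time.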




Il reverendo William Hodge Mill: l’evangelizzazione al servizio del colonialismo britannico attraverso traduzioni e componimenti poetici in sanscrito

AUTHORS: Igor Spanò

WORK PACKAGE: WP4 DamSym

URL: https://iris.unipa.it/handle/10447/699129

Keywords: William Hodge Mill; Hindu–Christian Relations; Sanskrit Studies; Colonial Studies; History of Religions

Abstract
This article examines the figure of the Reverend William Hodge Mill (1792–1853) within the context of British colonialism in India, highlighting the role of Anglican evangelization as an instrument of cultural and political domination. Through an analysis of Anglican institutions in Calcutta—particularly Bishop’s College—and the missionary strategies adopted following the Charter Act of 1813, the study demonstrates how Orientalist knowledge of Sanskrit was mobilized to facilitate Christian penetration among Hindu elites. Central to the discussion is an examination of Mill’s Sanskrit translations of Christian texts and his poetic compositions, culminating in the Śrīkhr̥ṣṭasaṅgītā, a Sanskrit verse “Christiad” that reworks Indian epic models to present the life of Jesus. These works are situated within the broader phenomenon of the so-called Indian Christiad, a body of Christian literature in Sanskrit aimed at rendering Christianity “accommodable” within the conceptual framework of dharma. While displaying considerable philological and poetic sophistication, this cultural project was deeply marked by an ethnocentric ideology, in which processes of Christianization and colonial control mutually reinforced one another.




Automatic Lemmatization of Old Church Slavonic Language Using A Novel Dictionary-Based Approach

AUTHORS: Usman Nawaz, Liliana Lo Presti, Marianna Napolitano, Marco La Cascia

WORK PACKAGE: WP4 DamSym

URL: https://link.springer.com/chapter/10.1007/978-3-031-70442-0_25

Keywords: Old Church Slavonic; Lemmatization; Ancient Language; Natural Language Processing

Abstract
Old Church Slavonic (OCS) is an ancient language that poses unique challenges for natural language processing. Currently, there is a lack of Python libraries devised for the analysis of OCS texts. This research not only fills a crucial gap in the computational treatment of OCS but also produces valuable resources for scholars in historical linguistics, cultural studies, and the humanities, supporting further research in the field of ancient language processing. The main contribution of this work is an algorithm for the lemmatization of OCS texts based on a learned dictionary; the approach can deal with ancient languages without requiring prior linguistic knowledge. Using a dataset of more than 330K OCS words and their corresponding lemmas, the approach integrates the algorithm and the dictionary efficiently to achieve accurate lemmatization on test data.
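A rough sketch of how a dictionary learned from word-lemma pairs can drive lemmatization; the paper's actual algorithm and its out-of-vocabulary handling are more elaborate, and these function names are hypothetical:

```python
def build_lemma_dict(pairs):
    # "Learn" the dictionary: map each training word form to its lemma.
    return dict(pairs)

def lemmatize(tokens, lemma_dict):
    # Look each token up in the learned dictionary; unseen tokens are
    # returned unchanged (one simple out-of-vocabulary policy).
    return [lemma_dict.get(token, token) for token in tokens]
```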




GNORM: Challenges and Potential of a 3D Visualisation of the Babylonian Talmud

AUTHORS: Andrea Ravasco, Arianna Maria Pavone

WORK PACKAGE: WP 3

URL: https://iris.unipa.it/handle/10447/695204

Keywords: Babylonian Talmud, Artificial Intelligence, Algorithm, Digital Humanities

Abstract
This contribution details the methodology and technological framework employed in the GNORM tool within the ITSERR project. It deals with the Babylonian Talmud: its structure, language, and sources, providing context for the tagging process.




GNORM: Challenges and Potential of a 3D Visualisation of the Corpus Iuris Canonici

AUTHORS: Vincenzo Roberto Imperia; Arianna Maria Pavone

WORK PACKAGE: WP 3

URL: https://arxiv.org/abs/2410.23409

Keywords: Corpus Iuris Canonici; Canon Law; Glosses; Digital Humanities; Intertextuality Recognition; Text Processing Algorithms

Abstract
The Corpus Iuris Canonici is one of the most influential legal compilations in Western history, yet its stratified nature and extensive system of glosses pose significant challenges for modern scholars. Traditional approaches to its study have relied on printed editions and static digital reproductions, which limit interactivity and analytical capabilities. This paper presents GNORM, a novel web-based tool designed to enhance the exploration of the Corpus Iuris Canonici through 3D visualization and advanced text-processing algorithms. Unlike conventional database-driven solutions, GNORM employs a structured text-based data architecture, preserving the integrity of historical sources while enabling dynamic navigation of legal texts and glosses. By integrating automated recognition of intertextual references, full-text search functionalities, and thematic filtering based on typological, chronological, and geographical criteria, GNORM is an innovative digital tool in this field of research. This paper details the theoretical and technological foundations of GNORM, demonstrating its potential to transform the digital study of canon law. The results highlight the advantages of interactive legal visualization and hypertextual navigation, offering a scalable and adaptable framework for future research. By bridging traditional legal scholarship with computational methodologies, GNORM represents a significant advancement in digital legal history, providing scholars with an unprecedented means of engaging with one of the most complex and historically significant legal texts of the Western tradition.




TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes

AUTHORS: Alessandro D’Amelio, Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Rita Cucchiara, Giuseppe Boccignone

WORK PACKAGE: WP 6 – YASMINE

URL: https://arxiv.org/abs/2410.23409

Keywords: 

Abstract
Attention guides our gaze to fixate the proper location of the scene and holds it there for the amount of time warranted by current processing demands, before shifting to the next location. As such, gaze deployment is crucially a temporal process. Existing computational models have made significant strides in predicting the spatial aspects of observers’ visual scanpaths (where to look), while often relegating to the background the temporal facet of attention dynamics (when). In this paper we present TPP-Gaze, a novel and principled approach to modelling scanpath dynamics based on Neural Temporal Point Processes (TPP), which jointly learns the temporal dynamics of fixation positions and durations, integrating deep learning methodologies with point process theory. We conduct extensive experiments across five publicly available datasets. Our results show the overall superior performance of the proposed model compared to state-of-the-art approaches. Source code and trained models are publicly available.




Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

AUTHORS: Giuseppe Cartella, Vittorio Cuculo, Alessandro D’Amelio, Marcella Cornia, Giuseppe Boccignone, Rita Cucchiara

WORK PACKAGE: WP 6 – YASMINE

URL: https://ieeexplore.ieee.org/document/10465604

Keywords: 

Abstract
Predicting human gaze scanpaths is crucial for understanding visual attention, with applications in human-computer interaction, autonomous systems, and cognitive robotics. While deep learning models have advanced scanpath prediction, most existing approaches generate averaged behaviors, failing to capture the variability of human visual exploration. In this work, we present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Additionally, we introduce textual conditioning to enable task-driven scanpath generation, allowing the model to adapt to different visual search objectives. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios, producing more diverse and accurate scanpaths. These results highlight its ability to better capture the complexity of human visual behavior, pushing forward gaze prediction research. Source code and models are publicly available.




Linked Geospatial Data for Rural Territorial Sustainability: A Knowledge Graph of European Mountain Value Chains

AUTHORS: Nicolò Pratelli, Emanuele Lenzi, Valentina Bartalesi

WORK PACKAGE: WP4

URL: https://ieeexplore.ieee.org/document/11279601

Keywords: Triples (Data structure), Knowledge based systems, Urban areas, Europe, Knowledge graphs, Geospatial analysis, Stakeholders, Sustainable development, Socioeconomics, Resilience

Abstract
As global urbanisation trends continue, with projections indicating that 70% of the world population will reside in cities by 2050, there is growing concern over the sustainability of rural and mountain regions. These areas face increasing depopulation, threatening their socio-economic value chains (VCs). Addressing these challenges requires access to reliable and interoperable geospatial data. Knowing the exact location of mountain VCs can provide important insights to support territorial resilience. This paper investigates how Knowledge Representation and Semantic Web technologies can enhance the analysis of geographic data to support sustainable development in rural territories. As a case study, we use data from the H2020 MOVING project, encompassing 454 VCs across 16 European mountain regions. Our findings show that semantic technologies offer a valuable framework for integrating heterogeneous datasets, thereby improving decision-making and fostering resilience in rural areas.
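At its core, representing value chains as Linked Data means encoding statements as subject-predicate-object triples. A minimal in-memory sketch follows; the URIs and predicates are invented for illustration (the project presumably relies on standard vocabularies such as RDF and GeoSPARQL):

```python
# A tiny in-memory triple store: each fact is a (subject, predicate, object) tuple.
triples = set()

def add(subject, predicate, obj):
    triples.add((subject, predicate, obj))

def objects_of(subject, predicate):
    # Query: all objects linked to `subject` via `predicate`.
    return {o for s, p, o in triples if s == subject and p == predicate}

# Illustrative (hypothetical) facts about a mountain value chain.
add("vc:CheeseVC", "rdf:type", "moving:ValueChain")
add("vc:CheeseVC", "moving:locatedIn", "region:Alps")
add("region:Alps", "geo:asWKT", "POINT(7.65 45.07)")
```

In a real knowledge graph these triples would be stored in an RDF triplestore and queried with SPARQL rather than Python set comprehensions.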




ReCoptic: Computer Vision for the Reconstruction of Dismembered Coptic Codices

AUTHORS: Lorenzo Bianchi, Fabrizio Falchi, Alejandro Moreo, Fabrizio Sebastiani, Costanza Bianchi

WORK PACKAGE: WP4

URL: https://iris.unipa.it/handle/10447/696346?mode=simple

Keywords: Hands, Computer vision, Codes, Accuracy, Planets, Posterior probability, Probabilistic logic, History, Image reconstruction, Image classification

Abstract
In the course of history, many ancient codices (i.e., bound volumes of manuscripts) written in the Coptic language have been dismembered into individual sheets, often at the hands of antiques dealers, and these sheets have ended up scattered across the planet. Reconstructing these codices in their original form would be extremely important for a better understanding of the culture of Coptic-speaking communities, and is a long-standing goal of paleographers and Egyptologists alike. In this paper we present ReCoptic, a probabilistic, “contrastive” image classification system based on computer vision techniques, whose goal is to aid scholars in reconstructing dismembered ancient Coptic codices. Given a collection of scans of individual pages of ancient Coptic manuscripts, the system evaluates, for each pair of such scans, the (“posterior”) probability that the two pages originate from the same codex, and ranks all such pairs in descending order of their associated posterior probability. The scholar can thus discover yet unknown pairs of pages originating from the same codex by examining, starting from the top of the list, the pairs proposed by ReCoptic. In experiments that we have run on a collection of 6,000+ pages of Coptic manuscripts, ReCoptic displays extremely high accuracy. The code for reproducing these experiments is available at https://github.com/lorebianchi98/ReCoptic
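The ranking step described in the abstract (score every pair of page scans with a same-codex posterior, then sort descending) reduces to a few lines. Here `same_codex_prob` is a stand-in for ReCoptic's trained vision model, and the function names are assumptions for illustration:

```python
from itertools import combinations

def rank_pairs(pages, same_codex_prob):
    # Score every unordered pair of page scans with the posterior
    # probability that both pages come from the same codex, and
    # rank the pairs in descending order of that probability.
    scored = [((a, b), same_codex_prob(a, b)) for a, b in combinations(pages, 2)]
    return sorted(scored, key=lambda pair_score: pair_score[1], reverse=True)
```

A scholar would then inspect the pairs from the top of this list downwards, as the abstract describes.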




Dal pattern alla struttura. La visualizzazione interpretativa dei dati nelle Digital Humanities

AUTHORS: Marcello Costa, Chiara Palillo, Cinzia Ferrara

WORK PACKAGE: WP8 uBIQUity

URL: https://iris.unipa.it/handle/10447/696346?mode=simple

Keywords: data visualization, design for community, digital humanities, religious studies

Abstract
The paper explores the intersection between Digital Humanities (DH) and Plural Design in the context of interpretative data visualization for the comparative analysis of religious texts. The aim is to balance quantitative approaches for identifying patterns with qualitative ones that enrich the humanistic interpretation of structures within the field of religious studies. In this perspective, the uBIQUity project is introduced as a platform designed to detect intertextual references within sacred texts. The platform seeks to develop tools that enhance human observational capabilities and foster a richer, more participatory engagement with religious texts, in line with an inclusive and transdisciplinary approach. The expected benefits for the DH community will extend to a broader audience, contributing to countering misinformation, fundamentalism, and all forms of discrimination arising from the manipulation and distortion of knowledge.




Dalla terra al cielo? La civitas Dei di Agostino tra eredità classica e visione escatologica

AUTHORS: Fabio Tutrone

WORK PACKAGE: WP8 – uBIQUity

URL: https://www.academia.edu

Keywords: Augustine, De civitate Dei, utopias, late antique Christianity, ancient and modern political thought, eschatology, biblical intertextuality, classical receptions, Cicero, Platonism, Stoicism

Abstract
This article approaches Augustine’s treatise On the City of God (De civitate Dei) as a pivotal work situated between classical political thought and Christian eschatology, arguing that its enduring influence on Western utopian paradigms cannot be reduced to later secular reinterpretations. Rather than proposing an ascetic withdrawal from civic life, Augustine articulates a dialectical vision in which the earthly and the heavenly cities coexist in the temporal realm, compelling Christians to engage actively in the sociopolitical order while orienting their interior life toward an eschatological fulfilment. Through a close reading of Books 10, 19, and 22, the present study highlights Augustine’s complex reception of Greco-Roman traditions—particularly Platonism, Stoicism, and Cicero’s philosophica—and the ways in which biblical intertextuality reshapes classical models of community, justice, and moral progress. Augustine’s civitas Dei emerges as a transitional utopia: simultaneously present and incomplete, grounded in human experience yet projected toward the final consummation of history—as a stratified intertextual (and intercultural) ensemble, which aims to transform ancient theories of the ideal community into a dynamic vision of ethical relationality and political responsibility, laying the foundations for medieval and early modern conceptions of the perfect society.




From Data to Interpretation: Designing Modular Visualizations for Capta in Digital Humanities

AUTHORS: Marcello Costa, Cinzia Ferrara, Daniele Savasta, Chiara Palillo

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/696347

Keywords: Text Data Visualization, Digital Humanities, Religious Studies, VARK, uBIQUity

Abstract
Digital Humanities represent a continuously evolving research field where the integration of computational methods and the humanities is redefining the interpretative paradigms of knowledge. This paper analyzes the role of Data Visualization in humanities research, with particular focus on the field of Religious Studies. Through the uBIQUity project, developed within ITSERR, innovative strategies for visualizing intertextual references in sacred texts are explored, offering a space for reflection on challenges, failures, and new design opportunities. In this context, the transdisciplinary approach, combining both quantitative and qualitative analysis, emphasizes the need to transcend quantitative visual conventions in favor of more flexible and inclusive interpretative models. The research proposes a methodology based on the design of modular and dynamic visualizations capable of representing the complexity of humanistic phenomena and enhancing the accessibility of religious textual heritage.




Information Design and Data Visualization in the ITSERR project

AUTHORS: Marcello Costa, Chiara Palillo

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/686393

Keywords: RELIGIOUS STUDIES, DIGITAL HUMANITIES, DATA VISUALIZATION, DIGITAL ARCHIVES, UX RESEARCH

Abstract
Resilience, the first European research infrastructure for Religious Sciences, establishes its foundational framework through the collaborative efforts of the research team of ITSERR (Italian Strengthening of the ESFRI RI Resilience). The objective of this project is to develop new visualization tools and systems aimed at providing support to the scientific community engaged in Digital Humanities for Religious Studies. Employing an experimental and interdisciplinary design approach within the ITSERR ecosystem, the research explores visual strategies in Information Design and Data Visualization. The aim is to preserve both the tangible and intangible heritage of religious corpora. To analyze visualization systems created by Christian and Islamic scholars in theological, historiographical, scientific, and graphic-visual contexts, it is necessary to study several infographic representations and their evolution over time. From the academic perspective, St. Augustine’s De Doctrina Christiana emphasizes the necessity of combining exegetical examination of sacred texts with historiographical and scientific pursuits (Brotton, 2017). An essential tool for providing support and guidance through the phases of research and graphic design is a multidisciplinary glossary, established as a collaborative space accommodating various disciplines. Within the multidisciplinary milieu of ITSERR, the integration of qualitative techniques into the design process is deemed essential for an exhaustive comprehension of individuals’ activities and of the influence of environmental factors, as articulated by Norman (2019). Methodologies such as structured interviews, focus groups, and VARK questionnaires serve as guiding tools, steering designers towards designs that facilitate a more congenial learning experience for users through dynamic and multichannel audio-visual modalities.
As a result, the ITSERR project’s amalgamation of concepts from history, theology, and science creates a complex network of data and information. Translating this intricate web requires strategic communication design methodologies.




Strumenti di ricerca per le Digital Humanities. Riconfigurare lo spazio dell’informazione Ubiquity. Il design della comunicazione nel progetto ITSERR

AUTHORS: Marcello Costa, Cinzia Ferrara, Chiara Palillo

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/693263

Keywords: data visualization, interaction design, digital humanities, AI, LLM

Abstract
This contribution explores the transdisciplinary intersections between the Digital Humanities (DH) and data visualization, focusing on automated text-analysis processes supported by artificial intelligence (AI) algorithms. Its aim is not only to reflect on ways of representing complex data and cultural phenomena through interactive visualization systems, but also to underline the importance of human intervention within such automated processes, in order to promote and share an interpretative, subjective approach to knowledge. One of the main goals of data visualization is to represent a dataset clearly and effectively, transforming numerical data into visual, interactive elements in order to communicate and explore the complexity of a phenomenon. In this process of translation, however, there is sometimes a risk of losing sight of the interpretative logics and data-extraction procedures underlying the observed phenomenon, erroneously attributing an objective character to the resulting graphical representation.




uBIQUity and Resilient Septuagint: Transdisciplinary and Interrelated Research Projects on Sacred Texts and their Receptions

AUTHORS: Laura Bigoni, Davide Dainese, Anna Mambelli

WORK PACKAGE: WP8

URL: https://iris.unimore.it/handle/11380/1389090

Keywords: Bible; Septuagint; Ancient Christian Literature; Intertextuality; Digital Humanities; Artificial Intelligence; Large Language Models

Abstract

uBIQUity and Resilient Septuagint are transdisciplinary research projects linked by common aims, which originate from the methodological framework of the Historical and Theological Lexicon of the Septuagint (ed. by Eberhard Bons and Daniela Scialabba, in collaboration with Anna Mambelli; 4 vols., Tübingen: Mohr Siebeck, 2020–). Resilient Septuagint, an Italian Research Project of Relevant National Interest (PRIN 2022), focuses on the semantics of “killing” and “healing” in the Septuagint (Qo/Eccl 3:3) and its reception in Patristic and Late Antique sources (3rd cent. BCE – 5th cent. CE). uBIQUity, which incorporates the “BI” of the Bible(s) and the “QU” of the Qur’ān in its title, is part of the larger PNRR project ITSERR – Italian Strengthening of the ESFRI RI RESILIENCE funded by European Union-NextGenerationEU (NRRP, Mission 4, Component 2, Project Code: IR0000014, CUP: B53C22001770006), and aims at investigating the sacred texts of Christianity and Islam in different environments and historical periods through two huge corpora: Greek and Latin Christian commentaries on the Bible(s) written from the Patristic age until the Late Byzantine period, and classical commentaries on the Qur’ān written in Arabic (tafāsīr) from the rise of Islam until the 15th century. By interweaving Computer Science and the Humanities, uBIQUity aims to develop a new research tool that can identify with a high degree of accuracy quotations/allusions to the Bible(s) and the Qur’ān in Christian and Islamic commentaries. Intertextual references, conscious or unconscious, work as invisible “places of memory”, making the sacred texts “ubiquitous”.




Shooting Stars against the Ğinn: Use and Reuse of a Ḥadīṯ in the Exegesis of Q 46:29

AUTHORS: Sara Abram

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/695163

Keywords: ǧinn, ḥadīth, tafsīr, quran, al-Thaʿlabī, al-Ṭabarī, Muqātil, text reuse, exegesis.

Abstract

This paper investigates the use of a ḥadīṯ within a corpus of medieval Qurʾānic commentaries (tafāsīr) concerning the mysterious encounter between Muḥammad and a group of ǧinn (Q 46:29; 72:1). A comparative analysis shows that ʾAbū ʾIsḥāq al-Ṯaʿlabī (d. 1035) chooses a version that aligns more closely with earlier exegeses than with those found in canonical ḥadīṯ collections (e.g., the Ṣaḥīḥayn). His version closely resembles, in both content and form, the accounts found in the Tafsīr by al-Ṭabarī (d. 923) and, indirectly, a transmission line traceable at least as far back as the Tafsīr by Muqātil b. Sulaymān (d. 767). This study highlights the active role of the Qurʾānic commentators in selecting and reshaping inherited material for interpretive purposes, contributing to a broader reflection on the relationship between tafāsīr and ḥadīṯ literature.




Exploring the uBIQUity of Biblical Texts: Tradition and Innovation in the Ancient and Digital Worlds

AUTHORS: Anna Mambelli, Marcello Costa

WORK PACKAGE: WP8

URL: https://iris.unimore.it/handle/11380/1389232?mode=full

Keywords: Biblical Studies; Computer Science; uBIQUity; ITSERR; Bible; Quran; Ancient Christian and Islamic Exegetical Works; Digital Humanities; Intertextuality; Ancient Christian Literature; Vector-semantic Embedding Models; Variant Readings; Node-based User Interface

Abstract

The uBIQUity project operates at the intersection of Biblical studies and computer science to investigate the presence and reception of sacred texts (the Bible[s] and the Qurʾān) in ancient Christian and Islamic exegetical works. Reversing the usual perspective, the project shows how the philological rigor of Biblical studies offers a distinctive contribution to the field of digital humanities. The main goal is to develop an innovative semantic search engine capable of identifying intertextual references with a high degree of accuracy. Methodologically, uBIQUity combines token-based and language-aware techniques with state-of-the-art vector-semantic embedding models. A core innovation is the rigorous classification system designed to quantify and qualify textual similarity, moving beyond the dichotomy between “quotation” and “allusion”. This system incorporates textual granularity by meticulously analyzing variant readings found in the critical apparatus of the modern editions of ancient biblical versions. The project employs user experience design principles to develop a flexible, node-based user interface (node-based UI) that aligns with the non-linear cognitive styles of humanistic scholars. uBIQUity is an example of transdisciplinary research that generates scientific innovation rooted in tradition.




The Biblical Heritage in Ancient Latin Christian Literature: Advancing Intertextual Mapping Through Sentence Embeddings

AUTHORS: Anna Mambelli, Laura Bigoni, Davide Dainese, Fabio Tutrone, Davide Caffagni, Federico Cocchi, Marco Zanella, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP8

URL: https://umanisticadigitale.unibo.it/article/view/22160

Keywords: Latin Bibles, Latin Patristics, Intertextuality, BERT-based Sentence Embeddings, IRCDL2025

Abstract

This study presents an interdisciplinary methodology for detecting intertextual references in Latin patristic literature through a novel combination of philological rigor and Natural Language Processing (NLP) techniques. Focusing on Augustine of Hippo’s De Genesi ad litteram and its relationship to Latin biblical texts (specifically Jerome’s Vulgate and pre-Vulgate versions), this research introduces a token-based classification system enriched with semantic annotations, supported by the INCEpTION platform. The classification system accounts for exact matches, lemmatized forms, synonyms, and structural parallels, capturing a wide spectrum of textual similarity. To enhance automatic retrieval of these intertextual links, we fine-tune BERT-based language models for Latin, incorporating contrastive learning and hard negative mining. Experimental results demonstrate that fine-tuned models significantly outperform baselines across varying levels of textual similarity. This work highlights the utility of computational models in bridging explicit citations and implicit allusions, offering a scalable approach for the study of biblical intertextuality in ancient texts.
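The contrastive objective with hard negatives mentioned above can be illustrated with a margin-based formulation on plain vectors. This is only a sketch: the paper fine-tunes BERT-based sentence embedding models, and its exact loss formulation may differ.

```python
import math

def cosine(u, v):
    # Cosine similarity between two plain vectors (lists of floats).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Push the anchor's similarity to the true intertextual parallel
    # (positive) above its similarity to a hard negative by at least
    # `margin`; the loss is zero once that separation is achieved.
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))
```

During fine-tuning, the anchor would be a patristic passage, the positive its biblical source, and the hard negative a superficially similar but unrelated verse mined from the corpus.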




Non-Linear Thinking Processes in Digital Humanities

AUTHORS: Marcello Costa

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/686443?mode=full

Keywords: Data Visualization, Digital Humanities, Interaction Design, Node-Based UI, Artificial Intelligence

Abstract

This paper investigates the benefits of node-based user interfaces in Digital Humanities (DH). It highlights the importance of a qualitative versus a quantitative approach for text comparisons generated by Artificial Intelligence (AI). Since thinking processes in text analysis are lost in the generative outputs of current AI text-based User Interfaces (UI), keeping track of them allows for an interpretive approach. Within the fields of User Experience (UX) and Data Visualization Design we are developing uBIQUity, a platform that identifies intertextual references in Christian and Islamic sacred texts. The platform uses a node-based UI to manage and display data and to compare references. It emphasizes collaborative knowledge-building and supports different cognitive learning styles. By combining philology, computer science, computational linguistics, and design, it overcomes the limitations of tools based solely on quantitative analysis, offering flexible interactions with digitized texts.




Ubiquity Platform. Data Visualization for Digital Humanities

AUTHORS: Marcello Costa

WORK PACKAGE: WP8

URL: https://iris.unipa.it/handle/10447/686443?mode=full

Keywords: LLM, Data Visualization, UX/UI, Digital Humanities

Abstract

The purpose of this paper is to demonstrate how node-based UI can overcome the limitations of AI text-based UI in terms of user experience and cognitive overload. It focuses on the development of Ubiquity, a digital platform that can enhance automatic text comparison in the field of religious studies. The platform’s objective is to produce new interpretations of the Scriptures and reconstruct the collective memory of religious communities over time. The study highlights the transdisciplinary team’s pivotal role in balancing the analytical approach of computer science with the interpretive approach of humanities. Ubiquity will assist researchers in formulating new research questions and keeping track of their reasoning. Instead of substituting humans, Ubiquity will use AI as a powerful aide.




Generating Synthetic Data with Large Language Models for Low-Resource Sentence Retrieval

AUTHORS: Davide Caffagni, Federico Cocchi, Anna Mambelli, Fabio Tutrone, Marco Zanella, Marcella Cornia, Rita Cucchiara

WORK PACKAGE: WP8

URL: https://link.springer.com/chapter/10.1007/978-3-032-05409-8_4

Keywords: retrieval; language model; synthetic data

Abstract

Sentence similarity search is a fundamental task in information retrieval, enabling applications such as search engines, question answering, and textual analysis. However, retrieval systems often struggle when training data are scarce, as is the case for low-resource languages or specialized domains such as ancient texts. To address this challenge, we propose a novel paradigm for domain-specific sentence similarity search, where the embedding space is shaped by a combination of limited real data and a large amount of synthetic data generated by Large Language Models (LLMs). Specifically, we employ LLMs to generate domain-specific sentence pairs and fine-tune a sentence embedding model, effectively distilling knowledge from the LLM to the retrieval model. We validate our method through a case study on biblical intertextuality in Latin, demonstrating that synthetic data augmentation significantly improves retrieval effectiveness in a domain with scarce annotated resources. More broadly, our approach offers a scalable and adaptable framework for enhancing retrieval in domain-specific contexts. Source code and trained models are available at https://github.com/aimagelab/biblical-retrieval-synthesis.




Transductive model selection under prior probability shift

AUTHORS: Lorenzo Volpi, Alejandro Moreo, and Fabrizio Sebastiani

WORK PACKAGE: WP8

URL: https://arxiv.org/abs/2512.04759

Keywords: Model selection, Hyperparameter optimisation, Classifier accuracy prediction, Dataset shift, Prior probability shift, Transductive learning

Abstract

Transductive learning is a supervised machine learning task in which, unlike in traditional inductive learning, the unlabelled data that require labelling are a finite set and are available at training time. Similarly to inductive learning contexts, transductive learning contexts may be affected by dataset shift, i.e., may be such that the assumption according to which the training data and the unlabelled data are independently and identically distributed (IID) does not hold. We here propose a method, tailored to transductive classification contexts, for performing model selection (i.e., hyperparameter optimisation) when the data exhibit prior probability shift, an important type of dataset shift typical of anti-causal learning problems. In our proposed method the hyperparameters can be optimised directly on the unlabelled data to which the trained classifier must be applied; this is unlike traditional model selection methods, which are based on performing cross-validation on the labelled training data. By tailoring model selection to the actual test distribution, our approach contributes to the trustworthiness of AI systems, as it enables more reliable and robust classifier deployment under changed conditions. We provide experimental results that show the benefits brought about by our method.
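One schematic reading of the idea (not the paper's exact protocol; the names below are illustrative) is to re-weight each hyperparameter configuration's per-class validation accuracy by the class priors estimated on the unlabelled test set, for instance via a quantification method, and pick the configuration with the best prior-weighted score:

```python
def select_model(configs, class_accuracy, estimated_test_priors):
    # class_accuracy(cfg) returns {class: validation accuracy} for a
    # classifier trained under hyperparameter configuration `cfg`.
    # estimated_test_priors holds class prevalences estimated on the
    # unlabelled test set, where prior probability shift may have
    # changed them relative to the training distribution.
    def prior_weighted_score(cfg):
        accs = class_accuracy(cfg)
        return sum(estimated_test_priors[c] * acc for c, acc in accs.items())
    return max(configs, key=prior_weighted_score)
```

Under a strong shift towards one class, this criterion favours a configuration that performs well on that class, even if another configuration looked better under the training-set priors.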