WP4 – DaMSym
ABSTRACT
WP4 developed DaMSym, a semantic search tool for historical languages, and applied it in a case study on translations of the Nicene-Constantinopolitan Creed. The tool investigates the semantic depth of key theological formulations and religious concepts across the Greek, Latin, Arabic, Sanskrit, and Old Church Slavonic traditions. It retrieves passages beyond verbatim matches. It is also designed to work with texts of any genre, so it can be used across many research contexts.
RESULTS AND TOOLS
DaMSym is a semantic search system that goes beyond exact word matching. It identifies conceptually related passages even when they are expressed through different terms and linguistic structures. The system relies on transformer-based sentence embeddings, which domain experts selected and tested for each language. These models turn sentences into numerical representations of meaning (vectors) in a shared semantic space. Proximity in this space serves as a measure of conceptual similarity, so users can compare texts and retrieve semantically related passages across corpora. The platform currently covers five linguistic traditions (Greek, Latin, Arabic, Sanskrit, Old Church Slavonic). For Greek and Latin it also supports cross-lingual search, so a query in one language retrieves results in both.
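The retrieval step can be illustrated with a minimal sketch. It assumes that a multilingual encoder has already produced the sentence embeddings. The three-dimensional vectors below are made-up placeholders rather than real model output, since actual sentence embeddings have hundreds of dimensions. The function names are hypothetical and are not DaMSym's API. Cosine similarity is used here as the proximity measure. It is a common choice for sentence embeddings, but it is an assumption about DaMSym's configuration.

```python
import math

# Toy corpus: (passage text, language code, embedding).
# The vectors are invented for illustration; a real system would obtain them
# from a transformer sentence encoder that maps Greek and Latin into one space.
CORPUS = [
    ("Πιστεύομεν εἰς ἕνα Θεόν, Πατέρα παντοκράτορα", "grc", [0.9, 0.1, 0.0]),
    ("Credo in unum Deum, Patrem omnipotentem", "lat", [0.8, 0.2, 0.1]),
    ("Gallia est omnis divisa in partes tres", "lat", [0.0, 0.1, 0.9]),
]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as unrelated
    return dot / (norm_u * norm_v)


def semantic_search(query_vec, corpus, top_k=2):
    """Return the top_k (score, text, language) tuples, highest score first.

    Because all embeddings are assumed to share one space, passages in
    different languages are ranked together against a single query vector.
    """
    scored = [(cosine(query_vec, vec), text, lang) for text, lang, vec in corpus]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical embedding of a query such as "belief in one God".
    query = [1.0, 0.0, 0.0]
    for score, text, lang in semantic_search(query, CORPUS):
        print(f"{score:.3f}  [{lang}]  {text}")
```

With these toy vectors, the Greek and Latin openings of the Creed rank above the unrelated Latin sentence, even though the two share no surface vocabulary. Ranking depends only on vector proximity, which is what allows a single query to return results in both languages.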
In addition to the core system, WP4 produced several supporting tools and resources:
- the DIACU dataset for Old Church Slavonic, together with tools for character standardization and lemmatization;
- Sanskrit processing tools for compound segmentation and lexical search;
- Gretino, the first benchmark dataset designed to evaluate semantic retrieval in Ancient Greek and Latin, which includes synthetic data and expert annotations.

Together, these components form an interoperable and reusable infrastructure for experimenting with semantic technologies in ancient and historical languages. DaMSym thus marks a shift from keyword-based search to full semantic retrieval for low-resource ancient languages. It turns static corpora into conceptually searchable networks of texts.
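A benchmark such as Gretino is used to measure retrieval quality. As a hedged illustration, the sketch below computes two standard information-retrieval metrics, Recall@k and mean reciprocal rank (MRR). It assumes that each benchmark query comes with a ranked list of retrieved passage IDs and a set of passages annotated as relevant. The data format, the choice of metrics, and the IDs are assumptions made for illustration and do not reflect Gretino's documented schema.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Share of the relevant passages found among the top-k retrieved IDs."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none was retrieved."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 2) -> tuple[float, float]:
    """Average Recall@k and MRR over (ranked_ids, relevant_ids) pairs, one per query."""
    n = len(runs)
    mean_recall = sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n
    return mean_recall, mrr


if __name__ == "__main__":
    # Hypothetical results for two queries; the passage IDs are placeholders.
    runs = [
        (["p1", "p2", "p3"], {"p2"}),        # relevant passage ranked 2nd
        (["p4", "p5", "p6"], {"p6", "p7"}),  # one of two relevant passages ranked 3rd
    ]
    recall, mrr = evaluate(runs, k=2)
    print(f"Recall@2 = {recall:.3f}, MRR = {mrr:.3f}")
```

Recall@k rewards finding all annotated passages within the first k results. MRR instead reflects how early the first relevant passage appears. For the two toy queries above, the script prints `Recall@2 = 0.500, MRR = 0.417`.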
CASE STUDIES
DaMSym has been applied to the analysis of the Nicene-Constantinopolitan Creed across multiple linguistic traditions, enabling the comparison of religious concepts across languages and historical contexts. This case study shows how semantic retrieval can be used to trace shifts in meaning, emphasis, and doctrinal nuance in the translation and reception of a core Christian text.
TEAM
WP4 was coordinated by Fabrizio D’Avenia (University of Palermo). Costanza Bianchi (FSCIRE) and Marianna Napolitano (University of Modena and Reggio Emilia and FSCIRE) led development activities as Product Owners. The contributors were:
- Igor Spanò and Irfan Ali (University of Palermo), who worked on Sanskrit semantic retrieval;
- Ivana Panzeca (University of Palermo), who worked on Arabic semantic retrieval;
- Usman Nawaz (University of Palermo), whose research focused on character standardization and lemmatization for Old Church Slavonic;
- Federico Iezzi and Elia Scapini (PhD candidates in Religious Studies, DREST, University of Modena and Reggio Emilia and FSCIRE), who contributed to semantic retrieval for Greek and Latin texts and to training materials (video tutorials and training sessions).

The Centro Nazionale (ISTI-CNR, Pisa) also joined the team, represented by Giovanni Puccetti. He collaborated on and coordinated the NLP activities for Church Slavonic, Latin, Greek, and Arabic. The preparatory work was funded under WP6, until December 2024 for Greek and until June 2025 for Latin. Humanities research related to Church Slavonic is supported under WP3 (Criterion).
The team combined expertise in religious studies, computational linguistics, and classical and oriental philologies, ensuring that technological development remained closely aligned with scholarly use cases.
BEYOND ITSERR
The technologies developed in WP4 can be applied in any research field that works with historical texts in ancient languages. DaMSym identifies conceptual connections across texts from different periods and traditions, with applications in philology, history, philosophy, and legal studies. The supporting tools are interoperable and reusable resources for the international scholarly community. They cover Old Church Slavonic lemmatization and character standardization and Sanskrit segmentation, alongside the Gretino evaluation dataset. In this way, WP4 provides a foundation for future research infrastructures that need robust semantic access to multilingual historical corpora.