ABSTRACT
Digital Maktaba is the ITSERR tool for the semi-automatic cataloguing of bibliographic collections in non-Latin scripts, with particular attention to Arabic-script collections and Islamic studies. It supports librarians and researchers in managing multilingual and multi-script heritage, improving access to cultural resources and reducing exclusive reliance on transliteration when describing and searching collections.
RESULTS AND TOOLS
Digital Maktaba is an operational prototype for the semi-automatic cataloguing of Arabic-script texts, tested on specialised Islamic studies collections and validated through real use cases in library contexts. The system includes annotated datasets for improving OCR and document classification in non-Latin scripts, automated workflows for metadata extraction (title, author, publication data, thematic categories), and advanced search functions across multilingual collections. Together, these components form a reusable, extensible framework that integrates OCR, language models, and visual analysis in a single pipeline, and that can support both retrospective cataloguing and the management of newly acquired materials.
CASE STUDIES
Digital Maktaba has been developed and tested in close collaboration with the Biblioteca La Pira (Palermo) and other specialised Islamic studies collections, which serve as real-world case studies for the full digitisation and cataloguing workflow. In these contexts, the tool has been used for OCR of Arabic-script title pages (using the DM‑LP dataset and KrakenOCR), metadata recovery via ISBN, automatic classification based on language models, and semantic enrichment of categories, directly supporting day-to-day library work on non-Latin collections.
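Metadata recovery via ISBN depends on first reading the ISBN correctly from the page. The sketch below is hypothetical and is not taken from Digital Maktaba: it assumes that the OCR output yields the ISBN as a short string, possibly printed with Arabic-Indic or Extended Arabic-Indic (Persian) digits, and shows how such a string could be normalised to ASCII digits and checksum-validated before being used to query a catalogue service. The function names are illustrative, and the catalogue lookup itself is not shown.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical helper: strips an optional "ISBN", "ISBN-10" or "ISBN-13" label.
_PREFIX = re.compile(r"^\s*ISBN(?:-1[03])?\s*:?\s*", re.IGNORECASE)


def _valid_isbn10(s: str) -> bool:
    # Nine digits plus a check character (0-9, or X meaning 10); weights 10 down to 1.
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] == "X" else int(s[9])
    return total % 11 == 0


def _valid_isbn13(s: str) -> bool:
    # Thirteen digits with alternating weights 1 and 3; the sum must be divisible by 10.
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a checksum-valid ISBN-10/13 in ASCII digits, or None (hypothetical helper)."""
    chars = []
    for ch in _PREFIX.sub("", raw):
        if ch in "xX":
            chars.append("X")
            continue
        # unicodedata.decimal maps ASCII, Arabic-Indic and Extended Arabic-Indic
        # digits alike to their integer value; other characters return None.
        value = unicodedata.decimal(ch, None)
        if value is not None:
            chars.append(str(value))
        # Hyphens, spaces and other separators are simply dropped.
    candidate = "".join(chars)
    if _valid_isbn10(candidate) or _valid_isbn13(candidate):
        return candidate
    return None


if __name__ == "__main__":
    # The same ISBN-13 printed with Arabic-Indic digits.
    print(normalize_isbn("\u0669\u0667\u0668-\u0660-\u0663\u0660\u0666-\u0664\u0660\u0666\u0661\u0665-\u0667"))  # 9780306406157
    print(normalize_isbn("978-0-306-40615-8"))  # None (checksum fails)
```

Rejecting strings that fail the checksum is a cheap guard against OCR misreadings before any external lookup is attempted; a real pipeline would also need to locate the ISBN on the page, which this sketch does not address.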
TEAM
WP5 was coordinated by Fabrizio D’Avenia (University of Palermo), with Sonia Bergamaschi (University of Modena and Reggio Emilia) and Federico Ruozzi (University of Modena and Reggio Emilia–FSCIRE) as product owners.
The team brought together expertise in computer engineering, cultural data management, Islamic studies, and digital librarianship, with contributions from Sania Aftar (University of Modena and Reggio Emilia), Domenico Beneventano (University of Modena and Reggio Emilia), Domenico Ciccarello (University of Palermo), Amina El Ganadi (University of Palermo, FSCIRE), Luca Sala (University of Modena and Reggio Emilia), Giovanni Sullutrone (University of Modena and Reggio Emilia), and Riccardo Vigliermo (FSCIRE).
This interdisciplinary configuration ensured that technical solutions were co-designed with practitioners responsible for managing collections and implementing cataloguing standards.
BEYOND ITSERR
Thanks to its modular architecture, the system can be adapted to cataloguing materials in other languages and non-Latin scripts (such as Persian or Sanskrit), addressing the needs of libraries and archives that manage complex multilingual collections. The technologies developed, particularly for OCR and automated metadata extraction, are also applicable to digital humanities, cultural heritage digitisation, and semantic access tools. By aligning with international cataloguing practices and multilingual recommendations, and by prioritising original scripts over transliteration, Digital Maktaba can contribute to interoperable ecosystems for the management and valorisation of global collections, promoting more inclusive, scalable, and sustainable approaches to cultural heritage description and discovery.