ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation

AUTHORS: Giulio Federico, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Marco Di Benedetto

WORK PACKAGE:

URL: ViSketch-GPT: Collaborative Multi-Scale Feature Extraction for Sketch Recognition and Generation

Keywords:

Abstract
Understanding the nature of human sketches is challenging because of the wide variation in how they are created. Recognizing complex structural patterns improves both the accuracy of sketch recognition and the fidelity of generated sketches. In this work, we introduce ViSketch-GPT, a novel algorithm designed to address these challenges through a multi-scale context extraction approach. The model captures intricate details at multiple scales and combines them using an ensemble-like mechanism, in which the extracted features work collaboratively to capture the key details crucial for both classification and generation.
The effectiveness of ViSketch-GPT is validated through extensive experiments on the QuickDraw dataset. Our model establishes a new benchmark, significantly outperforming existing methods in both classification and generation tasks, with substantial improvements in accuracy and the fidelity of generated sketches.
The proposed algorithm offers a robust framework for understanding complex structures such as sketches: the features it extracts collaborate to recognize intricate details, making it a versatile tool for a wide range of applications in computer vision and machine learning.
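To make the mechanism described in the abstract concrete, the following is a minimal sketch (in PyTorch) of a multi-scale encoder with an ensemble-like fusion of per-scale features. The scales, layer sizes, and learned-gate fusion rule are illustrative assumptions, not implementation details taken from the paper.

    # Minimal sketch (not the authors' code): features extracted at several
    # scales are combined in an ensemble-like way via a learned weighting.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleEncoder(nn.Module):
        def __init__(self, scales=(1.0, 0.5, 0.25), dim=128):
            super().__init__()
            self.scales = scales
            self.backbone = nn.Sequential(   # shared CNN encoder (assumption)
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, dim),
            )
            self.gate = nn.Linear(dim, 1)    # lets per-scale features "collaborate"

        def forward(self, x):                       # x: (B, 1, H, W) sketch raster
            feats = []
            for s in self.scales:                   # encode each resolution
                xs = x if s == 1.0 else F.interpolate(
                    x, scale_factor=s, mode="bilinear", align_corners=False)
                feats.append(self.backbone(xs))     # (B, dim) per scale
            f = torch.stack(feats, dim=1)           # (B, n_scales, dim)
            w = torch.softmax(self.gate(f), dim=1)  # ensemble-like weighting
            return (w * f).sum(dim=1)               # fused (B, dim) descriptor

    enc = MultiScaleEncoder()
    z = enc(torch.randn(4, 1, 64, 64))
    print(z.shape)  # torch.Size([4, 128])

In such a design the fused descriptor can feed either a classification head or a generative decoder, mirroring the two tasks the abstract mentions.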




Evaluation of event plausibility recognition in Large (Vision)-Language Models

AUTHORS: Maria Cassese, Alessandro Bondielli, Alessandro Lenci

WORK PACKAGE:

URL: Evaluation of event plausibility recognition in Large (Vision)-Language Models

Keywords:

Abstract
Transformer-based Language Models (LMs) achieve outstanding performance in various tasks but still exhibit limitations in generalized event knowledge (GEK), i.e., in recognizing common world events, particularly when these require referential information or real-world experience. Assuming that the visual knowledge in vision-language models (VLMs) provides additional referential information, this paper tests their ability to leverage implicit event knowledge to acquire robust and generalizable representations of agent-patient interactions, assessing their capacity to distinguish between plausible and implausible events. The analysis was conducted on models of varying sizes and architectures.
In the evaluation, the performance of unimodal and multimodal models of various sizes was compared on the task of recognizing the plausibility of minimal sentence pairs. Our analysis suggests several findings: 1) decoder-only models tend to outperform encoder-only ones; 2) model size has a minor impact: although larger models perform better in absolute terms, the differences between 7B- and 13B-parameter models are not significant for this particular task; 3) while smaller encoder-only VLMs consistently fall short of their LLM counterparts, larger ones perform similarly or slightly better; 4) all models perform worse on the more challenging sentences; 5) adding corresponding images to the textual stimuli affects the accuracy of some models. These findings open avenues for further analyses of the inner workings of VLMs and of their ability to model event knowledge with and without visual input.
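For readers unfamiliar with the minimal-pair protocol, the sketch below shows one common way to score a decoder-only LM on such pairs: the model is credited when it assigns a higher sentence log-probability to the plausible member. The model name and the example pair are placeholders; the paper's actual stimuli, models, and scoring details are not given in this abstract.

    # Hedged sketch of a minimal-pair plausibility test for a decoder-only LM.
    # Requires: pip install torch transformers
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    @torch.no_grad()
    def sentence_logprob(text: str) -> float:
        """Sum of token log-probabilities under the LM (first token excluded)."""
        ids = tok(text, return_tensors="pt").input_ids
        out = lm(ids, labels=ids)                         # loss = mean NLL per token
        return -out.loss.item() * (ids.shape[1] - 1)      # back to a sum

    # The model "recognizes plausibility" on a pair if it scores the
    # plausible sentence higher than the implausible one (made-up examples).
    plausible = "The journalist interviewed the politician."
    implausible = "The politician interviewed the sofa."
    print(sentence_logprob(plausible) > sentence_logprob(implausible))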




Examples of the application of chatbots to the study of the Babylonian Talmud

AUTHORS: Andrea Ravasco

WORK PACKAGE: WP3 – T-Res

URL: Rivisteweb: Andrea Ravasco, Esempi di applicazione dei chatbot allo studio del “Talmud babilonese”

Keywords: Talmud – Mishnah – Rosh Hashanah – Artificial Intelligence – Chatbot – ChatGPT – Gemini

Abstract
Artificial intelligence is being used in humanities research, and in Jewish studies in particular, through a number of projects. This article aims to answer the question of whether the most popular chatbots, namely ChatGPT and Gemini, can be considered a valid tool for the analysis of sources within the Babylonian Talmud. The chatbots are evaluated from several angles: their knowledge of Talmudic Aramaic, their ability to find references to other sources within the Babylonian Talmud, and their ability to recognize its numerous sources.




Decoding the Divine. W.H. Mill’s Sanskrit Rendition of the Nicene-Constantinopolitan Creed and Its Religious and Linguistic Resonance

AUTHORS: Igor Spanò

WORK PACKAGE: WP4 DamSym

URL:

Keywords: W.H. Mill, Hindu-Christian Relations, Sanskrit Christian Creed, Comparative Religions, Comparative Philosophies

Abstract
Starting from some historical and historiographical reflections, which provide the necessary contextualisation, this article investigates selected verses of William Hodge Mill's 1823 Sanskrit translation of the Nicene-Constantinopolitan Creed. The aim is to explore some of Mill's translation choices and to situate the lexicon he used within a broad scenario that considers the philological analysis and semantic history of the terms in the Indian historical-religious framework.




QuAcc: Using Quantification to Predict Classifier Accuracy Under Prior Probability Shift

AUTHORS: Lorenzo Volpi, Alejandro Moreo, Fabrizio Sebastiani

WORK PACKAGE: WP8 UbiQuity

URL: https://journals.sagepub.com/doi/abs/10.1177/17248035251338347

Keywords:

Abstract
Using cross-validation to predict the accuracy of a classifier on unseen data can be done reliably only in the absence of dataset shift, i.e., when the training data and the unseen data are IID. In this work we deal instead with the problem of predicting classifier accuracy on unseen data affected by prior probability shift (PPS), an important type of dataset shift. We propose QuAcc, a method built on top of “quantification” algorithms robust to PPS, i.e., algorithms devised for estimating the prevalence values of the classes in unseen data affected by PPS. QuAcc is based on the idea of viewing the cells of the contingency table (on which classifier accuracy is computed) as classes, and of estimating, via a quantification algorithm, their prevalence values on the unseen data labelled by the classifier. We perform systematic experiments in which we compare the prediction error incurred by QuAcc with that of state-of-the-art classifier accuracy prediction (CAP) methods.
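The core idea lends itself to a compact illustration. The sketch below (Python/scikit-learn) treats the four cells of the binary contingency table as classes and estimates their prevalence on unlabelled data. For brevity it uses plain classify-and-count as the quantifier, whereas QuAcc builds on quantification methods that are actually robust to PPS, so this is an assumption-laden simplification rather than the authors' method.

    # Illustrative sketch, not the authors' implementation: view the four
    # contingency-table cells of a binary classifier h as "classes" and
    # quantify their prevalence on unlabelled data; the estimated accuracy
    # is then prev(TP) + prev(TN).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def cell_labels(h_pred, y_true):
        # Encode cells as 2*y + h(x): 0=TN, 1=FP, 2=FN, 3=TP.
        return 2 * y_true + h_pred

    def quacc_style_accuracy(h, X_val, y_val, X_test):
        # 1) Label held-out data with the cell each instance falls into.
        cells_val = cell_labels(h.predict(X_val), y_val)
        # 2) Learn to predict cells from features (hypothetical cell model).
        g = LogisticRegression(max_iter=1000).fit(X_val, cells_val)
        # 3) Classify-and-count: cell prevalences on the unlabelled data.
        prev = np.bincount(g.predict(X_test), minlength=4) / len(X_test)
        # 4) Accuracy estimate = prevalence of TN plus prevalence of TP.
        return prev[0] + prev[3]

Under this view, predicting classifier accuracy reduces to a quantification problem over the cells, which is what allows PPS-robust quantifiers to be plugged in.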