Critical Digital Literacy
The promotion of digital literacy combined with critical thinking (SubProject#2) is arguably among the most effective ways to tackle “fake news” and disinformation online. The approach has shown promise in Finland, where, five years after introducing such programs, officials reported substantial success in countering “fake news”. Inspired by Finland’s example, this sub-project aims to promote critical digital literacy in Qatar. We plan to achieve this through a general media literacy platform that teaches citizens and residents of Qatar how to recognize “fake news” and propaganda techniques. The platform will offer lessons and exploration capabilities. It will feature tools to analyze news, social media posts, or any custom text in Arabic and English, making explicit the propaganda and persuasion techniques used in the material discussed.
The tool will look for persuasion techniques such as appeals to emotion (e.g., fear, prejudice, smears) as well as logical fallacies (e.g., the black-and-white fallacy, bandwagon). By interacting with the platform, users will become aware of the ways “fake news” can manipulate them, making them less likely to act on it and less likely to share it further, which is critical for limiting the potential impact of organized disinformation campaigns online. We will further study the role of critical digital literacy in people’s resilience to online manipulation and influence, whether legitimate (e.g., in e-commerce) or malicious (e.g., in social engineering and phishing). This literacy will also cover understanding the influence of the algorithms and designs used in digital media; that is, we will go beyond teaching how to recognize threats and respond to them, toward understanding the underlying mechanics of influence and deception online.
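To illustrate the kind of analysis the platform could surface to users, here is a minimal, purely illustrative Python sketch that flags a few persuasion cues with keyword heuristics. The cue lists, technique names, and function names are our own toy assumptions for exposition; the project's actual detection system relies on trained models, not keyword matching.

```python
# Toy cue lexicons: illustrative assumptions only, not the project's
# real technique taxonomy or detection method.
TECHNIQUE_CUES = {
    "appeal_to_fear": ["threat", "danger", "destroy", "catastrophe"],
    "bandwagon": ["everyone knows", "everybody agrees", "join the millions"],
    "black_and_white_fallacy": ["either", "the only option", "no alternative"],
}

def flag_techniques(text: str) -> dict:
    """Return each technique with the cue phrases found in the text."""
    lowered = text.lower()
    hits = {}
    for technique, cues in TECHNIQUE_CUES.items():
        found = [cue for cue in cues if cue in lowered]
        if found:
            hits[technique] = found
    return hits

sample = "Everyone knows there is no alternative: act now or face catastrophe."
print(flag_techniques(sample))
```

A real system would replace the lexicon lookup with span-level predictions from fine-tuned multilingual models, but the output shape, mapping a text to the techniques it exhibits and the evidence for each, is the same idea the platform would present to learners.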
Social media companies often argue, with some validity, that people should be primarily responsible for managing their own traits, weaknesses, worries, stress, and jealousy, whether in the physical or the online world; generally, self-regulation is expected of social media users. However, we argue that social media design can become too immersive and, at times, addictive. Hence, social media should reduce the triggers that lead to a loss of control over usage. Digital addiction is associated with reduced productivity and disrupted sleep. Fear of missing out (FoMO) is one manifestation of how users become overly preoccupied with online spaces. We have argued that a thoughtful design process should equip users with tools to manage it, e.g., creative versions of auto-reply, coloring schemes, and filters. Such a design can benefit those who are highly susceptible to peer pressure and have low impulse control.
Objectives
To build a high-quality corpus annotated with propaganda and its techniques.
To develop a system for detecting the use of propaganda and its techniques in text in Arabic and English with a focus on Qatar and social media.
To develop an online platform for teaching critical digital literacy and then use the platform to study the role of critical digital literacy on people’s resilience to online manipulation and influence.
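The first objective's corpus would, like the ArPro dataset described in our publications, label propaganda at the text-span level. The sketch below shows one way such span annotations might be represented and sanity-checked; the schema, class, and field names are illustrative assumptions, not the project's actual annotation format.

```python
from dataclasses import dataclass

@dataclass
class SpanAnnotation:
    technique: str  # e.g. "Appeal_to_Fear", drawn from the technique taxonomy
    start: int      # character offset where the annotated span begins
    end: int        # character offset one past the span's last character

def validate(text: str, anns: list[SpanAnnotation]) -> list[str]:
    """Check that every annotated span lies inside the text; return the span texts."""
    spans = []
    for a in anns:
        assert 0 <= a.start < a.end <= len(text), f"bad offsets for {a.technique}"
        spans.append(text[a.start:a.end])
    return spans

text = "Act now or face total ruin."
anns = [SpanAnnotation("Appeal_to_Fear", 11, 27)]
print(validate(text, anns))
```

Character-offset spans like these are what make fine-grained evaluation possible: a system is scored not just on whether it labels a paragraph as propagandistic, but on whether it recovers the same technique over (approximately) the same characters the annotators marked.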
Meet the Critical Digital Literacy team members...
FIROJ ALAM
WAJDI ZAGHOUANI
GEORGE MIKROS
GIOVANNI DA SAN MARTINO
MARAM HASANAIN
FATEMA AHMAD
ELISA SARTORI
University of Padova
MUAADH NOMAN
Publications
Noman, Muaadh; Almourad, Mohamed B; Yankouskaya, Ala; Alam, Firoj; Ali, Raian
In what style shall I confront them? The role of social relationships in social correction of misinformation among the UK and Arab social media users Journal Article
In: International Journal of Intercultural Relations, vol. 111, pp. 102342, 2026.
@article{noman2026style,
title = {In what style shall I confront them? The role of social relationships in social correction of misinformation among the UK and Arab Social media users},
author = {Muaadh Noman and Mohamed B Almourad and Ala Yankouskaya and Firoj Alam and Raian Ali},
year = {2026},
date = {2026-01-01},
urldate = {2026-01-01},
journal = {International Journal of Intercultural Relations},
volume = {111},
pages = {102342},
publisher = {Elsevier},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Hasanain, Maram; Hasan, Md Arid; Kmainasi, Mohamed Bayan; Sartori, Elisa; Shahroor, Ali Ezzat; Martino, Giovanni Da San; Alam, Firoj
PropXplain: Can LLMs Enable Explainable Propaganda Detection? Proceedings Article
In: Christodoulopoulos, Christos; Chakraborty, Tanmoy; Rose, Carolyn; Peng, Violet (Ed.): Findings of the Association for Computational Linguistics: EMNLP 2025, pp. 23855–23863, Association for Computational Linguistics, Suzhou, China, 2025, ISBN: 979-8-89176-335-7.
@inproceedings{hasanain-etal-2025-propxplain,
title = {PropXplain: Can LLMs Enable Explainable Propaganda Detection?},
author = {Maram Hasanain and Md Arid Hasan and Mohamed Bayan Kmainasi and Elisa Sartori and Ali Ezzat Shahroor and Giovanni Da San Martino and Firoj Alam},
editor = {Christos Christodoulopoulos and Tanmoy Chakraborty and Carolyn Rose and Violet Peng},
url = {https://aclanthology.org/2025.findings-emnlp.1296/},
doi = {10.18653/v1/2025.findings-emnlp.1296},
isbn = {979-8-89176-335-7},
year = {2025},
date = {2025-11-01},
urldate = {2025-11-01},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
pages = {23855–23863},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {There has been significant research on propagandistic content detection across different modalities and languages. However, most studies have primarily focused on detection, with little attention given to explanations justifying the predicted label. This is largely due to the lack of resources that provide explanations alongside annotated labels. To address this issue, we propose a multilingual (i.e., Arabic and English) explanation-enhanced dataset, the first of its kind. Additionally, we introduce an explanation-enhanced LLM for both label detection and rationale-based explanation generation. Our findings indicate that the model performs comparably while also generating explanations. We will make the dataset and experimental resources publicly available for the research community (https://github.com/firojalam/PropXplain).},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Zaghouani, Wajdi; Biswas, Md. Rafiul; Bessghaier, Mabrouka; Ibrahim, Shimaa; Mikros, George; Hasnat, Abul; Alam, Firoj
MAHED Shared Task: Multimodal Detection of Hope and Hate Emotions in Arabic Content Proceedings Article
In: Darwish, Kareem; Ali, Ahmed; Farha, Ibrahim Abu; Touileb, Samia; Zitouni, Imed; Abdelali, Ahmed; Al-Ghamdi, Sharefah; Alkhereyf, Sakhar; Zaghouani, Wajdi; Khalifa, Salam; AlKhamissi, Badr; Almatham, Rawan; Hamed, Injy; Alyafeai, Zaid; Alowisheq, Areeb; Inoue, Go; Mrini, Khalil; Alshammari, Waad (Ed.): Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks, pp. 560–574, Association for Computational Linguistics, Suzhou, China, 2025, ISBN: 979-8-89176-356-2.
@inproceedings{zaghouani-etal-2025-mahed,
title = {MAHED Shared Task: Multimodal Detection of Hope and Hate Emotions in Arabic Content},
author = {Wajdi Zaghouani and Md. Rafiul Biswas and Mabrouka Bessghaier and Shimaa Ibrahim and George Mikros and Abul Hasnat and Firoj Alam},
editor = {Kareem Darwish and Ahmed Ali and Ibrahim Abu Farha and Samia Touileb and Imed Zitouni and Ahmed Abdelali and Sharefah Al-Ghamdi and Sakhar Alkhereyf and Wajdi Zaghouani and Salam Khalifa and Badr AlKhamissi and Rawan Almatham and Injy Hamed and Zaid Alyafeai and Areeb Alowisheq and Go Inoue and Khalil Mrini and Waad Alshammari},
url = {https://aclanthology.org/2025.arabicnlp-sharedtasks.75/},
doi = {10.18653/v1/2025.arabicnlp-sharedtasks.75},
isbn = {979-8-89176-356-2},
year = {2025},
date = {2025-11-01},
urldate = {2025-11-01},
booktitle = {Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
pages = {560–574},
publisher = {Association for Computational Linguistics},
address = {Suzhou, China},
abstract = {This paper presents the MAHED 2025 Shared Task on Multimodal Detection of Hope and Hate Emotions in Arabic Content, comprising three subtasks: (1) text-based classification of Arabic content into hate and hope, (2) multi-task learning for joint prediction of emotions, offensive content, and hate speech, and (3) multimodal detection of hateful content in Arabic memes. We provide three high-quality datasets totaling over 22,000 instances sourced from social media platforms, annotated by native Arabic speakers with Cohen's Kappa exceeding 0.85. Our evaluation attracted 46 leaderboard submissions from participants, with systems leveraging Arabic-specific pre-trained language models (AraBERT, MARBERT), large language models (GPT-4, Gemini), and multimodal fusion architectures combining CLIP vision encoders with Arabic text models. The best-performing systems achieved macro F1-scores of 0.723 (Task 1), 0.578 (Task 2), and 0.796 (Task 3), with top teams employing ensemble methods, class-weighted training, and OCR-aware multimodal fusion. Analysis reveals persistent challenges in dialectal robustness, minority class detection for hope speech, and highlights key directions for future Arabic content moderation research.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Bassi, Davide; Dimitrov, Dimitar Iliyanov; D'Auria, Bernardo; Alam, Firoj; Hasanain, Maram; Moro, Christian; Orrù, Luisa; Turchi, Gian Piero; Nakov, Preslav; Martino, Giovanni Da San
Annotating the Annotators: Analysis, Insights and Modelling from an Annotation Campaign on Persuasion Techniques Detection Proceedings Article
In: Che, Wanxiang; Nabende, Joyce; Shutova, Ekaterina; Pilehvar, Mohammad Taher (Ed.): Findings of the Association for Computational Linguistics: ACL 2025, pp. 17918–17929, Association for Computational Linguistics, Vienna, Austria, 2025, ISBN: 979-8-89176-256-5.
@inproceedings{bassi-etal-2025-annotating,
title = {Annotating the Annotators: Analysis, Insights and Modelling from an Annotation Campaign on Persuasion Techniques Detection},
author = {Davide Bassi and Dimitar Iliyanov Dimitrov and Bernardo D'Auria and Firoj Alam and Maram Hasanain and Christian Moro and Luisa Orrù and Gian Piero Turchi and Preslav Nakov and Giovanni Da San Martino},
editor = {Wanxiang Che and Joyce Nabende and Ekaterina Shutova and Mohammad Taher Pilehvar},
url = {https://aclanthology.org/2025.findings-acl.922/},
doi = {10.18653/v1/2025.findings-acl.922},
isbn = {979-8-89176-256-5},
year = {2025},
date = {2025-07-01},
urldate = {2025-07-01},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
pages = {17918–17929},
publisher = {Association for Computational Linguistics},
address = {Vienna, Austria},
abstract = {Persuasion (or propaganda) techniques detection is a relatively novel task in Natural Language Processing (NLP). While there have already been a number of annotation campaigns, they have been based on heuristic guidelines, which have never been thoroughly discussed. Here, we present the first systematic analysis of a complex annotation task -detecting 22 persuasion techniques in memes-, for which we provided continuous expert oversight. The presence of an expert allowed us to critically analyze specific aspects of the annotation process. Among our findings, we show that inter-annotator agreement alone inadequately assessed annotation correctness. We thus define and track different error types, revealing that expert feedback shows varying effectiveness across error categories. This pattern suggests that distinct mechanisms underlie different kinds of misannotations. Based on our findings, we advocate for an expert oversight in annotation tasks and periodic quality audits. As an attempt to reduce the costs for this, we introduce a probabilistic model for optimizing intervention scheduling.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Kmainasi, Mohamed Bayan; Shahroor, Ali Ezzat; Hasanain, Maram; Laskar, Sahinur Rahman; Hassan, Naeemul; Alam, Firoj
LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content Proceedings Article
In: Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.): Findings of the Association for Computational Linguistics: NAACL 2025, pp. 5642–5664, Association for Computational Linguistics, Albuquerque, New Mexico, 2025, ISBN: 979-8-89176-195-7.
@inproceedings{kmainasi-etal-2025-llamalens,
title = {LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content},
author = {Mohamed Bayan Kmainasi and Ali Ezzat Shahroor and Maram Hasanain and Sahinur Rahman Laskar and Naeemul Hassan and Firoj Alam},
editor = {Luis Chiruzzo and Alan Ritter and Lu Wang},
url = {https://aclanthology.org/2025.findings-naacl.313/},
doi = {10.18653/v1/2025.findings-naacl.313},
isbn = {979-8-89176-195-7},
year = {2025},
date = {2025-04-01},
urldate = {2025-04-01},
booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
pages = {5642–5664},
publisher = {Association for Computational Linguistics},
address = {Albuquerque, New Mexico},
abstract = {Large Language Models (LLMs) have demonstrated remarkable success as general-purpose task solvers across various fields. However, their capabilities remain limited when addressing domain-specific problems, particularly in downstream NLP tasks. Research has shown that models fine-tuned on instruction-based downstream NLP datasets outperform those that are not fine-tuned. While most efforts in this area have primarily focused on resource-rich languages like English and broad domains, little attention has been given to multilingual settings and specific domains. To address this gap, this study focuses on developing a specialized LLM, LlamaLens, for analyzing news and social media content in a multilingual context. To the best of our knowledge, this is the first attempt to tackle both domain specificity and multilinguality, with a particular focus on news and social media. Our experimental setup includes 18 tasks, represented by 52 datasets covering Arabic, English, and Hindi. We demonstrate that LlamaLens outperforms the current state-of-the-art (SOTA) on 23 testing sets, and achieves comparable performance on 8 sets. We make the models and resources publicly available for the research community (https://huggingface.co/QCRI).},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Sartori, Elisa; Tardelli, Serena; Tesconi, Maurizio; Conti, Mauro; Galeazzi, Alessandro; Cresci, Stefano; Martino, Giovanni Da San; others,
Insights into using temporal coordinated behaviour to explore connections between social media posts and influence Proceedings Article
In: Findings of the Association for Computational Linguistics: EMNLP 2025, pp. 24392–24404, Association for Computational Linguistics 2025.
@inproceedings{sartori2025insights,
title = {Insights into using temporal coordinated behaviour to explore connections between social media posts and influence},
author = {Elisa Sartori and Serena Tardelli and Maurizio Tesconi and Mauro Conti and Alessandro Galeazzi and Stefano Cresci and Giovanni Da San Martino and others},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
pages = {24392–24404},
organization = {Association for Computational Linguistics},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Ruggeri, Federico; Muti, Arianna; Korre, Katerina; Struß, Julia Maria; Siegel, Melanie; Wiegand, M; Alam, F; Biswas, R; Zaghouani, W; Nawrocka, M; others,
Overview of the CLEF-2025 CheckThat! lab task 1 on subjectivity in news articles Journal Article
In: Working Notes of CLEF, 2025.
@article{ruggeri2025overview,
title = {Overview of the CLEF-2025 CheckThat! lab task 1 on subjectivity in news article},
author = {Federico Ruggeri and Arianna Muti and Katerina Korre and Julia Maria Struß and Melanie Siegel and M Wiegand and F Alam and R Biswas and W Zaghouani and M Nawrocka and others},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
journal = {Working Notes of CLEF},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Bouamor, Houda; Iturra-Bocaz, Gabriel; Galuščáková, Petra; Alam, Firoj
Overview of the CLEF-2025 CheckThat! Lab Task 3 on Fact-Checking Numerical Claims Journal Article
In: 2025.
@article{bouamor2025overview,
title = {Overview of the CLEF-2025 CheckThat! Lab Task 3 on Fact-Checking Numerical Claims},
author = {Houda Bouamor and Gabriel Iturra-Bocaz and Petra Galuščáková and Firoj Alam},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Noman, Muaadh; Alam, Firoj; Ali, Raian
User Correction of Misinformation on Social Media: Exploring Communication Styles and the Influence of Empathy, Altruism, Self-esteem, and Competency Proceedings Article
In: Conference on e-Business, e-Services and e-Society, pp. 409–422, Springer 2025.
@inproceedings{noman2025user,
title = {User Correction of Misinformation on Social Media: Exploring Communication Styles and the Influence of Empathy, Altruism, Self-esteem, and Competency},
author = {Muaadh Noman and Firoj Alam and Raian Ali},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
booktitle = {Conference on e-Business, e-Services and e-Society},
pages = {409–422},
organization = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Biswas, Md. Rafiul; Alam, Firoj; Zaghouani, Wajdi
MARSAD: A Multi-Functional Tool for Real-Time Social Media Analysis Miscellaneous
2025.
@misc{biswas2025marsadmultifunctionaltoolrealtime,
title = {MARSAD: A Multi-Functional Tool for Real-Time Social Media Analysis},
author = {Md. Rafiul Biswas and Firoj Alam and Wajdi Zaghouani},
url = {https://arxiv.org/abs/2512.01369},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Alam, Firoj; Struß, Julia Maria; Chakraborty, Tanmoy; Dietze, Stefan; Hafid, Salim; Korre, Katerina; Muti, Arianna; Nakov, Preslav; Ruggeri, Federico; Schellhammer, Sebastian; Setty, Vinay; Sundriyal, Megha; Todorov, Konstantin; V, Venktesh
The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval Miscellaneous
2025.
@misc{alam2025clef2025checkthatlabsubjectivity,
title = {The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval},
author = {Firoj Alam and Julia Maria Struß and Tanmoy Chakraborty and Stefan Dietze and Salim Hafid and Katerina Korre and Arianna Muti and Preslav Nakov and Federico Ruggeri and Sebastian Schellhammer and Vinay Setty and Megha Sundriyal and Konstantin Todorov and Venktesh V},
url = {https://arxiv.org/abs/2503.14828},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Abouzied, Azza; Alam, Firoj; Ali, Raian; Papotti, Paolo
Combating Misinformation in the Arab World: Challenges & Opportunities Journal Article
In: Communications of the ACM, 2025, (To appear).
@article{abouzied2025combating,
title = {Combating Misinformation in the Arab World: Challenges & Opportunities},
author = {Azza Abouzied and Firoj Alam and Raian Ali and Paolo Papotti},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
journal = {Communications of the ACM},
note = {To appear},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Kmainasi, Mohamed Bayan; Hasnat, Abul; Hasan, Md Arid; Shahroor, Ali Ezzat; Alam, Firoj
MemeIntel: Explainable Detection of Propagandistic and Hateful Memes Journal Article
In: arXiv preprint arXiv:2502.16612, 2025.
@article{kmainasi2025memeintel,
title = {MemeIntel: Explainable Detection of Propagandistic and Hateful Memes},
author = {Mohamed Bayan Kmainasi and Abul Hasnat and Md Arid Hasan and Ali Ezzat Shahroor and Firoj Alam},
url = {https://arxiv.org/abs/2502.16612},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
journal = {arXiv preprint arXiv:2502.16612},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Hasanain, Maram; Hasan, Md Arid; Kmainasi, Mohamed Bayan; Sartori, Elisa; Shahroor, Ali Ezzat; Martino, Giovanni Da San; Alam, Firoj
Reasoning About Persuasion: Can LLMs Enable Explainable Propaganda Detection? Journal Article
In: arXiv preprint arXiv:2502.16550, 2025.
@article{hasanain2025reasoning,
title = {Reasoning About Persuasion: Can LLMs Enable Explainable Propaganda Detection?},
author = {Maram Hasanain and Md Arid Hasan and Mohamed Bayan Kmainasi and Elisa Sartori and Ali Ezzat Shahroor and Giovanni Da San Martino and Firoj Alam},
url = {https://arxiv.org/pdf/2502.16550},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
journal = {arXiv preprint arXiv:2502.16550},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Alam, Firoj; Struß, Julia Maria; Chakraborty, Tanmoy; Dietze, Stefan; Hafid, Salim; Korre, Katerina; Muti, Arianna; Nakov, Preslav; Ruggeri, Federico; Schellhammer, Sebastian; others,
The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval Proceedings Article
In: European Conference on Information Retrieval, pp. 467–478, Springer 2025.
@inproceedings{alam2025clef,
title = {The CLEF-2025 CheckThat! Lab: Subjectivity, Fact-Checking, Claim Normalization, and Retrieval},
author = {Firoj Alam and Julia Maria Struß and Tanmoy Chakraborty and Stefan Dietze and Salim Hafid and Katerina Korre and Arianna Muti and Preslav Nakov and Federico Ruggeri and Sebastian Schellhammer and others},
url = {https://arxiv.org/abs/2503.14828},
year = {2025},
date = {2025-01-01},
urldate = {2025-01-01},
booktitle = {European Conference on Information Retrieval},
pages = {467–478},
organization = {Springer},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Ahmad, Fatema; Alam, Firoj
Large Language Models for Propaganda Span Annotation Proceedings Article
In: Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (Ed.): Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 14522–14532, Association for Computational Linguistics, Miami, Florida, USA, 2024.
@inproceedings{hasanain-etal-2024-large,
title = {Large Language Models for Propaganda Span Annotation},
author = {Maram Hasanain and Fatema Ahmad and Firoj Alam},
editor = {Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen},
url = {https://aclanthology.org/2024.findings-emnlp.850},
year = {2024},
date = {2024-11-01},
urldate = {2024-11-01},
booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024},
pages = {14522–14532},
publisher = {Association for Computational Linguistics},
address = {Miami, Florida, USA},
abstract = {The use of propagandistic techniques in online content has increased in recent years aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of textual spans where propaganda techniques are used, are essential for more informed content consumption. Automatic systems targeting the task over lower resourced languages are limited, usually obstructed by lack of large scale training datasets. Our study investigates whether Large Language Models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation context to GPT-4 within prompts improves its performance compared to human annotators. Moreover, when serving as an expert annotator (consolidator), the model provides labels that have higher agreement with expert annotators, and lead to specialized models that achieve state-of-the-art over an unseen Arabic testing set. Finally, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for propagandistic spans detection task prompting it with annotations from human annotators with limited expertise. All scripts and annotations will be shared with the community.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Alam, Firoj; Hasnat, Abul; Ahmad, Fatema; Hasan, Md. Arid; Hasanain, Maram
ArMeme: Propagandistic Content in Arabic Memes Proceedings Article
In: Al-Onaizan, Yaser; Bansal, Mohit; Chen, Yun-Nung (Ed.): Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 21071–21090, Association for Computational Linguistics, Miami, Florida, USA, 2024.
@inproceedings{alam-etal-2024-armeme,
title = {ArMeme: Propagandistic Content in Arabic Memes},
author = {Firoj Alam and Abul Hasnat and Fatema Ahmad and Md. Arid Hasan and Maram Hasanain},
editor = {Yaser Al-Onaizan and Mohit Bansal and Yun-Nung Chen},
url = {https://aclanthology.org/2024.emnlp-main.1173},
year = {2024},
date = {2024-11-01},
urldate = {2024-11-01},
booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
pages = {21071–21090},
publisher = {Association for Computational Linguistics},
address = {Miami, Florida, USA},
abstract = {With the rise of digital communication memes have become a significant medium for cultural and political expression that is often used to mislead audience. Identification of such misleading and persuasive multimodal content become more important among various stakeholders, including social media platforms, policymakers, and the broader society as they often cause harm to the individuals, organizations and/or society. While there has been effort to develop AI based automatic system for resource rich languages (e.g., English), it is relatively little to none for medium to low resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ~6K Arabic memes collected from various social media platforms, which is a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We made the dataset publicly available for the community.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Hasan, Md. Arid; Ahmad, Fatema; Suwaileh, Reem; Biswas, Md. Rafiul; Zaghouani, Wajdi; Alam, Firoj
ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content Proceedings Article
In: Habash, Nizar; Bouamor, Houda; Eskander, Ramy; Tomeh, Nadi; Farha, Ibrahim Abu; Abdelali, Ahmed; Touileb, Samia; Hamed, Injy; Onaizan, Yaser; Alhafni, Bashar; Antoun, Wissam; Khalifa, Salam; Haddad, Hatem; Zitouni, Imed; AlKhamissi, Badr; Almatham, Rawan; Mrini, Khalil (Ed.): Proceedings of The Second Arabic Natural Language Processing Conference, pp. 456–466, Association for Computational Linguistics, Bangkok, Thailand, 2024.
@inproceedings{hasanain-etal-2024-araieval,
title = {ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content},
author = {Maram Hasanain and Md. Arid Hasan and Fatema Ahmad and Reem Suwaileh and Md. Rafiul Biswas and Wajdi Zaghouani and Firoj Alam},
editor = {Nizar Habash and Houda Bouamor and Ramy Eskander and Nadi Tomeh and Ibrahim Abu Farha and Ahmed Abdelali and Samia Touileb and Injy Hamed and Yaser Onaizan and Bashar Alhafni and Wissam Antoun and Salam Khalifa and Hatem Haddad and Imed Zitouni and Badr AlKhamissi and Rawan Almatham and Khalil Mrini},
url = {https://aclanthology.org/2024.arabicnlp-1.44},
year = {2024},
date = {2024-08-01},
urldate = {2024-08-01},
booktitle = {Proceedings of The Second Arabic Natural Language Processing Conference},
pages = {456–466},
publisher = {Association for Computational Linguistics},
address = {Bangkok, Thailand},
abstract = {We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community. We hope this will enable further research on these important tasks in Arabic.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Dimitrov, Dimitar; Alam, Firoj; Hasanain, Maram; Hasnat, Abul; Silvestri, Fabrizio; Nakov, Preslav; Martino, Giovanni Da San
SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes Proceedings Article
In: Ojha, Atul Kr.; Doğruöz, A. Seza; Madabushi, Harish Tayyar; Martino, Giovanni Da San; Rosenthal, Sara; Rosá, Aiala (Ed.): Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pp. 2009–2026, Association for Computational Linguistics, Mexico City, Mexico, 2024.
@inproceedings{dimitrov-etal-2024-semevalb,
title = {SemEval-2024 Task 4: Multilingual Detection of Persuasion Techniques in Memes},
author = {Dimitar Dimitrov and Firoj Alam and Maram Hasanain and Abul Hasnat and Fabrizio Silvestri and Preslav Nakov and Giovanni Da San Martino},
editor = {Atul Kr. Ojha and A. Seza Doğruöz and Harish Tayyar Madabushi and Giovanni Da San Martino and Sara Rosenthal and Aiala Rosá},
url = {https://aclanthology.org/2024.semeval-1.275},
doi = {https://doi.org/10.18653/v1/2024.semeval-1.275},
year = {2024},
date = {2024-06-01},
urldate = {2024-06-01},
booktitle = {Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)},
pages = {2009–2026},
publisher = {Association for Computational Linguistics},
address = {Mexico City, Mexico},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Hasanain, Maram; Ahmad, Fatema; Alam, Firoj
Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles Proceedings Article
In: Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.): Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 2724–2744, ELRA and ICCL, Torino, Italia, 2024.
@inproceedings{hasanain-etal-2024-gpt,
title = {Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles},
author = {Maram Hasanain and Fatema Ahmad and Firoj Alam},
editor = {Nicoletta Calzolari and Min-Yen Kan and Veronique Hoste and Alessandro Lenci and Sakriani Sakti and Nianwen Xue},
url = {https://aclanthology.org/2024.lrec-main.244},
year = {2024},
date = {2024-05-01},
urldate = {2024-05-01},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
pages = {2724–2744},
publisher = {ELRA and ICCL},
address = {Torino, Italia},
abstract = {The use of propaganda has spiked on mainstream and social media, aiming to manipulate or mislead users. While efforts to automatically detect propaganda techniques in textual, visual, or multimodal content have increased, most of them primarily focus on English content. The majority of the recent initiatives targeting medium to low-resource languages produced relatively small annotated datasets, with a skewed distribution, posing challenges for the development of sophisticated propaganda detection models. To address this challenge, we carefully develop the largest propaganda dataset to date, ArPro, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text. Results showed that GPT-4's performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text. Compared to models fine-tuned on the dataset for propaganda detection at different classification granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a dataset consisting of six other languages for span detection, and results suggest that the model struggles with the task across languages. We made the dataset publicly available for the community.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Educational Material
Introduction to Critical Digital Literacy
Download the booklet: Introduction to Critical Digital Literacy
Critical digital literacy is essential in today’s world, where the internet and social media are the primary sources of information and communication. As mentioned before, a great deal of harmful online content can sway opinions and actions. Hence, fostering critical digital literacy skills is vital to combating the spread of fake news, harmful stereotypes, and divisive narratives. Learn more about it in the attached booklet.
Propaganda
Download the booklet: Propaganda
Learning what propaganda is forms an important part of critical digital literacy: it helps create a safer online space in which we engage with the digital world critically.
Conferences
Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content
Read the full paper here
Find the presentation slides here
Find the poster here
Abstract: We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of 14 teams participated in the final evaluation phase, with 6 and 9 teams participating in Tasks 1 and 2, respectively. Finally, 11 teams submitted system description papers. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further provide a brief overview of the participating systems. All datasets and evaluation scripts are released to the research community. We hope this will enable further research on these important tasks in Arabic.
Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness
Read the full paper here
Find the presentation slides here
Abstract: We present an overview of the CheckThat! Lab 2024 Task 1, part of CLEF 2024. Task 1 involves determining whether a text item is check-worthy, with a special emphasis on COVID-19, political news, and political debates and speeches. It is conducted in three languages: Arabic, Dutch, and English. Additionally, Spanish was offered for extra training data during the development phase. A total of 75 teams registered, with 37 teams submitting 236 runs and 17 teams submitting system description papers. Out of these, 13, 15, and 26 teams participated for Arabic, Dutch, and English, respectively. Among these teams, the use of transformer pre-trained language models (PLMs) was the most frequent. A few teams also employed large language models (LLMs). We provide a description of the dataset and the task setup, including evaluation settings, and a brief overview of the participating systems. As is customary in the CheckThat! Lab, we release all the datasets as well as the evaluation scripts to the research community. This will enable further research on identifying relevant check-worthy content that can assist various stakeholders, such as fact-checkers, journalists, and policymakers.
ArMeme: Propagandistic Content in Arabic Memes
Read the full paper here
Find the presentation here
Download the dataset from here
Abstract: With the rise of digital communication, memes have become a significant medium for cultural and political expression that is often used to mislead audiences. Identifying such misleading and persuasive multimodal content has become important to various stakeholders, including social media platforms, policymakers, and the broader society, as it often causes harm to individuals, organizations, and/or society. While there have been efforts to develop AI-based automatic systems for resource-rich languages (e.g., English), relatively little to no work exists for medium- to low-resource languages. In this study, we focused on developing an Arabic memes dataset with manual annotations of propagandistic content. We annotated ∼6K Arabic memes collected from various social media platforms, a first resource for Arabic multimodal research. We provide a comprehensive analysis aiming to develop computational tools for their detection. We made the dataset publicly available for the community.
Large Language Models for Propaganda Span Annotation
Read the full paper here
Find the poster here
Download the dataset from here
Abstract: The use of propagandistic techniques in online content has increased in recent years, aiming to manipulate online audiences. Fine-grained propaganda detection and extraction of the textual spans where propaganda techniques are used are essential for more informed content consumption. Automatic systems targeting the task in lower-resourced languages are limited, usually obstructed by the lack of large-scale training datasets. Our study investigates whether large language models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. The experiments are performed over a large-scale in-house manually annotated dataset. The results suggest that providing more annotation context to GPT-4 within prompts improves its performance compared to human annotators. Moreover, when serving as an expert annotator (consolidator), the model provides labels that have higher agreement with expert annotators and lead to specialized models that achieve state-of-the-art performance over an unseen Arabic test set. Finally, our work is the first to show the potential of utilizing LLMs to develop annotated datasets for the propagandistic span detection task by prompting them with annotations from human annotators with limited expertise. All scripts and annotations will be shared with the community.
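The consolidation step described above, where several annotators' span labels are merged into a single set, can be illustrated with a toy example. The majority-vote strategy and data shapes below are assumptions made for this sketch; the paper's actual consolidation procedure (with GPT-4 in the consolidator role) may differ.

```python
# Illustrative sketch: consolidating span annotations from several annotators
# by majority vote over character positions. This is one generic strategy,
# not necessarily the consolidation procedure used in the paper.
from collections import Counter

def consolidate(annotations, text_len, min_votes=2):
    """annotations: one list of (start, end) spans per annotator, for a single
    technique. Returns maximal spans covering every character position that at
    least `min_votes` annotators marked."""
    votes = Counter()
    for spans in annotations:
        for start, end in spans:
            for i in range(start, end):
                votes[i] += 1
    consolidated, run_start = [], None
    for i in range(text_len + 1):
        if i < text_len and votes[i] >= min_votes:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            consolidated.append((run_start, i))
            run_start = None
    return consolidated

annotators = [[(0, 8)], [(2, 10)], [(3, 8)]]
print(consolidate(annotators, text_len=12))  # [(2, 8)]: positions 2..7 get >= 2 votes
```

Majority voting over characters rather than whole spans lets partially overlapping annotations contribute, which matters when annotators agree on the core of a propagandistic span but disagree on its boundaries.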
Can GPT-4 Identify Propaganda? Annotation and Detection of Propaganda Spans in News Articles
Read the full paper here
Find the poster here
Download the dataset from here
Abstract: The use of propaganda has spiked on mainstream and social media, aiming to manipulate or mislead users. While efforts to automatically detect propaganda techniques in textual, visual, or multimodal content have increased, most of them primarily focus on English content. The majority of the recent initiatives targeting medium to low-resource languages produced relatively small annotated datasets, with a skewed distribution, posing challenges for the development of sophisticated propaganda detection models. To address this challenge, we carefully develop the largest propaganda dataset to date, ArPro, comprised of 8K paragraphs from newspaper articles, labeled at the text span level following a taxonomy of 23 propagandistic techniques. Furthermore, our work offers the first attempt to understand the performance of large language models (LLMs), using GPT-4, for fine-grained propaganda detection from text. Results showed that GPT-4’s performance degrades as the task moves from simply classifying a paragraph as propagandistic or not, to the fine-grained task of detecting propaganda techniques and their manifestation in text. Compared to models fine-tuned on the dataset for propaganda detection at different classification granularities, GPT-4 is still far behind. Finally, we evaluate GPT-4 on a dataset consisting of six other languages for span detection, and results suggest that the model struggles with the task across languages. We made the dataset publicly available for the community.
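To make the fine-grained setting concrete, here is a minimal sketch of how predicted spans might be scored against gold annotations by character overlap. The span format `(start, end, technique)` and the overlap-based metric are illustrative assumptions, not the official evaluation code released with the dataset.

```python
# A minimal sketch of span-level scoring for propaganda technique detection.
# The span format and the partial-overlap criterion are illustrative
# assumptions, not the official ArPro evaluation script.

def char_overlap(a, b):
    """Number of characters shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def span_f1(gold, pred):
    """Character-overlap F1; spans only match when techniques agree."""
    def covered(spans, others):
        total = sum(e - s for (s, e, _) in spans)
        hit = sum(
            max((char_overlap((s, e), (os, oe))
                 for (os, oe, ot) in others if ot == t), default=0)
            for (s, e, t) in spans
        )
        return hit / total if total else 0.0
    recall = covered(gold, pred)      # how much gold text the predictions cover
    precision = covered(pred, gold)   # how much predicted text is justified by gold
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [(0, 10, "Loaded_Language"), (20, 30, "Doubt")]
pred = [(0, 5, "Loaded_Language"), (20, 30, "Doubt")]
print(round(span_f1(gold, pred), 3))  # 0.857
```

A metric of this shape gives partial credit for boundary disagreements, which is why span-level scores degrade more gracefully than exact-match ones as the task gets harder.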
Persuasion Techniques and Disinformation Detection in Arabic Text
Find the poster here
Find the presentation here
Abstract: We present an overview of CheckThat! Lab’s 2024 Task 3, which focuses on detecting 23 persuasion techniques at the text-span level in online media. The task covers five languages, namely Arabic, Bulgarian, English, Portuguese, and Slovene, and highly debated topics in the media, e.g., the Israeli–Palestinian conflict, the Russia–Ukraine war, climate change, COVID-19, abortion, etc. A total of 23 teams registered for the task, and two of them submitted system responses, which were compared against a baseline and a task organizers’ system that used a state-of-the-art transformer-based architecture. We provide a description of the dataset and the overall task setup, including the evaluation methodology, and an overview of the participating systems. The datasets, accompanied by the evaluation scripts, are released to the research community, which we believe will foster research on persuasion technique detection and analysis of online media content in various fields and contexts.
Workshops
Critique What You Read!
Find the presentation here
On September 8, 2024, the Critical Digital Literacy team and the MARSAD team (SP#1), in collaboration with QNL, held a public workshop to empower people to critique what they read. The workshop focused on ways to improve our consumption of news and online content and equipped participants with tools to verify news and identify the possible use of propagandistic techniques.
Prop2Hate-Meme
Download the dataset here
We adapted the ArMeme dataset for both fine- and coarse-grained hatefulness categorization. We preserved the original train, development, and test splits, with the test set released as dev_test. While ArMeme was initially annotated with four labels, for this study we retained only the memes labeled as propaganda and not_propaganda. These were subsequently re-annotated with hatefulness categories. The data distribution is provided below.
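The filtering steps just described, keeping the original splits while dropping the labels outside the two retained classes, can be sketched as follows. The record structure and field names (`split`, `label`) are hypothetical and for illustration only; consult the released dataset for the actual schema.

```python
# A minimal sketch of the filtering described above: preserve the original
# splits and retain only memes labeled propaganda / not_propaganda.
# Field names ("split", "label") are illustrative, not the released schema.

KEEP = {"propaganda", "not_propaganda"}

def filter_for_hate_study(records):
    """Group records by split, dropping labels outside the retained classes."""
    splits = {}
    for rec in records:
        if rec["label"] in KEEP:
            splits.setdefault(rec["split"], []).append(rec)
    return splits

sample = [
    {"id": 1, "split": "train", "label": "propaganda"},
    {"id": 2, "split": "train", "label": "other"},            # dropped
    {"id": 3, "split": "dev_test", "label": "not_propaganda"},
]
result = filter_for_hate_study(sample)
print({split: len(recs) for split, recs in result.items()})  # {'train': 1, 'dev_test': 1}
```

Keeping the split assignments fixed during filtering ensures that results on Prop2Hate-Meme remain comparable to results reported on the original ArMeme splits.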
ArMeme
Download the dataset here
Read the paper here
ArMeme is the first multimodal Arabic memes dataset that includes both text and images, collected from various social media platforms. It serves as the first resource dedicated to Arabic multimodal research. While the dataset has been annotated to identify propaganda in memes, it is versatile and can be utilized for a wide range of other research purposes, including sentiment analysis, hate speech detection, cultural studies, meme generation, and cross-lingual transfer learning. The dataset opens new avenues for exploring the intersection of language, culture, and visual communication.
LLM_Propaganda Annotation
Download the dataset here
Read the paper here
Our study investigates whether large language models (LLMs), such as GPT-4, can effectively extract propagandistic spans. We further study the potential of employing the model to collect more cost-effective annotations. Finally, we examine the effectiveness of labels provided by GPT-4 in training smaller language models for the task. In this repository, we release the full human annotations, the consolidated gold labels, and the annotations provided by GPT-4 in its different annotator roles.