Multimodal hate speech detection: a novel deep learning framework for multilingual text and images
Document Type
Article
Source of Publication
PeerJ Computer Science
Publication Date
4-16-2025
Abstract
The rapid proliferation of social media platforms has facilitated the expression of opinions but also enabled the spread of hate speech. Detecting multimodal hate speech in low-resource multilingual contexts poses significant challenges. This study presents a deep learning framework that integrates bidirectional long short-term memory (BiLSTM) and EfficientNetB1 to classify hate speech in Urdu-English tweets, leveraging both text and image modalities. We introduce multimodal multilingual hate speech (MMHS11K), a manually annotated dataset comprising 11,000 multimodal tweets. Using an early fusion strategy, text and image features were combined for classification. Experimental results demonstrate that the BiLSTM+EfficientNetB1 model outperforms unimodal and baseline multimodal approaches, achieving an F1-score of 81.2% for Urdu tweets and 75.5% for English tweets. This research addresses critical gaps in multilingual and multimodal hate speech detection, offering a foundation for future advancements.
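The early fusion strategy mentioned in the abstract can be illustrated with a minimal sketch: per-modality feature vectors are concatenated into one joint vector before a single classifier sees them. Everything below is an assumption made for illustration and is not taken from the article: the function names `early_fusion` and `predict_hate_probability`, the toy feature values standing in for BiLSTM and EfficientNetB1 outputs, and the logistic layer standing in for the classification head.

```python
import math
from typing import List, Sequence


def early_fusion(text_features: Sequence[float],
                 image_features: Sequence[float]) -> List[float]:
    """Concatenate text and image feature vectors into one joint vector."""
    return list(text_features) + list(image_features)


def predict_hate_probability(fused: Sequence[float],
                             weights: Sequence[float],
                             bias: float) -> float:
    """Apply a single logistic unit to the fused vector (hypothetical head)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(f * w for f, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for encoder outputs; real encoders would produce
# much longer vectors (assumed values, not from the article).
text_vec = [0.2, -0.1, 0.7]   # e.g. a text encoder's final state
image_vec = [0.5, 0.3]        # e.g. pooled image features

fused = early_fusion(text_vec, image_vec)

# Hypothetical, untrained weights: one per fused feature.
weights = [0.4, -0.2, 0.9, 0.1, -0.3]
prob = predict_hate_probability(fused, weights, bias=-0.1)

print(f"fused length: {len(fused)}")
print(f"hate probability: {prob:.3f}")
```

The point of the sketch is the ordering: fusion happens at the feature level, so the classifier learns from both modalities jointly rather than combining two separate predictions afterwards.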
Volume
11
Disciplines
Computer Sciences
Keywords
BiLSTM, Deep learning, EfficientNetB1, Hate speech, Image, Multilingual, Multimodal, Urdu-English
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Saddozai, Furqan Khan; Badri, Sahar K.; Alghazzawi, Daniyal; Khattak, Asad; and Asghar, Muhammad Zubair, "Multimodal hate speech detection: a novel deep learning framework for multilingual text and images" (2025). All Works. 7283.
https://zuscholars.zu.ac.ae/works/7283
Indexed in Scopus
yes
Open Access
yes
Open Access Type
Gold: This publication is openly available in an open access journal/series