Document Type

Article

Source of Publication

Data in Brief

Publication Date

2-1-2024

Abstract

This data article provides a dataset of 132,421 posts and their corresponding information collected from Twitter. The data has two classes, ham and spam, where ham denotes clean, non-spam tweets. The main purpose of the dataset is to support research on automatically classifying whether a post is spam. The data is in Arabic only, which makes it valuable to researchers in Arabic natural language processing (NLP), given the lack of resources in this language. The data is publicly available so that researchers can use it as a benchmark for Arabic NLP research. The dataset was collected using the Twitter REST API between January 27, 2021, and March 10, 2021, with an ad hoc crawler written in Python. The dataset is expected to benefit researchers in cybersecurity, NLP, data science, and social network analysis.

DOI Link

10.1016/j.dib.2023.109904

ISSN

2352-3409

Publisher

Elsevier BV

Disciplines

Computer Sciences

Keywords

Classification, Cybersecurity, Deep learning, Labelled data, Machine learning, Social network analysis, Twitter

Scopus ID

85179849371

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Kaddoura, Sanaa and Henno, Safaa, "Dataset of Arabic spam and ham tweets" (2024). All Works. 6241.
https://zuscholars.zu.ac.ae/works/6241

Indexed in Scopus

yes

Open Access

yes

Open Access Type

Hybrid: This publication is openly available in a subscription-based journal/series

Included in

Computer Sciences Commons


Dataset of Arabic spam and ham tweets
