A Comparative Study of Machine Learning Models for Classification and Detection of Cybersecurity Threat in Hacking Forum
Document Type
Conference Proceeding
Source of Publication
2024 15th Annual Undergraduate Research Conference on Applied Computing (URC)
Publication Date
4-25-2024
Abstract
This paper investigates the efficacy of machine learning algorithms for cyber threat detection in forum discussions, using three text representations: Word2Vec, TF-IDF, and GloVe. The methodology covers data pre-processing, model training, and evaluation with Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), XGBoost, LSTM, and Feedforward Neural Networks. The Word2Vec models use semantic relationships between words to create document embeddings, while TF-IDF transforms textual content into numerical features. GloVe embeddings are additionally employed to capture global semantic relationships in the text. The findings show that the TF-IDF-based SVM is the standout performer, attaining an accuracy of 91% and demonstrating improved handling of imbalanced classes. The dataset comprises 1966 records. These findings contribute to the ongoing discourse on effective text classification methodologies in the cybersecurity landscape.
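The abstract's statement that TF-IDF transforms textual content into numerical features can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, the function name `tfidf_vectors`, and the three example posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with L2-normalised TF-IDF weights.

    Illustrative helper only; the name and the weighting choices are assumed.
    """
    # Assumed pre-processing: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed IDF (one common variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # L2-normalise so that document length does not dominate the weights.
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        vectors.append([w / norm for w in raw])
    return vocab, vectors


# Made-up example posts, not records from the paper's dataset.
posts = [
    "selling stolen credit card dumps",
    "my credit card was declined today",
    "new exploit kit for sale",
]
vocab, vectors = tfidf_vectors(posts)
```

Each post becomes a fixed-length vector over the shared vocabulary. A term that appears in only one post, such as "stolen", receives a higher weight than a term shared across posts, such as "credit". Vectors of this kind are what a classifier such as an SVM would take as input.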
ISBN
979-8-3315-2734-1
Publisher
IEEE
Volume
00
First Page
1
Last Page
6
Disciplines
Computer Sciences
Keywords
Cyber threat detection, Machine learning models, Hacking forum, TF-IDF, Word2Vec
Recommended Citation
Alketbi, Shahad; BinAmro, Maitha; Alhammadi, Aryaam; and Kaddoura, Sanaa, "A Comparative Study of Machine Learning Models for Classification and Detection of Cybersecurity Threat in Hacking Forum" (2024). All Works. 6700.
https://zuscholars.zu.ac.ae/works/6700
Indexed in Scopus
no
Open Access
no