Classification of Phishing Webpages using Supervised Machine Learning Algorithms

Document Type

Conference Proceeding

Source of Publication

2024 15th Annual Undergraduate Research Conference on Applied Computing (URC)

Publication Date

4-25-2024

Abstract

A phishing attack is a type of cybercrime in which an attacker steals a victim's sensitive information without authorization, intending either to use it in illegal activities or to profit by selling it. It is considered a social engineering attack because it lures users into clicking through to illegitimate websites and providing personal information, which compromises the authentication and confidentiality aspects of information security. In this project, multiple supervised machine learning algorithms are applied to a large, labeled dataset of websites, each classified as legitimate or phishing. Classification is performed with four algorithms: Decision Tree (DT), Logistic Regression (LR), Support Vector Machines (SVM), and K-Nearest Neighbors (KNN). Accuracy is used as the performance measure to determine which algorithm achieves the best results on the given balanced dataset. The evaluation shows that the LR model is the best classifier, with an accuracy reaching 95%, while the KNN model has the lowest overall performance, with an accuracy of 85%. The SVM model has the lowest performance in predicting phishing URLs, making it less suitable for this specific classification task than the other algorithms.
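
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, substitutes a synthetic balanced dataset from make_classification for the phishing dataset (which is not reproduced here), and uses default hyperparameters, an 80/20 split, and feature scaling, all of which are assumptions. The accuracies it prints therefore say nothing about the 95% and 85% figures reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for a balanced, labeled dataset
# (1 = phishing, 0 = legitimate). The real features are not known here.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    weights=[0.5, 0.5],
    random_state=0,
)

# Assumption: stratified 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four classifiers named in the abstract, with default hyperparameters.
# LR, SVM, and KNN are wrapped in a scaling pipeline (an assumption).
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Fit each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from highest to lowest accuracy.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Because accuracy is the fraction of correctly classified samples, it is a reasonable single measure only when the classes are balanced, which is the condition the abstract states for its dataset.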

ISBN

979-8-3315-2734-1

Publisher

IEEE

Volume

00

First Page

1

Last Page

6

Disciplines

Computer Sciences

Keywords

Phishing, Supervised Machine Learning, Cybercrime, Information Security, Classification

Indexed in Scopus

no

Open Access

no
