A Late Multi-modal Fusion Model for Detecting Hybrid Spam E-mail

Document Type

Article

Source of Publication

International Journal of Computer Theory and Engineering

Publication Date

5-1-2023

Abstract

In recent years, spammers have tried to evade spam filtering systems by introducing hybrid spam e-mail that combines image and text parts, which poses a more destructive and complicated threat to cyber security than e-mail containing text or images only. Traditionally, Optical Character Recognition (OCR) technology is used to eliminate the image parts of spam by transforming images into text. Although OCR scanning is a very successful technique for processing text-and-image hybrid spam, it does not scale to large volumes of e-mail because of the Central Processing Unit (CPU) power and execution time needed to scan e-mail files. To address this problem, this paper proposes a late multi-modal fusion model for a text-and-image hybrid spam e-mail filtering system and compares it with the classical early fusion detection model based on the OCR method. A Convolutional Neural Network (CNN) and Continuous Bag of Words were used to extract features from the image and text parts of hybrid spam, respectively; the generated features were then fed to a sigmoid layer and machine-learning-based classifiers to determine whether the e-mail is ham or spam. The two resulting classification probabilities were fed to a late decision model, and the final classification decisions were compared with those of text-only classifiers based on the OCR technique in terms of prediction accuracy and computational efficiency. The experimental results show that the proposed late fusion model substantially outperforms the benchmark in execution time, while its other performance metrics remain adequate. These findings show the advantages of using a CNN rather than OCR to detect hybrid spam e-mails.
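
The late decision step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the fusion rule, so the weighted average, the weight `w_text`, the `threshold`, and the function name `late_fusion` are all assumptions made for illustration only, not the authors' method. The two inputs stand for the spam probabilities produced separately by the text classifier and the image classifier.

```python
def late_fusion(p_text: float, p_image: float,
                w_text: float = 0.5, threshold: float = 0.5) -> str:
    """Combine two per-modality spam probabilities into one decision.

    Assumed rule (hypothetical, not taken from the paper): a weighted
    average of the two probabilities compared against a threshold.
    """
    for name, value in (("p_text", p_text), ("p_image", p_image),
                        ("w_text", w_text)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Weighted average of the text and image spam probabilities.
    fused = w_text * p_text + (1.0 - w_text) * p_image

    # Label the e-mail as spam when the fused score reaches the threshold.
    return "spam" if fused >= threshold else "ham"


if __name__ == "__main__":
    # Text model is fairly sure, image model is very sure.
    print(late_fusion(p_text=0.7, p_image=0.95))
    # Both models lean towards ham.
    print(late_fusion(p_text=0.2, p_image=0.3))
```

Because each modality is scored independently and only two probabilities are combined at the end, a scheme of this kind needs no OCR pass over the image part, which is consistent with the execution-time advantage the abstract reports.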

ISSN

1793-8201

Publisher

IACSIT Press

Volume

15

Issue

2

First Page

76

Last Page

81

Disciplines

Medicine and Health Sciences

Keywords

Convolutional neural network, cyber security, hybrid spam e-mail, late fusion, spam filtering

Scopus ID

85160815075

Indexed in Scopus

yes

Open Access

yes

Open Access Type

Gold: This publication is openly available in an open access journal/series
