Document Type

Article

Source of Publication

Forensic Science International: Digital Investigation

Publication Date

7-1-2025

Abstract

Artificial Intelligence (AI) has found multi-faceted applications in critical sectors, including Digital Forensics (DF), where eXplainability (XAI) is a non-negotiable requirement for applicability, for example for the admissibility of expert evidence in a court of law. State-of-the-art XAI workflows focus largely on supervised learning, even though unsupervised learning may be more relevant in practice for DF and other sectors that continuously produce considerable volumes of complex, unlabeled data. This study explores the challenges and utility of unsupervised-learning-based XAI for DF's complex datasets. A memory-forensics case scenario is implemented to detect anomalies and cluster obfuscated malware using the Isolation Forest, Autoencoder, K-means, DBSCAN, and Gaussian Mixture Model (GMM) unsupervised algorithms at three categorical levels. The CIC MalMemAnalysis-2022 dataset's binary and multi-class (4 and 16 classes) categories are used as a reference for clustering. The anomaly detection and clustering results are evaluated using accuracy, confusion matrices, and the Adjusted Rand Index (ARI), and are explained through SHapley Additive exPlanations (SHAP) via local and global explanations rendered as force, waterfall, scatter, summary, and bar plots. We also explore how some SHAP explanations may be used for dimensionality reduction.
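
As a rough illustration of the pipeline the abstract describes, the sketch below runs one of the listed detectors (Isolation Forest) on the binary level of the dataset, scores the unsupervised labels with ARI against the reference labels, and produces a global SHAP summary plot. The file name, the "Class" column, and its "Malware" label value are assumptions about the dataset layout rather than details from the paper; the contamination setting simply mirrors the dataset's roughly balanced benign/malware split.

```python
# Minimal sketch: unsupervised anomaly detection + SHAP explanation,
# assuming the CIC MalMemAnalysis-2022 CSV is saved locally as
# "Obfuscated-MalMem2022.csv" with a binary "Class" column
# (hypothetical path and column/label names; adjust to the real layout).
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Obfuscated-MalMem2022.csv")
y = (df["Class"] == "Malware").astype(int)   # reference labels: 1 = malicious
num = df.select_dtypes("number")             # keep numeric memory features
X = StandardScaler().fit_transform(num)

# Unsupervised anomaly detection: Isolation Forest marks outliers as -1.
# contamination=0.5 assumes the roughly 50/50 benign/malware balance.
iso = IsolationForest(contamination=0.5, random_state=42).fit(X)
pred = (iso.predict(X) == -1).astype(int)

# Evaluate the unsupervised partition against the binary reference labels.
print("ARI:", adjusted_rand_score(y, pred))

# Global SHAP explanation; TreeExplainer supports sklearn's IsolationForest.
explainer = shap.TreeExplainer(iso)
sample = X[:500]                             # subsample to keep SHAP fast
shap_values = explainer.shap_values(sample)
shap.summary_plot(shap_values, sample, feature_names=list(num.columns))
```

The same scaffold extends to the other algorithms the paper evaluates, e.g. by swapping in KMeans or a GMM and comparing their ARI scores at the 4-class and 16-class levels.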

ISSN

2666-2825

Publisher

Elsevier BV

Volume

53

Disciplines

Computer Sciences

Keywords

Autoencoder, CIC MalMemAnalysis-2022, DBSCAN, Digital forensics, Explainable artificial intelligence, GMM, Isolation forest, K-means, Principal component analysis, SHAP, Unsupervised learning, XAI

Scopus ID

105010840593

Indexed in Scopus

yes

Open Access

yes

Open Access Type

Hybrid: This publication is openly available in a subscription-based journal/series
