PViT: A Hybrid Model for Deepfake Face Detection using Patch Vision Transformers and Deep Learning

Document Type

Conference Proceeding

Source of Publication

2025 12th IFIP International Conference on New Technologies, Mobility and Security (NTMS 2025)

Publication Date

7-18-2025

Abstract

The proliferation of AI-generated deepfakes, particularly facial image forgeries, poses a significant threat to digital security by facilitating misinformation, identity theft, and privacy breaches. Traditional detection approaches, primarily based on Convolutional Neural Networks (CNNs), often exhibit limited effectiveness when confronted with highly refined or subtle manipulations, leading to compromised detection performance. To address this challenge, this study explores the application of Vision Transformers (ViTs), which leverage self-attention mechanisms to capture fine-grained inconsistencies in visual patterns. This research proposes a hybrid deepfake detection model that integrates patch-oriented ViTs with CNN architectures to improve discriminative feature extraction. Experimental evaluation on benchmark datasets demonstrates that the proposed model achieves 99% accuracy, 99% precision, 99% recall, and a 99% F1-score on a validation set of 76,161 facial images, outperforming conventional CNN-based methods. These results highlight the potential of transformer-based architectures in advancing the robustness and reliability of deepfake detection systems, thereby contributing to the protection of digital authenticity and information integrity.
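
The abstract describes the hybrid design only at a high level. As an illustration of the general idea, the following is a minimal PyTorch sketch of one way a patch-oriented ViT can be coupled with a CNN feature extractor for binary real/fake face classification. The ResNet-18 backbone, 224x224 input size, embedding dimension, and token construction are assumptions made for the sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' PViT code): a hybrid CNN + patch-based
# transformer binary classifier. Assumes 224x224 RGB face crops and a
# ResNet-18 backbone as the convolutional feature extractor.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HybridCNNViT(nn.Module):
    def __init__(self, embed_dim=256, depth=4, num_heads=8, num_classes=2):
        super().__init__()
        # CNN backbone: keep the convolutional stages, drop avgpool and fc.
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # -> (B, 512, 7, 7)
        # Project each 7x7 spatial location into a "patch" token.
        self.proj = nn.Conv2d(512, embed_dim, kernel_size=1)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 7 * 7 + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)  # real vs. fake logits

    def forward(self, x):                          # x: (B, 3, 224, 224)
        feats = self.proj(self.cnn(x))             # (B, E, 7, 7)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 49, E)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return self.head(tokens[:, 0])             # classify from the class token

if __name__ == "__main__":
    model = HybridCNNViT()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 2])
```

In this sketch the CNN's final 7x7 feature map supplies the patch tokens fed to the transformer encoder, and the class token's output drives the real/fake decision; the actual PViT model may tokenize the image and fuse the two branches differently.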

ISBN

9798331552763

Publisher

IEEE

First Page

58

Last Page

66

Disciplines

Computer Sciences

Keywords

CNN, Deep Learning, Deepfake Detection, Generative Adversarial Networks (GANs), Image Manipulation, Patches, Vision Transformer (ViT)

Scopus ID

105012575167

Indexed in Scopus

yes

Open Access

no
