Adversarial Attacks on Large Language Models: A Survey

Author First name, Last name, Institution

Meera Al Kuwaiti, Abu Dhabi University
Heba Ismail, Zayed University

Document Type

Conference Proceeding

Source of Publication

Lecture Notes in Networks and Systems

Publication Date

10-1-2025

Abstract

This survey provides a comprehensive examination of adversarial techniques targeting Large Language Models (LLMs), such as prompt injection, token manipulation, and jailbreak attacks, highlighting their impact on the model’s accuracy and reliability. The methodology involved a systematic collection and review of recent research across key databases, including IEEE Xplore, ACM Digital Library, and Google Scholar, yielding 15 relevant studies from an initial pool of 30 papers. Each study was analyzed for methodologies, datasets, and findings related to adversarial attacks and defense mechanisms. Our findings reveal critical vulnerabilities in current LLMs and assess the strengths and limitations of various defense strategies, such as input validation, adversarial training, and safety filters. This survey identifies significant challenges in existing defenses and proposes future research directions to enhance LLM reliability and security against evolving adversarial threats.

ISBN

[9789819692477]

ISSN

2367-3370

Publisher

Springer Nature Singapore

Volume

1539 LNNS

First Page

529

Last Page

547

Disciplines

Computer Sciences

Keywords

Adversarial attacks, Jailbreak, Large language models, Machine learning, Prompt injection

Scopus ID

105020189273

Indexed in Scopus

yes

Open Access

no

Share

COinS