Adversarial Attacks on Large Language Models: A Survey
Document Type
Conference Proceeding
Source of Publication
Lecture Notes in Networks and Systems
Publication Date
10-1-2025
Abstract
This survey provides a comprehensive examination of adversarial techniques targeting Large Language Models (LLMs), such as prompt injection, token manipulation, and jailbreak attacks, highlighting their impact on model accuracy and reliability. The methodology involved a systematic collection and review of recent research across key databases, including IEEE Xplore, ACM Digital Library, and Google Scholar, yielding 15 relevant studies from an initial pool of 30 papers. Each study was analyzed for its methodology, datasets, and findings related to adversarial attacks and defense mechanisms. Our findings reveal critical vulnerabilities in current LLMs and assess the strengths and limitations of various defense strategies, such as input validation, adversarial training, and safety filters. The survey identifies significant challenges in existing defenses and proposes future research directions to enhance LLM reliability and security against evolving adversarial threats.
ISBN
9789819692477
Publisher
Springer Nature Singapore
Volume
1539 LNNS
First Page
529
Last Page
547
Disciplines
Computer Sciences
Keywords
Adversarial attacks, Jailbreak, Large language models, Machine learning, Prompt injection
Recommended Citation
Al Kuwaiti, Meera and Ismail, Heba, "Adversarial Attacks on Large Language Models: A Survey" (2025). All Works. 7658.
https://zuscholars.zu.ac.ae/works/7658
Indexed in Scopus
yes
Open Access
no