Advances in AI-Driven Cybersecurity: Tackling Prompt Injection Attacks through Adversarial Learning
DOI:
https://doi.org/10.3126/joeis.v4i1.81604Keywords:
AI-driven cybersecurity, prompt injection, adversarial machine learning, large language models, secure AI systemsAbstract
The rapid adoption of large language models (LLMs) such as ChatGPT, Bard, and Claude has transformed human-computer interaction across various industries. However, this advancement introduces novel cybersecurity threats—particularly prompt injection attacks—that exploit the models’ instruction-following abilities to produce malicious or unintended outputs. Existing cybersecurity frameworks lack the capacity to counter these emerging threats, demanding AI- centric defense strategies.
This study explores the role of adversarial machine learning in mitigating prompt injection vulnerabilities. We conduct a comprehensive analysis of how adversarial prompts are crafted to circumvent content filters, hijack model behavior, and extract sensitive data. Building upon recent advances in adversarial learning, we propose a robust defense framework that combines adversarial training with input sanitization techniques to detect and neutralize harmful prompts.
The framework is evaluated on leading LLM platforms using both benchmark datasets and real- world scenarios. Results show enhanced resistance to both direct and indirect prompt injection attacks, with minimal compromise to model performance and responsiveness.
By embedding adversarial robustness into the deployment lifecycle of LLMs, our work advances the development of secure and trustworthy AI systems. These findings emphasize the need for evolving AI-native security protocols aligned with the dynamic nature of generative models, ensuring safe and responsible AI deployment.
Downloads
Downloads
Published
How to Cite
Issue
Section
License
Copyright is held by the authors.