LoFT: Local Proxy Fine-Tuning for Improving Transferability of Adversarial Attacks Against Large Language Model

Published in arXiv preprint arXiv:2310.04445, 2023

Large Language Models (LLMs) have demonstrated remarkable capabilities but remain vulnerable to adversarial attacks. This paper introduces LoFT (Local Proxy Fine-Tuning), a method for improving the transferability of adversarial attacks against LLMs. By fine-tuning a local proxy model so that it better approximates the target model's behavior, adversarial examples crafted against the proxy transfer to the target more effectively. Our approach provides insight into LLM vulnerabilities and can inform the development of more robust models.
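To make the proxy fine-tuning step concrete, below is a minimal, hypothetical sketch in Python using Hugging Face transformers. It is not the authors' implementation: the proxy model name (gpt2), the stub query_target_model, and the example prompts are placeholders chosen for illustration, and the white-box attack on the proxy is only indicated in a comment.

```python
# Hypothetical sketch of local proxy fine-tuning (not the authors' code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROXY_NAME = "gpt2"  # placeholder proxy; any local open-weight LLM could stand in

def query_target_model(prompt: str) -> str:
    """Placeholder for a black-box API call to the target LLM."""
    # Replace with a real API call; a canned response keeps the sketch runnable.
    return "I'm sorry, but I can't help with that request."

# Step 1: query the target on prompts near the intended attack prompts
# and record its responses.
neighborhood_prompts = [
    "Explain how phishing emails are typically structured.",
    "Describe common social engineering techniques.",
]
pairs = [(p, query_target_model(p)) for p in neighborhood_prompts]

# Step 2: fine-tune the local proxy on (prompt, response) pairs so that it
# locally approximates the target model's behavior.
tokenizer = AutoTokenizer.from_pretrained(PROXY_NAME)
tokenizer.pad_token = tokenizer.eos_token
proxy = AutoModelForCausalLM.from_pretrained(PROXY_NAME)
proxy.train()

optimizer = torch.optim.AdamW(proxy.parameters(), lr=1e-5)
for epoch in range(3):
    for prompt, response in pairs:
        text = prompt + "\n" + response + tokenizer.eos_token
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # Standard causal-LM objective; the model shifts labels internally.
        loss = proxy(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Step 3: run any white-box attack (e.g., a gradient-based adversarial suffix
# search) against `proxy`, then transfer the resulting prompt to the target.
```

In this sketch the proxy is trained with the standard causal-language-modeling loss so its responses mimic the target's; a white-box attack can then be run against the proxy and the resulting adversarial prompt evaluated on the target.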

Recommended citation:
@article{shah2023loft,
  title={LoFT: Local Proxy Fine-Tuning for Improving Transferability of Adversarial Attacks Against Large Language Model},
  author={Shah, Muhammad Awais and Sharma, Rishabh and Dhamyal, Hira and Olivier, Raphael and Shah, Ankit and Konan, Joseph and Alharthi, Dareen and Shirol, Hazim Taha and Raj, Bhiksha},
  journal={arXiv preprint arXiv:2310.04445},
  year={2023}
}
Download Paper