LLM Unlearning via Loss Adjustment with Only Forget Data
Published as an arXiv preprint (arXiv:2410.11143), 2024
Machine unlearning enables the removal of specific information from trained models without full retraining. This paper proposes a method for unlearning in large language models (LLMs) that requires only access to the data to be forgotten, not the original training data. Our approach uses loss adjustment techniques to selectively reduce the model’s ability to generate or recall specific information while preserving performance on other tasks.
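To make the loss-adjustment idea concrete, here is a minimal, purely illustrative sketch (not the paper's actual objective): gradient descent on a combined loss that *ascends* on the forget example's error while an anchor term keeps parameters near the trained model. All names (`forget_loss`, `anchor_loss`, `adjusted_loss`) and the toy scalar model are hypothetical.

```python
# Toy illustration of unlearning via loss adjustment (not the paper's method).
# Model: a single scalar parameter theta predicting y = theta * x.

def forget_loss(theta, forget_point):
    # Squared error on the example we want the model to "forget".
    x, y = forget_point
    return (theta * x - y) ** 2

def anchor_loss(theta, theta_ref):
    # Keeps parameters close to the original trained model,
    # standing in for preserving performance on other tasks.
    return (theta - theta_ref) ** 2

def adjusted_loss(theta, theta_ref, forget_point, lam=0.1):
    # Negating the forget loss means gradient DESCENT on this
    # objective INCREASES error on the forget example.
    return -forget_loss(theta, forget_point) + lam * anchor_loss(theta, theta_ref)

def grad(f, theta, eps=1e-5):
    # Central finite-difference gradient for this scalar toy.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta_ref = 1.0           # the trained model's parameter
theta = 0.9               # slightly perturbed copy to be unlearned
point = (1.0, 1.0)        # the forget example

before = forget_loss(theta, point)
for _ in range(50):
    g = grad(lambda t: adjusted_loss(t, theta_ref, point), theta)
    theta -= 0.05 * g
after = forget_loss(theta, point)
# `after > before`: the model's error on the forget example has grown.
```

Note that only the forget example appears in the update; the anchor term plays the role that retain data would otherwise play, which mirrors the paper's setting of unlearning with forget data alone.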
Recommended citation: @article{wang2024llm, title={LLM Unlearning via Loss Adjustment with Only Forget Data}, author={Wang, Yaxuan and Wei, Jiahao and Liu, Chenyu and Pang, Jiajun and Liu, Qinghao and Shah, Ankit Parag and Bao, Yujia and Liu, Yang and Wei, Wei}, journal={arXiv preprint arXiv:2410.11143}, year={2024} }
Download Paper