LLM Unlearning via Loss Adjustment with Only Forget Data
Published in the International Conference on Learning Representations (ICLR), 2025
Machine unlearning enables the removal of specific information from trained models without full retraining. This paper proposes an unlearning method for large language models (LLMs) that requires access only to the data to be forgotten, not the original training data. The approach uses loss adjustment to selectively reduce the model’s ability to generate or recall the targeted information while preserving performance on other tasks.
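To illustrate the general idea of loss-adjusted unlearning from forget data alone (not the paper's actual algorithm), the sketch below uses a toy NumPy classifier as a stand-in for an LLM: it performs gradient ascent on the forget-set loss, with an L2 anchor to the original weights serving as a hypothetical proxy for retain-set preservation, since no retain data is assumed available. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classifier standing in for an LLM: logistic regression on 2-D inputs.
def loss_and_grad(w, X, y):
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-9
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# "Pretrain" on the full dataset.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = np.zeros(2)
for _ in range(300):
    _, g = loss_and_grad(w, X, y)
    w -= 0.5 * g

w0 = w.copy()
forget_X, forget_y = X[:20], y[:20]  # the forget set: the ONLY data used below

# Loss-adjusted unlearning: ascend the forget-set loss while an L2 anchor to
# the original weights (a stand-in assumption, not the paper's mechanism)
# discourages drifting too far and destroying overall performance.
lam = 0.1
for _ in range(50):
    _, g = loss_and_grad(w, forget_X, forget_y)
    w += 0.1 * (g - lam * (w - w0))  # gradient ascent + anchor

loss_before, _ = loss_and_grad(w0, forget_X, forget_y)
loss_after, _ = loss_and_grad(w, forget_X, forget_y)
```

After this loop, the loss on the forget examples rises while the anchor keeps the weights near their original values, capturing the trade-off the abstract describes between forgetting and preserving other behavior.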
Recommended citation: @inproceedings{wang2025llm, title={LLM Unlearning via Loss Adjustment with Only Forget Data}, author={Wang, Yaxuan and Wei, Jiaheng and Liu, Chris Yuhao and Pang, Jinlong and Liu, Quan and Shah, Ankit and Bao, Yujia and Liu, Yang and Wei, Wei}, booktitle={International Conference on Learning Representations (ICLR)}, year={2025} }
Download Paper