Improving data efficiency via curating LLM-driven rating systems
Published in International Conference on Learning Representations (ICLR), 2025
Training data quality is critical for machine learning model performance. This paper presents methods for curating training data using LLM-driven rating systems that can assess data quality and relevance. Our approach improves data efficiency by identifying and prioritizing high-quality samples, reducing the amount of data needed to achieve target performance levels while avoiding the costs of manual data annotation.
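As a rough illustration of what "identifying and prioritizing high-quality samples" can look like in practice, the sketch below ranks a pool of samples by a per-sample quality rating and keeps only the best ones within a fixed budget. It is a minimal sketch under stated assumptions, not the paper's actual method: the `RatedSample` type, the `select_top_rated` function, the 0-5 rating scale, and the example ratings are all hypothetical, and the ratings are assumed to have been produced beforehand by an LLM rater.

```python
from typing import List, NamedTuple


class RatedSample(NamedTuple):
    """A training sample paired with a quality rating (hypothetical type)."""
    text: str
    rating: float  # assumed to come from an LLM rater, e.g. on a 0-5 scale


def select_top_rated(
    samples: List[RatedSample], budget: int, min_rating: float = 0.0
) -> List[RatedSample]:
    """Keep at most `budget` samples, highest rating first.

    Samples rated below `min_rating` are dropped before ranking.
    """
    eligible = [s for s in samples if s.rating >= min_rating]
    ranked = sorted(eligible, key=lambda s: s.rating, reverse=True)
    return ranked[:budget]


# Illustrative pool with made-up ratings.
pool = [
    RatedSample("Explain gradient descent step by step.", 4.5),
    RatedSample("asdf qwer", 0.5),
    RatedSample("Summarize the causes of the French Revolution.", 4.0),
    RatedSample("hi", 1.0),
]

subset = select_top_rated(pool, budget=2, min_rating=2.0)
for sample in subset:
    print(f"{sample.rating:.1f}  {sample.text}")
```

Selecting purely by rating is the simplest possible policy; it takes the ratings at face value and ignores how similar the selected samples are to one another.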
Recommended citation:
@inproceedings{pang2025improving,
  title={Improving Data Efficiency via Curating LLM-Driven Rating Systems},
  author={Pang, Jinlong and Wei, Jiaheng and Shah, Ankit Parag and Zhu, Zhaowei and Wang, Yaxuan and Qian, Chen and Liu, Yang and Bao, Yujia and Wei, Wei},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}