Improving data efficiency via curating LLM-driven rating systems

Published in arXiv preprint arXiv:2410.10877, 2024

Training data quality is critical for machine learning model performance. This paper presents methods for curating training data using LLM-driven rating systems that can assess data quality and relevance. Our approach improves data efficiency by identifying and prioritizing high-quality samples, reducing the amount of data needed to achieve target performance levels while avoiding the costs of manual data annotation.
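
As a rough illustration of prioritizing high-quality samples, the sketch below keeps only the highest-rated items from a pool under a fixed budget. It is not the paper's method: the `RatedSample` type, the `select_top_rated` function, and the numeric `rating` field are hypothetical names introduced here, and the ratings are assumed to have already been produced by some LLM-based rater.

```python
from typing import List, NamedTuple


class RatedSample(NamedTuple):
    text: str
    rating: float  # hypothetical LLM-assigned quality score; higher is better


def select_top_rated(samples: List[RatedSample], budget: int) -> List[RatedSample]:
    """Return the `budget` highest-rated samples (assumed selection rule)."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # sorted() is stable, so samples with equal ratings keep their input order.
    ranked = sorted(samples, key=lambda s: s.rating, reverse=True)
    return ranked[:budget]
```

Calling `select_top_rated(pool, budget=k)` on a list of rated samples returns a subset of at most `k` items, which can then be used as the reduced training set.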

Recommended citation:
@article{pang2024improving,
  title={Improving Data Efficiency via Curating LLM-Driven Rating Systems},
  author={Pang, Jinlong and Wei, Jiaheng and Shah, Ankit Parag and Zhu, Zhaowei and Wang, Yaxuan and Qian, Chen and Liu, Yang and Bao, Yujia and Wei, Wei},
  journal={arXiv preprint arXiv:2410.10877},
  year={2024}
}