Improving LLM Fine-tuning Through Iterative Data Curation

Researchers significantly improved the performance of large language models (LLMs) by iteratively curating their fine-tuning data. Experiments used two LLMs of different sizes (Gemini Nano-1 and Nano-2) on tasks of varying complexity, starting from ~100K crowdsourced annotations that suffered from severe class imbalance (95% benign). Through repeated rounds of expert curation and model fine-tuning, the curated dataset reached approximately 40% positive examples, and model-expert agreement rose to a Cohen's Kappa of ~0.81 on the lower-complexity task and ~0.78 on the higher-complexity one, approaching expert-level performance and highlighting the crucial role of high-quality data in LLM training.
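For context, Cohen's Kappa (the agreement metric reported above) measures how often two raters, such as a model and a human expert, assign the same label, corrected for the agreement expected by chance. A minimal sketch of the standard formula (this is generic illustration code, not taken from the paper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the raters labeled independently,
    # each according to their own marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((counts_a[c] / n) * (counts_b.get(c, 0) / n) for c in counts_a)
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement yields 1.0; chance-level agreement yields 0.0.
print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0]))  # → 1.0
print(cohens_kappa([1, 0, 1, 0], [1, 0, 0, 1]))  # → 0.0
```

Values of ~0.8, as reported here, are conventionally read as strong agreement, which is why the paper treats them as near expert-level.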