Databricks' TAO: Outperforming Fine-tuning with Unlabeled Data

2025-03-26
Databricks introduces TAO (Test-time Adaptive Optimization), a novel model tuning method that requires only unlabeled usage data. Unlike traditional fine-tuning, which needs labeled input-output pairs, TAO leverages test-time compute and reinforcement learning to improve a model using only past input examples, with no human-written responses. Notably, TAO surpasses traditional fine-tuning, bringing open-source models like Llama to a quality comparable to expensive proprietary models such as GPT-4. The method is available in preview for Databricks customers and will power future products.
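To make the idea concrete, here is a minimal sketch of the general pattern the summary describes: sample several candidate responses per unlabeled prompt, score them automatically, and keep the best as training targets. This is a hypothetical illustration, not Databricks' published implementation; `generate_candidates` and `reward_model` are toy stand-ins for an LLM and a learned reward model.

```python
import random

random.seed(0)

# Toy stand-ins: a real pipeline would call an LLM for generation and a
# trained reward model for scoring. Here both are stubs so the sketch runs.
def generate_candidates(prompt, n=4):
    """Sample n candidate responses for one unlabeled prompt (toy drafts)."""
    return [f"{prompt} :: draft-{random.randint(0, 99)}" for _ in range(n)]

def reward_model(prompt, response):
    """Score a response with no human label (toy: prefer shorter drafts)."""
    return -len(response)

def build_training_data(unlabeled_prompts, n=4):
    """Spend test-time compute per prompt, then keep the best-scoring
    candidate as a self-generated training target for later tuning."""
    data = []
    for prompt in unlabeled_prompts:
        candidates = generate_candidates(prompt, n)
        best = max(candidates, key=lambda r: reward_model(prompt, r))
        data.append((prompt, best))
    return data

prompts = ["summarize the quarterly report", "write a SQL query for top sales"]
dataset = build_training_data(prompts)
for prompt, best in dataset:
    print(prompt, "->", best)
```

The resulting (prompt, best-response) pairs would then feed a reinforcement-learning or fine-tuning step, which is how test-time compute spent once during tuning avoids any extra cost at inference.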