Knowledge Distillation: How Small AI Models Can Challenge the Giants

DeepSeek's R1 chatbot, released earlier this year, caused a stir by rivaling the performance of leading AI models from major companies at a fraction of the cost and computing power. This led to accusations that DeepSeek had built R1 by distilling knowledge from OpenAI's o1 model without authorization.

Knowledge distillation itself, however, is a well-established AI technique, dating back to a 2015 Google paper. It transfers knowledge from a large 'teacher' model to a smaller 'student' model, dramatically shrinking size and cost with minimal loss of performance. The method has become ubiquitous, powering compact versions of models such as BERT, and it continues to show broad potential across AI applications. The controversy highlights the power and established nature of the technique, not its novelty.
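To make the teacher–student idea concrete, here is a minimal sketch of the distillation objective described in the 2015 paper: the student is trained on a blend of the teacher's temperature-softened output distribution and the ordinary hard labels. The toy models, the temperature T, and the mixing weight alpha below are illustrative assumptions, not details of any particular system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between the teacher's and student's
    # temperature-softened distributions, scaled by T^2 as in Hinton et al. (2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy models for illustration: the student has far fewer parameters than the teacher.
teacher = nn.Sequential(nn.Linear(128, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 128)              # dummy batch of inputs
labels = torch.randint(0, 10, (32,))  # dummy ground-truth labels

with torch.no_grad():                 # the teacher is frozen; only its outputs are used
    teacher_logits = teacher(x)

loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
optimizer.step()
```

The temperature matters because raising it softens the teacher's output distribution, exposing how it ranks the wrong answers as well as the right one, and it is this richer signal that lets a much smaller student approach the teacher's performance.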