The Three Temples of LLM Training: Pretraining, Fine-tuning, and RLHF

In the hidden mountain sanctuary of Lexiconia, Scribes train in a temple of three wings: the Hall of Origins, the Chamber of Instructions, and the Arena of Reinforcement. The Hall of Origins is pretraining: Scribes read vast amounts of text and learn the patterns of language by predicting what comes next. The Chamber of Instructions is fine-tuning: smaller, curated texts teach Scribes to follow instructions and produce more useful outputs. The Arena of Reinforcement is Reinforcement Learning from Human Feedback (RLHF): human judges rank competing Scribe answers, and those rankings become the reward signal that makes preferred answers more likely and disfavored ones less likely. Elite Scribes may also be adjusted with LoRA scrolls and Adapters, small sets of added parameters that change responses without retraining the entire model. Together, the three wings represent the typical process of training large language models.
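
How might the judges' rankings in the Arena become something a Scribe can learn from? One common approach, assumed here rather than taken from the text above, is to train a separate reward model on pairs of answers with a pairwise (Bradley-Terry style) preference loss. The sketch below is illustrative only: the function name and the example scores are hypothetical, and a real system would compute the scores with a neural network rather than pass them in by hand.

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_preferred - score_rejected)).

    Hypothetical helper: the two scores stand in for a reward model's
    outputs on the answer the judge preferred and the one the judge rejected.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is never called on a large positive number.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from any real model.
    agrees_with_judge = pairwise_preference_loss(2.0, 0.0)
    disagrees_with_judge = pairwise_preference_loss(0.0, 2.0)
    print(agrees_with_judge, disagrees_with_judge)
```

The loss is small when the reward model scores the judge's preferred answer higher, and large when it scores the rejected answer higher, so minimizing it nudges the reward model toward the judges' rankings. The trained reward model can then score new Scribe answers during the reinforcement stage.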