Improved Crosscoder Unveils Secrets of LLM Fine-tuning
Researchers introduce the 'tied crosscoder,' a method for comparing a base large language model (LLM) with its fine-tuned chat counterpart. Unlike a traditional crosscoder, the tied variant allows the same latent factor to fire at different times in the base and chat models, which makes it more effective at identifying features that are novel to the chat model. Experiments show that this approach yields clearer explanations of how chat behavior emerges from base-model capabilities, along with more monosemantic latents. The work offers new insight into what fine-tuning changes in an LLM and can help guide future model improvements.
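To make the contrast concrete, here is a minimal PyTorch-style sketch of the idea as the summary describes it: both models decode through one shared latent dictionary, but each model has its own encoder, so a given latent can activate on different tokens for the base and chat models. The names (`TiedCrosscoder`, `n_latents`, `l1_coeff`) and the L1 sparsity penalty are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiedCrosscoder(nn.Module):
    """Illustrative sketch: one shared decoder dictionary with separate
    encoders, so the same latent can fire at different tokens in the
    base and chat models. Not necessarily the paper's exact architecture."""

    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        self.enc_base = nn.Linear(d_model, n_latents)  # base-model encoder
        self.enc_chat = nn.Linear(d_model, n_latents)  # chat-model encoder
        # Tied decoder: latent k maps to the same direction for both models.
        self.dec = nn.Linear(n_latents, d_model)

    def forward(self, x_base: torch.Tensor, x_chat: torch.Tensor):
        f_base = F.relu(self.enc_base(x_base))  # base latent activations
        f_chat = F.relu(self.enc_chat(x_chat))  # chat latent activations
        return self.dec(f_base), self.dec(f_chat), f_base, f_chat


def crosscoder_loss(model, x_base, x_chat, l1_coeff=1e-3):
    """Reconstruction plus L1 sparsity (an assumed, standard SAE-style loss)."""
    xb_hat, xc_hat, f_base, f_chat = model(x_base, x_chat)
    recon = F.mse_loss(xb_hat, x_base) + F.mse_loss(xc_hat, x_chat)
    sparsity = f_base.abs().mean() + f_chat.abs().mean()
    return recon + l1_coeff * sparsity


# Usage on random stand-ins for residual-stream activations.
model = TiedCrosscoder(d_model=512, n_latents=4096)
x_base = torch.randn(32, 512)  # base-model activations (batch, d_model)
x_chat = torch.randn(32, 512)  # chat-model activations at the same tokens
loss = crosscoder_loss(model, x_base, x_chat)
loss.backward()
```

Under this setup, a latent that reconstructs chat activations but rarely fires on the base side is a natural candidate for a chat-specific feature, which is the kind of comparison the summary describes.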