Alibaba's Qwen 2.5: A 1M Token Context LLM
2025-01-26
Alibaba has released a major update to its open-source large language model family: Qwen 2.5 models with a 1 million token context window, enabled by a new technique called Dual Chunk Attention. Two models are available on Hugging Face, 7B and 14B parameter versions, and both need substantial VRAM to make use of the full context: at least 120GB for the 7B model and 320GB for the 14B. They can still handle shorter-context tasks on more modest hardware, but for the full 1 million tokens Alibaba recommends serving them with its custom fork of the vLLM framework (a minimal sketch follows below). GGUF quantized versions are starting to appear, offering much smaller downloads, though they may not support the full context length. A blogger attempted to run a GGUF version on a Mac using Ollama, hit some challenges, and promised a future update.
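To make the deployment path a little more concrete, here is a minimal sketch of offline inference using the standard vLLM Python API. The Hugging Face model ID `Qwen/Qwen2.5-7B-Instruct-1M` and the reduced `max_model_len` are assumptions on my part; the announcement suggests the full 1M-token window needs Alibaba's custom vLLM fork and the VRAM figures above, so this sketch deliberately caps the context at a smaller value.

```python
from vllm import LLM, SamplingParams

# Hypothetical sketch: load the long-context 7B model with stock vLLM.
# The model ID is an assumption; the full 1M-token window reportedly
# requires Alibaba's custom vLLM fork and ~120GB of VRAM, so this
# example limits the context to something a single large GPU can hold.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",  # assumed Hugging Face model ID
    max_model_len=131072,                 # reduced window for modest hardware
)

params = SamplingParams(temperature=0.7, max_tokens=512)

# Feed a long document followed by an instruction about it.
prompt = "Summarize the following report:\n\n" + open("report.txt").read()
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

A quantized GGUF build run through Ollama would take a different route entirely, and as noted above it is not yet clear that those builds support the full context length.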