Alibaba's Qwen 2.5: A 1M Token Context LLM
2025-01-26
Alibaba has released a major update to its open-source large language model family: Qwen 2.5 models with a 1 million token context window, enabled by a new technique called Dual Chunk Attention. Two models are available on Hugging Face, 7B and 14B parameter versions, and both need substantial VRAM to make use of the full context: at least 120GB for the 7B model and 320GB for the 14B. They can still handle shorter-context tasks on more modest hardware, but for the full 1 million tokens Alibaba recommends serving them with its custom fork of the vLLM framework (a minimal sketch follows below). GGUF quantized versions are starting to appear, offering much smaller downloads, though they may not support the full context length. A blogger attempted to run a GGUF version on a Mac using Ollama, hit some challenges, and promised a future update.
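To make the deployment path a little more concrete, here is a minimal sketch of offline inference using the standard vLLM Python API. The Hugging Face model ID `Qwen/Qwen2.5-7B-Instruct-1M` and the reduced `max_model_len` are assumptions on my part; the announcement suggests the full 1M-token window needs Alibaba's custom vLLM fork and the VRAM figures above, so this sketch deliberately caps the context at a smaller value.

```python
from vllm import LLM, SamplingParams

# Hypothetical sketch: load the long-context 7B model with stock vLLM.
# The model ID is an assumption; the full 1M-token window reportedly
# requires Alibaba's custom vLLM fork and ~120GB of VRAM, so this
# example limits the context to something a single large GPU can hold.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct-1M",  # assumed Hugging Face model ID
    max_model_len=131072,                 # reduced window for modest hardware
)

params = SamplingParams(temperature=0.7, max_tokens=512)

# Feed a long document followed by an instruction about it.
prompt = "Summarize the following report:\n\n" + open("report.txt").read()
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```

A quantized GGUF build run through Ollama would take a different route entirely, and as noted above it is not yet clear that those builds support the full context length.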