Deploying the 671B Parameter DeepSeek R1 LLM Locally
This post details the experience of deploying the 671B-parameter DeepSeek R1 large language model locally using Ollama. The author experimented with two quantized versions, 1.73-bit and 4-bit, which require at least 200 GB and 500 GB of memory respectively. On a workstation with four RTX 4090s and 384 GB of DDR5 RAM, the 1.73-bit version generated text slightly faster, but the 4-bit version proved more stable and less prone to producing inappropriate content. The author recommends reserving the model for lighter tasks and avoiding long text generation, which slows it down considerably. Deployment involved downloading the model files, installing Ollama, writing a Modelfile, and running the model; the GPU offload and context window parameters may need adjusting to prevent out-of-memory errors.
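Since out-of-memory errors are governed mainly by how many layers are offloaded to the GPU and how large the context window is, it can help to set those knobs explicitly per request. The sketch below is a minimal example that queries Ollama's local HTTP API with conservative settings; it assumes Ollama is already serving on its default port, that the model was registered under the name `deepseek-r1-671b` at `ollama create` time, and that the specific `num_ctx` and `num_gpu` values are illustrative starting points rather than recommendations.

```python
# Minimal sketch: querying a locally registered DeepSeek R1 model through
# Ollama's HTTP API (default endpoint http://localhost:11434).
# The model name and the num_gpu / num_ctx values below are assumptions;
# tune them to the model name you chose and the VRAM you actually have.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "deepseek-r1-671b",   # assumed name given at `ollama create`
    "prompt": "Summarize the trade-offs of 1.73-bit vs 4-bit quantization.",
    "stream": False,               # return one complete JSON response
    "options": {
        "num_ctx": 2048,           # smaller context window to save memory
        "num_gpu": 28,             # layers offloaded to GPU (tune per VRAM)
    },
}

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))
    print(result["response"])
```

Lowering `num_ctx` and `num_gpu` trades context length and GPU acceleration for headroom: fewer offloaded layers keep more of the model in system RAM, which is slower but avoids exhausting the 4090s' VRAM.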