Ollama Turbo: Blazing Fast Open-Source LLMs

2025-08-06

Ollama Turbo is a new way to run large open-source language models on datacenter-grade hardware. Many new models are too large to fit on typical GPUs, or run too slowly on them; Ollama Turbo solves this with fast, hosted execution that is compatible with Ollama's App, CLI, API, and JavaScript/Python libraries. Currently in preview, it supports gpt-oss-20b and gpt-oss-120b.

Importantly, Ollama does not log or retain any queries made in Turbo mode, and all hardware is located in the United States. Hourly and daily usage limits are in place to manage capacity, with usage-based pricing coming soon.
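Since Turbo is API-compatible with Ollama, a request looks the same as one sent to a local Ollama server, just pointed at a hosted endpoint. The sketch below builds a chat request body in the shape Ollama's `/api/chat` endpoint expects; the endpoint path and any authentication header for Turbo are assumptions here, so check Ollama's documentation for the exact details.

```python
import json

TURBO_HOST = "https://ollama.com"  # assumed Turbo endpoint; verify in Ollama's docs

def build_chat_request(model: str, prompt: str) -> bytes:
    """Serialize a chat request body in the shape Ollama's /api/chat expects."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # set True to receive the response incrementally
    }
    return json.dumps(body).encode("utf-8")

# POSTing this payload to f"{TURBO_HOST}/api/chat" (with an API-key header)
# would return the model's reply as JSON.
payload = build_chat_request("gpt-oss:120b", "Why is the sky blue?")
print(json.loads(payload)["model"])
```

The same request can be made through the official Python library by constructing its client with a custom host, which keeps code portable between local and Turbo execution.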

AI