Ollama Turbo: Blazing Fast Open-Source LLMs

2025-08-06

Ollama Turbo is a new way to run large open-source language models on datacenter-grade hardware. Many new models are too large for typical GPUs, or run too slowly on them. Ollama Turbo offers fast execution and is compatible with Ollama's app, CLI, API, and JavaScript/Python libraries. Currently in preview, it supports gpt-oss-20b and gpt-oss-120b. Importantly, Ollama does not log or retain any queries made in Turbo mode, and all hardware is US-based. Hourly and daily usage limits are in place to manage capacity, with usage-based pricing coming soon.
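As a rough sketch of what API access to a hosted endpoint like Turbo could look like, the snippet below builds a request against Ollama's `/api/chat` REST endpoint. The host URL, bearer-token auth scheme, and `gpt-oss:120b` model tag are assumptions based on the preview announcement, not documented specifics:

```python
# Sketch: constructing an /api/chat request pointed at a hosted Turbo endpoint.
# Host URL, auth scheme, and model tag are assumptions from the preview post.
import json
import urllib.request

TURBO_HOST = "https://ollama.com"  # assumed Turbo endpoint


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    # Same request shape as a local Ollama server; only host and auth differ.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{TURBO_HOST}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical API key
        },
        method="POST",
    )


# Sending would be: urllib.request.urlopen(build_chat_request(key, model, prompt))
req = build_chat_request("sk-demo", "gpt-oss:120b", "Say hello.")
```

Because the request shape matches a local Ollama server, switching between local and Turbo execution would only mean changing the host and adding the auth header.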

Ollama Launches Desktop App for Easier LLM Interaction

2025-07-31

Ollama has released a new desktop application for macOS and Windows, offering a more streamlined way to interact with large language models. The app supports drag-and-drop uploads of text files and PDFs, making it easy to process documents; for larger files, the context length can be increased in settings (at the cost of more memory). Multimodal support allows sending images to compatible models such as Google DeepMind's Gemma 3, and code files can be submitted for the model to analyze. A command-line interface version is also available.
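Under the hood, sending an image to a multimodal model amounts to attaching base64-encoded image data to a chat message, following Ollama's `/api/chat` conventions. The sketch below builds such a payload; the `gemma3` model tag is an assumption for illustration:

```python
# Sketch: an /api/chat payload attaching an image for a multimodal model such
# as Gemma 3. The base64 "images" field follows Ollama's REST API conventions;
# the model tag is an assumption.
import base64
import json


def image_chat_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    message = {
        "role": "user",
        "content": prompt,
        # Images travel as base64 strings alongside the text content.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }
    return json.dumps({"model": model, "messages": [message], "stream": False})


payload = image_chat_payload("gemma3", "Describe this image.", b"\x89PNG...")
```

In practice the desktop app handles this encoding transparently when a user drags an image into the chat.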

Ollama's New Multimodal Engine: Local Inference for Vision Models

2025-05-16

Ollama has launched a new engine supporting local inference for multimodal models, starting with vision models like Llama 4 Scout and Gemma 3. Addressing limitations of the ggml library for multimodal models, the engine improves model modularity, accuracy, and memory management for reliable and efficient inference with large images and complex architectures (including Mixture-of-Experts models). This focus on accuracy and reliability lays the foundation for future support of speech, image generation, and longer contexts.

Google's Gemma: A Lightweight Multimodal Model Family

2025-03-12

Google unveiled Gemma, a lightweight family of multimodal models built on Gemini technology. Gemma 3 models process text and images, boast a 128K context window, and support over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel at question answering, summarization, and reasoning, while their compact design enables deployment on resource-constrained devices. Benchmark results demonstrate strong performance across various tasks, particularly in multilingual and multimodal capabilities.

Microsoft Releases Phi-4: A 14B Parameter Open-Source Language Model

2025-01-12

Microsoft has unveiled Phi-4, a new 14-billion-parameter open-source language model. Trained on a blend of synthetic data, filtered public-domain websites, academic books, and Q&A datasets, Phi-4 underwent a rigorous enhancement and alignment process to ensure accurate instruction following and robust safety. With a context length of 16k tokens, it is designed for general-purpose AI systems and applications (primarily in English) that run in memory- and compute-constrained environments, require low latency, and need strong reasoning and logic capabilities. Microsoft emphasizes that developers should consider the limitations of language models and mitigate for accuracy, safety, and fairness, especially in high-risk scenarios.
